Open nlwashington opened 10 years ago
math-commons now has sparse matrix classes that could be useful here. Weka (http://www.cs.waikato.ac.nz/ml/weka/) has a large collection of machine learning algorithms that we might find useful, plus some visualization tools. There's also a very basic matrix library with some nice convenience functions: http://la4j.org/
I found this library useful for working with matrices: https://sites.google.com/site/qianmingjie/home/toolkits/laml
and this one for graphs: http://www.i3s.unice.fr/~hogie/grph/
Math commons has a nice ecosystem if needs go beyond matrices. I haven't used Weka for a while, but it was much more geared toward statistical processing and visualization of data sets.
Hi there, there is also Mahout, which has some nice libraries for vector operations. I've briefly used the RandomAccessSparseVector and SparseMatrix classes, and I believe there are some classes for similarity measures, e.g., cosine. There is a book and an active mailing list for that project, fwiw.
I've added an implementation of TF-IDF, or more concretely a version adapted to pairs of terms. There is one issue, though: knowledgeBase.getTypesBM(individualId) returns, among other things, OWL:Thing or MP:000001 (in the example I've used), which should be discarded by default from the resulting TF-IDF ranking.
Here's the question: Is the class_index of OWL:Thing hardcoded, or is it always different? How can I find it without explicitly retrieving it from the KB via the classId?
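One way to sidestep the index question, sketched here under assumptions: filter the ranking by class ID rather than by class_index, so the index of the root never has to be known. `RootClassFilter`, `rank`, and the score map are hypothetical names for illustration and are not part of owlsim; the excluded IDs are simply passed in as strings, and the example IDs `MP:A` and `MP:B` are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of owlsim: drops unwanted (e.g. root)
// classes from a TF-IDF ranking by class ID instead of by class index.
final class RootClassFilter {

    // Returns the entries of `scores` whose key is not in `excluded`,
    // sorted by descending score.
    static List<Map.Entry<String, Double>> rank(Map<String, Double> scores,
                                                Set<String> excluded) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                // Copy into an immutable entry so the result is detached from the input map.
                ranked.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("OWL:Thing", 9.0); // root class, to be discarded
        scores.put("MP:A", 0.5);      // placeholder class IDs
        scores.put("MP:B", 2.0);
        System.out.println(rank(scores, Set.of("OWL:Thing")));
    }
}
```

The cost is one set lookup per ranked class, and the exclusion list stays configurable per ontology. Whether the KB exposes a stable index for the root is a separate question that this sketch does not answer.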
@cmungall Reviving this very old thread to ask whether there's been any recent discussion? This feature is important for deep phenotyping, for mod researchers, physicians, and patients alike.
not yet
(In reply to Julie McMurry's comment above, written 12 Apr 2016 at 9:33.)
in the old sim2 codebase, i took a crack at computing a co-annotation matrix using the term frequency-inverse document frequency (TF-IDF) algorithm. that code is in these methods:
computeTFIDFMatrix, getCoannotatedClassesForAttribute, getCoAnnotatedClassesForIndividual, getCoAnnotatedClassesForAttributes, getCoAnnotatedClassesForMatches, populateFullCoannotationMatrix, getSubsetCoannotationMatrix, initCoannotationMatrix
this needs to be ported from sim2 and refactored. it worked in my tests, but performance was terrible once i scaled up to actual full-size data, so i think the refactor will need to use a sparse matrix.
once ported, these methods will provide the calls that services need to get commonly co-annotated classes
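To make the sparse-matrix idea concrete, here is a minimal sketch that stores only the class pairs that actually co-occur. It is not the sim2 code. `SparseCoannotation`, `addIndividual`, and `tfidf` are hypothetical names, and the weighting is one plausible adaptation of TF-IDF to pairs, assumed for illustration: tf is the share of a's co-annotations that involve b, and idf is log(N / df(b)), where N is the number of individuals and df(b) is the number of individuals annotated with b.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the sim2 implementation: a sparse
// co-annotation matrix backed by nested hash maps.
final class SparseCoannotation {
    // counts.get(a).get(b) = number of individuals annotated with both a and b.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // df.get(c) = number of individuals annotated with c.
    private final Map<String, Integer> df = new HashMap<>();
    private int numIndividuals = 0;

    // Registers one individual's annotated classes (quadratic in the set size).
    void addIndividual(Set<String> classes) {
        numIndividuals++;
        for (String a : classes) {
            df.merge(a, 1, Integer::sum);
            for (String b : classes) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>())
                          .merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Assumed pairwise weighting: tf(a, b) * idf(b); 0 if a and b never co-occur.
    double tfidf(String a, String b) {
        Map<String, Integer> row = counts.get(a);
        if (row == null || !row.containsKey(b)) {
            return 0.0;
        }
        int rowTotal = 0;
        for (int v : row.values()) {
            rowTotal += v;
        }
        double tf = (double) row.get(b) / rowTotal;
        double idf = Math.log((double) numIndividuals / df.get(b));
        return tf * idf;
    }

    public static void main(String[] args) {
        SparseCoannotation m = new SparseCoannotation();
        m.addIndividual(Set.of("A", "B", "ROOT"));
        m.addIndividual(Set.of("A", "C", "ROOT"));
        m.addIndividual(Set.of("B", "ROOT"));
        System.out.println(m.tfidf("A", "C"));
        System.out.println(m.tfidf("A", "ROOT"));
    }
}
```

Memory grows with the number of co-occurring pairs rather than with the square of the number of classes. Under this particular weighting, a class annotated to every individual (such as a root class) gets idf = log(1) = 0 and therefore drops to the bottom of any ranking. A dedicated sparse-matrix library could replace the nested maps if matrix operations are needed.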