monarch-initiative / owlsim-v3

Ontology Based Profile Matching
16 stars 5 forks source link

refactor Matchers with Comparator methods #10

Open nlwashington opened 9 years ago

nlwashington commented 9 years ago

I want to be able to perform comparisons and get scores between two profiles directly, or perform matching to a supplied set of profiles (as opposed to the individuals in the kb - i need this to be able to perform comparisons against background datasets not loaded into the kb).

The ProfileMatcher and it's subclasses needs to expose public methods for direct comparison of profiles. An alternative would be to make some kind of separate ProfileComparer class, that then the Matchers leverage.

@cmungall given the OK, i'll go ahead with this refactor.

cmungall commented 9 years ago

Main worries are code repetition and efficiency. Current implementations assume a list of individuals and can take advantage of this for faster processing. Also, a background set is assumed for certain methods (e.g. phenodigm-like ranking).

I think this way may be best: "ProfileMatcher and it's subclasses needs to expose public methods for direct comparison of profiles" - however, not all subclasses should be expected to implement this, and clients should behave appropriately. I would only try and reuse code if it doesn't impact speed. A bit of code repetition is OK so long as everything is documented.

cmungall commented 9 years ago

Be good to get @hdietze or @ccondit opinions here

nlwashington commented 9 years ago

absolutely, i don't want to repeat code. basically i want to be able to compare two things against each another and get the score from the specified scoring method. in addition, to compute probabilities of the resulting scores (for IC-based methods), i need to be able to compute p-values against some kind of generated backgrounds. for that i need to supply my own set of individuals (in the form of EWAHCompressedBitmaps) for computing matches (but not overload the graph with them).

anyway, i think the basic pattern for the refactor is simply:

  1. extract the scoring method (getScore(bm1, bm2)) in each of the matchers into it's own public method, which takes two EWAHCompressedBitmaps to compare. right now scoring is imbedded within the findMatchProfileImpl methods, but i think this should be available from of any of the methods that computes a score. (except it looks like it's already extracted in PhenodigmICProfileMatcher)
  2. add a method, findMatchProfileImpl(q,individualBMs), where the user can supply a EWAHCompressedBitmap[] of "individuals" to compute the matches on (instead of the individuals in kb). since these "individuals" are not added to the graph, they would not have identifiers except for an index in the supplied array (which the current method assumes), which means downstream methods will need to be able to handle this case in the MatchSet. we'll still have a findMatchProfileImpl(q) method, that will make the assumption of using the kb individuals by default.
nathandunn commented 9 years ago

Sorry, this is the one. I was wondering if it makes sense to cache your searches. If you have a specific test to look at, that would be great.

cmungall commented 9 years ago

Gotcha, yep. The caching isn't really relevant here (Nicole is piling ahead with the refactor)

But yes it is useful to cache. We had extensive cacheing in owlsim2, hasn't been needed thus far in v3 but we will likely add this, especially for class-class comparisons