monarch-initiative / owlsim-v3

Ontology Based Profile Matching
Explore many-to-many matches for profiles #89

Open jmcmurry opened 7 years ago

jmcmurry commented 7 years ago

It would be interesting to test profile matching when ALL pairwise phenotype matches above a threshold (up to N × M pairs for profiles of N and M phenotypes, rather than only the best match per phenotype) are factored into the overall profile match score.

E.g., in the situation below, what would change if we did NOT throw out the match between ...epiphysis and toe simply because the one between epiphysis and finger is better? The same goes for the delayed speech and intellectual disability features.

[Figure: fig3_ws_syndrome_double_triptych_2017-05-14d]

Thoughts? cc: @julesjacobsen, @damiansm, @mellybelly, @pnrobinson

pnrobinson commented 7 years ago

We tried that when we were designing the Phenomizer algorithm and it did not work well. There is no real reason to think that the various phenotypes should match all of the phenotypes of the query, although perhaps there are subprofiles that could be exploited. It is an interesting topic.

jmcmurry commented 7 years ago

> There is no real reason to think that the various phenotypes should match all of the phenotypes of the query

I agree; this is why I was only talking about matches above a certain threshold. Otherwise, profiles with lots of phenotypes would be unfairly penalized. Right now, profiles with lots of phenotypes are potentially already penalized for the number that go unmatched. However, if those phenotypes are not completely unmatched but rather fuzzy-matched, it could change the calculus quite a lot.
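
To make the comparison concrete, here is a minimal Python sketch of the two aggregation strategies being discussed. It is not owlsim or Phenomizer code: the function names, the similarity values, and the choice to sum above-threshold scores per query phenotype are all assumptions made for illustration.

```python
from typing import List

# Rows are query phenotypes, columns are target phenotypes.
Matrix = List[List[float]]


def best_match_average(sim: Matrix) -> float:
    """Keep only the best target match for each query phenotype, then average."""
    return sum(max(row) for row in sim) / len(sim)


def thresholded_many_to_many(sim: Matrix, threshold: float) -> float:
    """Credit every pairwise match at or above the threshold.

    Scores are summed per query phenotype and averaged over query phenotypes.
    This normalization is a hypothetical choice, not an established method.
    """
    per_query = [sum(s for s in row if s >= threshold) for row in sim]
    return sum(per_query) / len(sim)


# Made-up similarity scores, loosely following the example in this thread.
query = ["epiphysis", "delayed speech"]
target = ["finger", "toe", "intellectual disability"]
sim = [
    [0.8, 0.6, 0.1],  # epiphysis vs. finger, toe, intellectual disability
    [0.1, 0.0, 0.7],  # delayed speech vs. the same targets
]

print(best_match_average(sim))
print(thresholded_many_to_many(sim, threshold=0.5))
```

With these toy numbers, the epiphysis/toe match contributes only under the second function. Because this version is not normalized by the number of target phenotypes, larger profiles can accumulate more credit, so any real test would need to decide how to correct for profile size.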

cmungall commented 7 years ago

I agree with Peter: matching all phenotypes against all doesn't work.

But I see the intuition guiding the question and this deserves an answer along the lines of having a probabilistic model that can account for a latent phenotype explaining multiple similar phenotypes. Not sure I can write it in a GH ticket though.

Also, are you totally sure the figure is an accurate depiction of what is coming back from the algorithm? I would expect GDD and DSaLD to pair.