monarch-initiative / owlsim-v3

Ontology Based Profile Matching

Implementing meta-information for annotations of objects #77

Closed drseb closed 7 years ago

drseb commented 7 years ago

Hi all,

after a long Skype-session with Jules, we came up with the following owlsim3-relevant ticket:

We will need more meta-data associated with object-to-ontology term associations. At the moment it seems only possible to add negation (although even this seems non-trivial to provide). To get an idea of what I am referring to, see my comment on this PR.

This ticket is about two things: how we handle meta-information internally, and how we provide this information to owlsim3 (syntax-wise).

  1. Handling: I would like to know how complicated this will be from a software perspective. Must-haves are negation (this seems to be working somehow already) and frequency (for diseases/mouse populations). Data we will need in the not-so-distant future are time points (e.g. congenital, juvenile, adult). Here we might have to look at the recent W3C Time Ontology. Information that I think will be important later includes confidence (e.g. how sure is the user that the distance between the eyes is abnormal) and relevance (i.e. does the user think this is an important feature for the diagnosis). The rationale behind providing confidence and relevance (and maybe more, like expected agreement with other physicians' opinions) comes from discussions such as those in this article or this recent Nature paper (esp. Study 3), where it is often found useful to let users attach additional information/priors to their choice/query (I know the latter article argues against confidence as a parameter, but the point is that we should prepare for additional user-provided data). Although we currently lack the ontology-based algorithms to make use of such information, I think it is very important to discuss now how to collect it. Any thoughts welcome!

  2. Syntax: How should such information be provided by the user to our tools? One idea from @julesjacobsen was a JSON-like representation such as {HP:0001, freq=0.99, rel=0.9, NOT}. How should such information be handled internally? It would be great to hear your thoughts on this.
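To make the compact form concrete, here is a minimal Python sketch that parses a string like `{HP:0001, freq=0.99, rel=0.9, NOT}` into a small record. It is only an illustration: the `Annotation` class, the `parse_annotation` function, and the mapping of `freq`/`rel` to field names are assumptions made for this example, not part of owlsim3 or of any agreed syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one object-to-term association."""
    term_id: str
    negated: bool = False
    frequency: Optional[float] = None  # from the "freq" key
    relevance: Optional[float] = None  # from the "rel" key


# Assumed mapping from the short keys in the compact form to field names.
_KEYS = {"freq": "frequency", "rel": "relevance"}


def parse_annotation(text: str) -> Annotation:
    """Parse a compact string such as '{HP:0001, freq=0.99, rel=0.9, NOT}'."""
    body = text.strip().strip("{}")
    parts = [p.strip() for p in body.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty annotation")
    fields = {"term_id": parts[0]}  # the first element is taken as the term ID
    for part in parts[1:]:
        if part == "NOT":
            fields["negated"] = True
        elif "=" in part:
            key, value = (s.strip() for s in part.split("=", 1))
            if key not in _KEYS:
                raise ValueError(f"unknown key: {key}")
            fields[_KEYS[key]] = float(value)
        else:
            raise ValueError(f"cannot parse element: {part}")
    return Annotation(**fields)
```

Calling `parse_annotation("{HP:0001, freq=0.99, rel=0.9, NOT}")` would give a negated annotation for `HP:0001` with both numeric fields set. Further keys such as confidence or time point could be added to `_KEYS` in the same way.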

@julesjacobsen please add to this ticket if I missed important points/ideas.

Everybody is invited to give thoughts, especially @pnrobinson , @cmungall , @mellybelly , @julesjacobsen , @jnguyenx , etc.

pnrobinson commented 7 years ago

Hi Seb,

are we talking about the disease models, the queries that people are sending to Genomiser, or both? Can we please schedule a telcon about this? I do not feel I have enough information about the overall game plan at this point.

-Peter

drseb commented 7 years ago

Both. Frequency only makes sense for diseases. Confidence makes more sense for an individual's annotations. Skype sounds good.

cmungall commented 7 years ago

OK, let's separate external I/O from internal modeling. For external I/O, all of this is handled by phenopackets. We have a PR for loading phenopackets, #30. I'm open to also having smaller ad-hoc JSON and TSV formats for particular limited scenarios, but I think this could create confusion longer term.

For internal modeling, OwlSim3 assumes everything is converted to OWL, and then it has its own simplified internal OWL storage. The subset historically supported was:

Of course, other constructs are supported for the pre-processing reasoning step, but OwlSim operations were historically defined in terms of the above.

For FrequencyAware NaiveBernoulliBayes, we extended the KB model to include getDirectWeightedTypes for any individual. In theory it should be possible to keep advancing the KB with additional methods like this, but your thoughts are welcome as to a larger refactor.
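As a rough illustration of the "one extra accessor per kind of meta-information" idea, the following Python sketch stores a weight per individual/class pair and exposes it through a lookup method. It is a toy model, not the OwlSim3 KB interface: the class `WeightedKnowledgeBase`, both method names, and the assumption that a weight is a number in [0, 1] are all hypothetical.

```python
from collections import defaultdict
from typing import Dict


class WeightedKnowledgeBase:
    """Toy in-memory store; an assumption for illustration only."""

    def __init__(self) -> None:
        # individual ID -> {class ID -> weight}
        self._weights: Dict[str, Dict[str, float]] = defaultdict(dict)

    def add_weighted_type(self, individual: str, cls: str, weight: float) -> None:
        """Record a directly asserted class with a weight in [0, 1]."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be in [0, 1]")
        self._weights[individual][cls] = weight

    def get_direct_weighted_types(self, individual: str) -> Dict[str, float]:
        """Return a copy of the class-to-weight map for one individual."""
        return dict(self._weights.get(individual, {}))
```

Each new kind of metadata (confidence, relevance, onset) would need a parallel pair of methods in this style, which is the point at which a larger refactor may become the simpler option.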

cmungall commented 7 years ago

For temporal info, we can just use RO axioms and express things like

Phenotype and 'starts during' some Phase1 and 'ends during' some Phase2

OWLTime axioms may come in useful to map quantitative temporal data to bins.
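Mapping quantitative temporal data to bins can be sketched in a few lines of Python. The bin names follow the examples mentioned earlier in this thread (congenital, juvenile, adult), but the cut-off ages are invented for illustration and the function `onset_bin` is hypothetical; real boundaries would have to come from the ontology.

```python
from bisect import bisect_left

# Assumed upper bounds (in years, inclusive) of every bin except the last.
_UPPER_BOUNDS = [0.0, 16.0]
_BINS = ["congenital", "juvenile", "adult"]


def onset_bin(age_years: float) -> str:
    """Map an age of onset in years to a coarse temporal bin.

    With the assumed bounds: 0 -> congenital, (0, 16] -> juvenile,
    above 16 -> adult.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_left finds the first bound that is >= age_years.
    return _BINS[bisect_left(_UPPER_BOUNDS, age_years)]
```

The resulting bin label could then be attached to a phenotype in the same way as any other grouping class.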

@dosumis did a paper on classifying phenotypes based on RO temporal axioms: https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-30

The simplest approach is to make the required groupings ahead of time and then feed them to the algorithm like any other ontology.

But we should probably be thinking more algorithm-first, representation-later. There are a lot of different approaches here, and it depends on what data is available - temporal bins, quantitative progression data, etc.

drseb commented 7 years ago

Thanks.

> For external I/O, all of this is handled by phenopackets

I was kind of expecting that and understand your motivation. In my opinion this feels a bit bulky compared to something as simple as {HP:0001, freq=0.99, rel=0.9, NOT}, but OK.

> But we should probably be thinking more algorithm-first, representation-later.

But how can I try out algorithmic ideas when I can't get the information into owlsim?

But maybe we can discuss on one of the future calls.

cmungall commented 7 years ago

I agree with your points. I don't know what the best approach is other than to be pragmatic: if we want to try something new, let's just get the information in there; some of the harmonization can come later.