How to describe predicted vs measured/observed phenotypes?

ddooley commented 2 years ago

1) Phenotypes can be observed/measured, or inferred from data (e.g. genomic sequence data). If inferred, there is a degree of uncertainty as to whether the expected phenotype will actually inhere in the thing. For example, in organism X a specific mutation may correlate with resistance to a given drug, but that correlation has not been tested and confirmed for organism Y. Resistance is expected in organism Y if the mutation is observed, but this has not been confirmed. In this case the phenotype is predicted, and there is a degree of uncertainty that needs to be communicated to users of the data, e.g. clinicians.

Could we introduce “predicted phenotype” into PATO?

2) Confidence in predicted phenotypes: When predictions about phenotypes are made based on data and correlations, if the correlations are strong, there can be a high degree of confidence in the predictions. If there is less data or the correlations are weaker, there may be less confidence in the predictions. The confidence level may be descriptive (e.g. high, moderate, low), or could be numerical (given a statistical method).
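To make the two forms of confidence concrete, here is a minimal sketch of a record that accepts either a descriptive level or a numeric score. All class and field names are hypothetical, and the example values are made up for illustration; nothing here is an existing PATO or OBO term.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class ConfidenceLevel(Enum):
    """Descriptive confidence levels, as listed in the comment above."""
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"


@dataclass(frozen=True)
class PredictedPhenotype:
    """Hypothetical record for a phenotype inferred from data, not observed."""
    subject: str
    phenotype: str
    # Either a descriptive level or a numeric score from a statistical method.
    confidence: Union[ConfidenceLevel, float]
    # Name of the method that produced a numeric score, if any.
    method: Optional[str] = None

    def is_numeric(self) -> bool:
        return isinstance(self.confidence, float)


# Illustrative values only.
descriptive = PredictedPhenotype(
    "organism Y", "drug resistance", ConfidenceLevel.MODERATE
)
numeric = PredictedPhenotype(
    "organism Y", "drug resistance", 0.87, method="hypothetical classifier"
)
```

A numeric score is only interpretable together with the method that produced it, which is why the sketch keeps `method` next to `confidence`.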

Would PATO be the place for “**predicted phenotype confidence level**”?

Emma would be happy to discuss this in a PATO curation call if desired.

c/o @griffie

shawntanzk commented 2 years ago

assigning @dosumis and @cmungall - figure you'd be interested in this.

> Would PATO be the place for “predicted phenotype confidence level”?

Regarding this, I think it is similar to something we are trying to work out in the Brain Data Standards ontology: recording the confidence of markers that identify a cell type. We are trying to do this by having a class that contains information about the method and the statistics (in our case NS-Forest and F-beta score). We haven't fully figured this out yet, but we are trying to use STATO to record the confidence scores. If you're interested, you can follow the thread here: https://github.com/ISA-tools/stato/issues/85

dosumis commented 2 years ago

I think PATO is not the right place. A predicted phenotype is not a quality of something, like its colour, length or shape. It also sounds like you need properties, which PATO has not, up to now, been in the business of minting.

In general there is resistance in OBO to recording uncertainty/evidence in assertions rather than on them. Many of the assertions we make have some degree of uncertainty - with evidence improving over time.

I think the conventional OBO way to do this would be to annotate X has_phenotype some Y (or a simple triple "X has_phenotype Y") with an AP axiom recording some confidence score. I think a dedicated OP (has predicted phenotype) would cause fewer problems at the individual level than at the class level. You could then accumulate individual pieces of evidence as separate annotated triples. Query-wise, RDF/OWL is not a great fit for this type of thing. Converting to a graph representation with edge annotations (e.g. in Neo4j) is much better.
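The annotated-triple pattern above can be sketched with plain Python tuples, without an RDF library. The first function emits the standard OWL axiom-annotation triples (`owl:Axiom`, `owl:annotatedSource`, `owl:annotatedProperty`, `owl:annotatedTarget`); the second folds each annotated axiom into a single edge with properties, roughly the shape a property graph such as Neo4j would store. Everything under the `ex:` prefix, including `ex:has_predicted_phenotype` and `ex:confidence_score`, is a hypothetical placeholder rather than a real RO or STATO term, and the scores are made up.

```python
from collections import defaultdict
from itertools import count

_ids = count(1)  # source of fresh blank-node labels

# OWL axiom-annotation predicates mapped to edge fields.
CORE = {
    "owl:annotatedSource": "source",
    "owl:annotatedProperty": "type",
    "owl:annotatedTarget": "target",
}


def annotated_axiom(subject, predicate, obj, annotations):
    """Return the asserted triple plus the owl:Axiom triples annotating it."""
    axiom = f"_:axiom{next(_ids)}"  # blank node standing for the axiom
    triples = [
        (subject, predicate, obj),
        (axiom, "rdf:type", "owl:Axiom"),
        (axiom, "owl:annotatedSource", subject),
        (axiom, "owl:annotatedProperty", predicate),
        (axiom, "owl:annotatedTarget", obj),
    ]
    triples += [(axiom, prop, value) for prop, value in annotations.items()]
    return triples


def to_edges(triples):
    """Fold each owl:Axiom node into one edge dict carrying its annotations."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    edges = []
    for pairs in by_subject.values():
        if ("rdf:type", "owl:Axiom") not in pairs:
            continue  # not an axiom-annotation node
        edge = {"properties": {}}
        for p, o in pairs:
            if p in CORE:
                edge[CORE[p]] = o
            elif p != "rdf:type":
                edge["properties"][p] = o
        edges.append(edge)
    return edges


# Two separate pieces of evidence for the same predicted phenotype.
# The asserted triple repeats in this list; a triple store would keep it once.
triples = []
triples += annotated_axiom(
    "ex:isolate_1", "ex:has_predicted_phenotype", "ex:drug_resistance",
    {"ex:confidence_score": 0.92, "ex:evidence": "ex:mutation_A"},
)
triples += annotated_axiom(
    "ex:isolate_1", "ex:has_predicted_phenotype", "ex:drug_resistance",
    {"ex:confidence_score": 0.60, "ex:evidence": "ex:mutation_B"},
)
edges = to_edges(triples)
```

Each piece of evidence becomes its own annotated axiom and therefore its own edge, so a query over `edges` can filter or aggregate on `ex:confidence_score` directly instead of joining through the reification triples.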

Perhaps RO or STATO would be suitable places for the confidence score AP?

pato-ontology / pato

How to describe predicted vs measured/observed phenotypes? #477