mapping-commons / mh_mapping_initiative

Repo to organise the mouse-human phenotype mapping initiative and reconcile resources.

Problems when inferring human gene-phenotype associations from model organism by sequence orthology #9

Open liulizhi1996 opened 3 years ago

liulizhi1996 commented 3 years ago

Model organisms, especially the laboratory mouse Mus musculus, provide useful knowledge about human diseases. I am studying human gene-HPO term annotations and want to utilize phenotype annotations of animal models to improve the prediction of HPO annotations of human genes.

However, I find a strange problem. Taking keratoconjunctivitis sicca as an example, the associated human genes and mouse genes are largely different:

If I map these mouse genes to their orthologous human genes, the intersection of the two gene sets is empty. Why are the genes related to the same phenotype so different between human and model organism? Is there something wrong here?
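For reference, the comparison described above can be sketched roughly as follows. This is only a minimal illustration: the function names, the `mouse_to_human` dictionary, and the gene identifiers are hypothetical placeholders, not real annotations or any resource's actual API.

```python
from typing import Dict, Set, Tuple


def map_to_human(mouse_genes: Set[str],
                 mouse_to_human: Dict[str, Set[str]]) -> Set[str]:
    """Map mouse genes to human orthologs; genes without a mapping are skipped."""
    human: Set[str] = set()
    for gene in mouse_genes:
        human |= mouse_to_human.get(gene, set())
    return human


def compare_gene_sets(human_genes: Set[str],
                      mouse_genes: Set[str],
                      mouse_to_human: Dict[str, Set[str]]
                      ) -> Tuple[Set[str], Set[str], float]:
    """Return (shared human genes, unmapped mouse genes, Jaccard index)."""
    mapped = map_to_human(mouse_genes, mouse_to_human)
    shared = human_genes & mapped
    unmapped = {g for g in mouse_genes if not mouse_to_human.get(g)}
    union = human_genes | mapped
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, unmapped, jaccard


# Placeholder identifiers only (assumption: not real gene symbols).
toy_human = {"HUMAN_A", "HUMAN_B"}
toy_mouse = {"mouse_c", "mouse_d"}
toy_orthologs = {"mouse_c": {"HUMAN_C"}}

shared, unmapped, jaccard = compare_gene_sets(toy_human, toy_mouse, toy_orthologs)
print(shared, unmapped, jaccard)
```

Reporting the unmapped mouse genes separately helps distinguish "no ortholog in the mapping" from "ortholog exists but is not annotated to the human term".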

Moreover, I checked the human and mouse genes associated with DOID:12895 (keratoconjunctivitis sicca); they are

Some genes here are inferred from sequence orthology by RGD. But it is strange that the annotated genes here are quite different from those in the HP/MP annotations. Why are the genes associated with the same phenotype so different? Is it feasible and reliable to infer gene-phenotype associations from sequence orthology, as RGD does here?

sbello commented 3 years ago

Looking at this, I think it partly comes down to incomplete investigation and annotation. Of the 41 human genes, 34 have at least one mouse ortholog with an MP annotation. Of those 34, only 13 mouse orthologs have an MP annotation in the vision/eye system, and none of those annotations mention keratoconjunctivitis sicca. It may simply be that no one has looked in mouse for this particular phenotype.
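The kind of breakdown given above could be computed along these lines. This is a sketch under stated assumptions, not the query that was actually run: the input dictionaries, the precomputed `system_terms` set (all term IDs under the vision/eye branch), and every identifier are hypothetical, and the counts here are per human gene.

```python
from typing import Dict, NamedTuple, Set


class Funnel(NamedTuple):
    total: int           # human genes considered
    with_any_mp: int     # ... with an ortholog carrying any MP annotation
    with_system_mp: int  # ... with an annotation in the system of interest
    with_target_mp: int  # ... with an annotation to the target term(s)


def annotation_funnel(human_genes: Set[str],
                      human_to_mouse: Dict[str, Set[str]],
                      mp_by_mouse_gene: Dict[str, Set[str]],
                      system_terms: Set[str],
                      target_terms: Set[str]) -> Funnel:
    with_any = with_system = with_target = 0
    for gene in human_genes:
        # Pool MP terms across all mouse orthologs of this human gene.
        terms: Set[str] = set()
        for mouse_gene in human_to_mouse.get(gene, set()):
            terms |= mp_by_mouse_gene.get(mouse_gene, set())
        if not terms:
            continue
        with_any += 1
        if terms & system_terms:
            with_system += 1
        if terms & target_terms:
            with_target += 1
    return Funnel(len(human_genes), with_any, with_system, with_target)


# Placeholder data (assumption: term and gene IDs are made up).
toy = annotation_funnel(
    human_genes={"HUMAN_A", "HUMAN_B", "HUMAN_C"},
    human_to_mouse={"HUMAN_A": {"mouse_a"}, "HUMAN_B": {"mouse_b"}},
    mp_by_mouse_gene={"mouse_a": {"TERM_EYE_1"}, "mouse_b": {"TERM_OTHER"}},
    system_terms={"TERM_EYE_1", "TERM_EYE_2"},
    target_terms={"TERM_EYE_2"},
)
print(toy)
```

A steep drop between the second and third counts points to annotation incompleteness in the system of interest rather than to a mapping error.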

An additional factor, which may be something the mapping project can address: the MP has both "keratoconjunctivitis sicca" (inflammation of the cornea and conjunctiva caused by eye dryness) and "dry eyes" (absence of natural or normal moisture in the eye, with no mention of inflammation).

The HPO has only "Keratoconjunctivitis sicca" and this has the synonym "dry eyes".

So it appears that despite the similar labels we are not using the term in the same way in the two ontologies. Note the definitions are not quite the same. The HPO definition does not mention inflammation, although based on placement this should be inferred from the definition of the parent.

An MGI curator working on a mouse phenotype may choose to use "dry eyes" when the authors only mention dry eyes without mentioning inflammation.

The existing annotations to keratoconjunctivitis sicca in MGI are all also annotated to dry eyes, to capture that part of the phenotype.
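In practice this means that a comparison against the single HPO term may need to pull mouse genes annotated to either MP term. A minimal sketch, assuming annotations are available as a gene-to-term-label dictionary (the gene names are placeholders and `genes_annotated_to_any` is a hypothetical helper):

```python
from typing import Dict, Set


def genes_annotated_to_any(annotations: Dict[str, Set[str]],
                           term_labels: Set[str]) -> Set[str]:
    """Return genes annotated to at least one of the given term labels."""
    return {gene for gene, terms in annotations.items() if terms & term_labels}


# Placeholder genes; only the two MP labels come from the discussion above.
toy_annotations = {
    "mouse_a": {"dry eyes"},
    "mouse_b": {"keratoconjunctivitis sicca", "dry eyes"},
    "mouse_c": {"placeholder unrelated term"},
}

strict = genes_annotated_to_any(toy_annotations, {"keratoconjunctivitis sicca"})
broad = genes_annotated_to_any(
    toy_annotations, {"keratoconjunctivitis sicca", "dry eyes"}
)
print(sorted(strict), sorted(broad))
```

Querying by label match alone would miss genes such as `mouse_a` here, where the curator recorded dryness without inflammation.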

I know RGD and the Alliance will infer gene relations to DISEASE based on sequence homology (https://www.alliancegenome.org/disease/DOID:12895). But I'm fairly certain that RGD does not do that for phenotype annotations (correct me if I'm wrong @slaulederkind). The reasoning, as I understand it, is that it is useful to know whether you can at least genetically model a disease in a model species (that is, the orthologous gene exists and you may be able to look at some disease aspects, even if only at the level of the gene's biochemical function). For a phenotype, however, you have less confidence that the gene mutation will produce the same or an orthologous phenotype unless it has been reported, so the sequence inference is not made.