matentzn opened this issue 1 year ago.
Yes, I agree. I will work on it if time allows.
Correct me if I am wrong, but UniProt accessions refer to specific amino-acid sequences of polypeptides encoded by genes. For example, here is the canonical human cardiac troponin I UniProt entry:
sp|P19429|TNNI3_HUMAN Troponin I, cardiac muscle OS=Homo sapiens OX=9606 GN=TNNI3 PE=1 SV=3
MADGSSDAAR EPRPAPAPIR RRSSNYRAYA TEPHAKKKSK ISASRKLQLK TLLLQIAKQE
LEREAEERRG EKGRALSTRC QPLELAGLGF AELQDLCRQL HARVDKVDEE RYDIEAKVTK
NITEIADLTQ KIFDLRGKFK RPTLRRVRIS ADAMMQALLG ARAKESLDLR AHLKQVKKED
TEKENREVGD WRKNIDALSG MEGRKKKFES
Note that the above sequence differs from that of mouse cardiac troponin I:
sp|P48787|TNNI3_MOUSE Troponin I, cardiac muscle OS=Mus musculus OX=10090 GN=Tnni3 PE=1 SV=2
MADESSDAAG EPQPAPAPVR RRSSANYRAY ATEPHAKKKS KISASRKLQL KTLMLQIAKQ
EMEREAEERR GEKGRVLRTR CQPLELDGLG FEELQDLCRQ LHARVDKVDE ERYDVEAKVT
KNITEIADLT QKIYDLRGKF KRPTLRRVRI SADAMMQALL GTRAKESLDL RAHLKQVKKE
DIEKENREVG DWRKNIDALS GMEGRKKKFE G
For this discussion, let us ignore any protein isoforms or post-translational modifications that result in proteins/polypeptides that differ from the amino-acid sequence of a UniProt accession.
In my opinion it would be wrong to annotate a mouse trait with an OBA class that uses the human-specific P19429|TNNI3_HUMAN Troponin I, cardiac muscle as its entity component. The mouse and human troponin I are different entities, hence the different UniProt accessions.
OK, we could add another OBA class with the mouse UniProt ID as the component entity and let people figure out whether the two cardiac troponin I traits are related. How useful would that be for OBA users who want to integrate phenotypic traits from different model organisms? Semantically, any two of the thousands of 'protein X level' classes would look just as similar (each being a direct subclass of 'protein amount') as the biologically meaningful pair of mouse and human cardiac troponin I level traits.
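The point about flat subclassing can be made concrete with a toy ancestor-based similarity. The sketch below assumes a simple Jaccard measure over ancestor sets (real semantic-similarity tools use richer measures), and the class labels, including "human protein X level", are hypothetical stand-ins rather than actual OBA identifiers. It compares a flat design, where every species-specific level class sits directly under 'protein amount', with one where the two orthologous classes share a homology-group parent.

```python
# Toy ancestor-set (Jaccard) similarity. Class labels are illustrative only,
# not real OBA identifiers; real tools use richer similarity measures.


def ancestors(cls: str, parents: dict[str, set[str]]) -> set[str]:
    """Return cls together with all of its transitive superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a: str, b: str, parents: dict[str, set[str]]) -> float:
    """Shared ancestors divided by all ancestors of either class."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Flat design: one species-specific level class per UniProt accession,
# each a direct subclass of 'protein amount'.
flat = {
    "human TNNI3 level": {"protein amount"},
    "mouse Tnni3 level": {"protein amount"},
    "human protein X level": {"protein amount"},
}

# Same classes, with a shared parent standing in for the homology grouping.
grouped = {
    "troponin I, cardiac muscle level": {"protein amount"},
    "human TNNI3 level": {"troponin I, cardiac muscle level"},
    "mouse Tnni3 level": {"troponin I, cardiac muscle level"},
    "human protein X level": {"protein amount"},
}

orthologs = ("human TNNI3 level", "mouse Tnni3 level")
unrelated = ("human TNNI3 level", "human protein X level")
print(jaccard(*orthologs, flat), jaccard(*unrelated, flat))        # 1/3 for both
print(jaccard(*orthologs, grouped), jaccard(*unrelated, grouped))  # 0.5 vs 0.25
```

In the flat design the orthologous pair and the unrelated pair score identically; only a shared grouping class lets the hierarchy itself express that the troponin I traits belong together.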
As of today, there are over 250,000,000 UniProt identifiers. Even if we consider only the reviewed Swiss-Prot subset for genetic model organisms, there are still hundreds of thousands of UniProt IDs that could be used to create new OBA terms of the type 'UniProt ID' in serum and/or 'UniProt ID' in blood. That is a lot of term inflation.
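The inflation argument is simple multiplication: every accession yields one new term per pattern. The sketch below uses only the two patterns named above; the per-species counts are hypothetical placeholders (not UniProt statistics), and it assumes every protein has exactly one 1:1 ortholog in each species, which real data will not satisfy.

```python
# Back-of-the-envelope count of 'UniProt ID <pattern>' style terms.
# Per-species counts are hypothetical placeholders, not UniProt statistics,
# and perfect 1:1 orthology across species is assumed.

TEMPLATES = ["in serum", "in blood"]  # the two term patterns mentioned above


def species_specific_terms(proteins_per_species: dict[str, int]) -> int:
    """One term per species-specific accession per template."""
    return sum(proteins_per_species.values()) * len(TEMPLATES)


def grouped_terms(homology_groups: int) -> int:
    """One term per homology group per template."""
    return homology_groups * len(TEMPLATES)


hypothetical = {"human": 1_000, "mouse": 1_000, "rat": 1_000}
print(species_specific_terms(hypothetical))  # 6000
print(grouped_terms(1_000))                  # 2000
```

Under these assumptions the saving from grouping scales with the number of species a grouping covers; proteins outside any grouping still need their own terms.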
I have considered the above problems, and I decided to use the PRO homology groupings. They group together orthologous UniProt amino-acid sequence entries from human, mouse and rat. The term request for the 'protein X level' classes came from the GWAS Catalog (human traits), and the PRO homology grouping classes are defined by the human polypeptide.
For example, PR:000016506 troponin I, cardiac muscle is defined as: A protein that is a translation product of the human TNNI3 gene or a 1:1 ortholog thereof.
As a result, the term OBA:2045369 troponin I, cardiac muscle level can be used to annotate human, mouse and rat phenotypic traits.
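The annotation path described above (species-specific accession, then PRO homology group, then OBA level class) can be sketched as two lookups. The tables below are hand-written for illustration and contain only the identifiers quoted in this thread; they are not loaded from PRO or OBA files, and no rat accession is filled in. "HYPOTHETICAL_FISH_ACC" is a made-up placeholder for a protein outside the groupings.

```python
from typing import Optional

# Hand-written lookup tables for illustration only; not loaded from PRO or OBA,
# and limited to the identifiers quoted in this thread.
UNIPROT_TO_PRO_GROUP = {
    "P19429": "PR:000016506",  # TNNI3_HUMAN
    "P48787": "PR:000016506",  # TNNI3_MOUSE, a 1:1 ortholog of human TNNI3
}
PRO_GROUP_TO_OBA_LEVEL = {
    "PR:000016506": "OBA:2045369",  # troponin I, cardiac muscle level
}


def oba_level_class(uniprot_accession: str) -> Optional[str]:
    """Map a species-specific accession to the OBA level class of its PRO group.

    Returns None when the accession falls outside the homology groupings,
    i.e. the case where a UniProt-based component term would be needed.
    """
    group = UNIPROT_TO_PRO_GROUP.get(uniprot_accession)
    if group is None:
        return None
    return PRO_GROUP_TO_OBA_LEVEL.get(group)


for accession in ("P19429", "P48787", "HYPOTHETICAL_FISH_ACC"):
    print(accession, oba_level_class(accession))
```

Human and mouse traits resolve to the same class, while accessions outside the groupings (e.g. fish proteins) come back as None and would need their own component terms.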
Exactly. That is my point. I expect these terms to be used mostly for human, mouse and rat quantitative traits. Grouping by evolutionary homology is more meaningful than no grouping at all, and Uberon is based on evolutionary homology. I could argue that human forelimbs should be assigned different Uberon IDs than the homologous mouse structures, since I could not type this text if I had mouse forelimbs. If I had mouse troponin I in my muscles, however, I would surely still be able to do so, at least to some extent.
You can fall back on adding new component terms based on UniProt IDs of polypeptides from species that are outside of the PRO homology groupings (e.g. for fish proteins).
No, it is not complicated. You can just fall back on adding new component terms based on UniProt IDs of polypeptides from species that are outside of the PRO homology groupings. It is also less complicated than deciding whether a fish bone belongs to the same Uberon bone class as a human bone, because sequence similarity is computable and provides an objective criterion for homology grouping, if needed.
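To illustrate that sequence similarity is computable, here is a minimal global alignment (Needleman-Wunsch with a linear gap penalty) of the human and mouse cardiac troponin I sequences quoted earlier in the thread. The scores (match +1, mismatch -1, gap -2) are arbitrary illustrative choices rather than BLOSUM62 with affine gaps, and percent identity here is a simplification: PRO groupings are defined by 1:1 orthology, not by an identity threshold.

```python
# Simplified global alignment (Needleman-Wunsch, linear gap penalty).
# Scoring values are illustrative assumptions, not BLOSUM62 / affine gaps.

HUMAN_TNNI3 = (  # P19429, copied from the entry quoted in the thread
    "MADGSSDAAR EPRPAPAPIR RRSSNYRAYA TEPHAKKKSK ISASRKLQLK TLLLQIAKQE "
    "LEREAEERRG EKGRALSTRC QPLELAGLGF AELQDLCRQL HARVDKVDEE RYDIEAKVTK "
    "NITEIADLTQ KIFDLRGKFK RPTLRRVRIS ADAMMQALLG ARAKESLDLR AHLKQVKKED "
    "TEKENREVGD WRKNIDALSG MEGRKKKFES"
).replace(" ", "")
MOUSE_TNNI3 = (  # P48787
    "MADESSDAAG EPQPAPAPVR RRSSANYRAY ATEPHAKKKS KISASRKLQL KTLMLQIAKQ "
    "EMEREAEERR GEKGRVLRTR CQPLELDGLG FEELQDLCRQ LHARVDKVDE ERYDVEAKVT "
    "KNITEIADLT QKIYDLRGKF KRPTLRRVRI SADAMMQALL GTRAKESLDL RAHLKQVKKE "
    "DIEKENREVG DWRKNIDALS GMEGRKKKFE G"
).replace(" ", "")

MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[str, str]:
    """Return the two gapped strings of one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of alignment length."""
    aln_a, aln_b = global_align(a, b)
    identical = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)


print(f"human vs mouse TNNI3: {percent_identity(HUMAN_TNNI3, MOUSE_TNNI3):.1f}% identity")
```

A production pipeline would use an established aligner and orthology resources instead, but the point stands: the comparison is reproducible, unlike a judgment call about anatomical correspondence.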
Not in this case though. See my example about anatomy terms.
Disagree. Homology is baked into many ontologies, e.g. Uberon. See my arguments above. Without evolutionary homology, these OBA classes would be merely superficial, biologically meaningless formalisms.
Good project for the future. In the meantime, I think I will keep these homology grouping terms to help integrate mouse, rat and human quantitative traits.
From @cmungall: Only just saw this.
I would have said: use the species-specific proteins with IDs from UniProt.
Originally posted by @cmungall in https://github.com/obophenotype/bio-attribute-ontology/issues/251#issuecomment-1638899053