Closed marvinm2 closed 3 years ago
All the histone genes seem to be linked to a list of NCBI genes and Ensembl genes: https://bit.ly/3AkinFM
But none of these appear in the WP.
Found the cause in the WP (not GPML) RDF file for this pathway (attached: WP4657.txt).
The protein annotated with https://identifiers.org/uniprot/P62805 has all these names and IDs.
@marvinm2 @egonw @mkutmon We now know where it comes from, but we may still want to solve this mapping issue. It is not one pathway but a few hundred that have these extensive histone gene mappings. And according to the WP RDF, these histones are the MOST ABUNDANT genes in WikiPathways:
Maybe with an updated BridgeDb for genes/gene products? Or, if it comes from the source, check with UniProt whether that is really intended?
> Found the cause in the WP (not gpml) rdf file for this pathway (attached) WP4657.txt
This is something essential to realize: in the WikiPathways RDF world, WPRDF is not the "RDF of a pathway". That is the GPMLRDF. WPRDF is the full biological knowledge in WikiPathways.
I "debugged" it, and the problem is P62805.
UniProt identifiers in the pathways:
The multiple gene mappings come originally from Ensembl/UniProt:
@fehrhart, there are 34 pathways with that UniProt identifier: https://bit.ly/39v8BFj
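A lookup of this kind can be sketched as a SPARQL query built in Python. This is a minimal sketch and not the query behind the short link: the helper name `pathways_with_uniprot_query` is hypothetical, and the predicates used (`wp:bdbUniprot`, `dcterms:isPartOf`) are assumptions that should be checked against the current WPRDF vocabulary before running the query on a WikiPathways SPARQL endpoint.

```python
# Hypothetical helper: builds a SPARQL query that lists pathways containing
# a data node mapped to the given UniProt accession.
# Assumption: WPRDF exposes the mapping via wp:bdbUniprot and links data
# nodes to their pathway via dcterms:isPartOf.

def pathways_with_uniprot_query(accession: str) -> str:
    # Guard against characters that could break out of the IRI.
    if not accession.isalnum():
        raise ValueError(f"unexpected characters in accession: {accession!r}")
    uniprot_iri = f"https://identifiers.org/uniprot/{accession}"
    return f"""\
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?pathway WHERE {{
  ?node wp:bdbUniprot <{uniprot_iri}> ;
        dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
}}
"""

query = pathways_with_uniprot_query("P62805")
print(query)
```

The function only builds the query text; sending it to an endpoint is left out so the sketch stays free of network dependencies.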
I have created a unit test for it.
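The unit test itself is not shown in this thread. Purely as an illustration, a check of this kind could flag identifiers that fan out to an unexpectedly large number of genes. The function name `find_fanout`, the threshold, and the sample data below are all hypothetical and do not come from the WikiPathways code base.

```python
from collections import defaultdict

def find_fanout(mappings, max_genes=5):
    """Return {protein_id: sorted gene ids} for proteins that map to
    more than max_genes distinct genes."""
    genes_by_protein = defaultdict(set)
    for protein_id, gene_id in mappings:
        genes_by_protein[protein_id].add(gene_id)
    return {
        protein: sorted(genes)
        for protein, genes in genes_by_protein.items()
        if len(genes) > max_genes
    }

# Made-up sample data: "P_MANY" maps to six placeholder genes, "P_ONE" to one.
sample = [("P_MANY", f"GENE{i}") for i in range(6)] + [("P_ONE", "GENE_X")]
flagged = find_fanout(sample)
print(flagged)
```

A real test would read the protein-to-gene pairs from the generated RDF rather than from a hard-coded list, and the threshold would need to be chosen with the histone case in mind.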
Of these, 33 are Reactome pathways: https://bit.ly/3lD7GrK
Freddie asked me about this issue. WP4657 on WikiPathways has no histone genes but the RDF gives some: https://bit.ly/39a0jCu
Also, the original TTL file does not have them (attached: WP4657.txt).
Somehow these histone genes show up. What could be the cause?