khanspers opened 1 year ago
The Pfam database has been retired and incorporated into InterPro. Will InterPro identifiers work as well?
Having InterPro identifiers work would be great anyway. But what I read on the InterPro website is that they will host Pfam, which sounds different from using InterPro identifiers instead. Does anybody know how that actually works?
It seems that Pfam is still actively providing content, but it will only be accessible through the InterPro website: https://xfam.wordpress.com/2022/08/04/pfam-website-decommission/ In the InterPro search, you can then see a list of results coming from the different sources.
For this particular case, the best match I could find is this InterPro identifier, for Ribosomal protein S6 kinase: https://www.ebi.ac.uk/interpro/entry/InterPro/IPR016238/
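Since Pfam families and InterPro entries now share one website, it can help to tell the two identifier types apart before annotating a node. The sketch below is a minimal illustration, assuming the usual accession formats (Pfam: `PF` plus five digits; InterPro: `IPR` plus six digits). The InterPro URL follows the link above; the `/entry/pfam/` path for Pfam entries is an assumption, and the function names are hypothetical.

```python
import re

# Assumed accession formats: Pfam families look like PF00069,
# InterPro entries look like IPR016238.
PFAM_RE = re.compile(r"PF\d{5}")
INTERPRO_RE = re.compile(r"IPR\d{6}")


def classify_accession(acc: str) -> str:
    """Return 'pfam', 'interpro', or 'unknown' for a family identifier."""
    acc = acc.strip().upper()
    if PFAM_RE.fullmatch(acc):
        return "pfam"
    if INTERPRO_RE.fullmatch(acc):
        return "interpro"
    return "unknown"


def interpro_entry_url(acc: str) -> str:
    """Build an InterPro website URL for a Pfam or InterPro accession.

    The InterPro pattern mirrors the link in this thread; the Pfam
    path (/entry/pfam/) is an assumption about the same site.
    """
    kind = classify_accession(acc)
    acc = acc.strip().upper()
    if kind == "interpro":
        return f"https://www.ebi.ac.uk/interpro/entry/InterPro/{acc}/"
    if kind == "pfam":
        return f"https://www.ebi.ac.uk/interpro/entry/pfam/{acc}/"
    raise ValueError(f"Unrecognized family accession: {acc!r}")


print(interpro_entry_url("IPR016238"))
# prints https://www.ebi.ac.uk/interpro/entry/InterPro/IPR016238/
```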
I added a stage here, using the pathway from the curation report as the example: https://academy.wikipathways.org/stages/draw-protein-families/ (not yet integrated into the path). Please review.
One thing to add is a comment about data mapping (i.e., that it won't work for these nodes).
Looks good! Only the upload doesn't work yet; is that intended? I got the following error: "Oops! That doesn't look quite right. Please try again. Incorrect number of objects: 5 detected, 0 expected." Are there plans to include data mapping at some point? It would be great if the family could be connected to the actual proteins somehow.
Thanks @danidi! There was a typo in the GPML validation; it is fixed now.
As for data mapping, there is no plan to make that work as far as I know. These instructions were only meant to solve the issue raised in the curation report, basically as an alternative to leaving the xref empty. I can add a comment to the task saying that data mapping from individual proteins that are part of the family won't work, and maybe also describe the alternative approach of adding the individual proteins as a stack of nodes off to the side of the pathway (as we do with other groupings of genes/proteins)?
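The two modeling options discussed here (one family node annotated with an InterPro ID, or individual member proteins stacked off to the side) can be sketched as GPML generated with Python's `xml.etree.ElementTree`. This is an illustrative sketch only: it assumes the GPML 2013a namespace, a `DataNode` with `Graphics` and `Xref` children, and the database names `InterPro` and `Uniprot-TrEMBL`. The member labels and IDs are hypothetical placeholders, and the helper functions are not part of any WikiPathways or PathVisio API.

```python
import xml.etree.ElementTree as ET

# Assumption: GPML 2013a namespace, as used by PathVisio-era pathway files.
GPML_NS = "http://pathvisio.org/GPML/2013a"
ET.register_namespace("", GPML_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the GPML namespace."""
    return f"{{{GPML_NS}}}{tag}"


def add_data_node(pathway, graph_id, label, x, y, database, identifier,
                  node_type="Protein", width=120.0, height=20.0):
    """Append a DataNode with Graphics and Xref children to a Pathway element."""
    node = ET.SubElement(pathway, q("DataNode"),
                         TextLabel=label, GraphId=graph_id, Type=node_type)
    ET.SubElement(node, q("Graphics"), CenterX=str(x), CenterY=str(y),
                  Width=str(width), Height=str(height))
    ET.SubElement(node, q("Xref"), Database=database, ID=identifier)
    return node


def add_member_stack(pathway, members, x, first_center_y, height=20.0):
    """Stack member nodes vertically; centers are one node height apart,
    so adjacent nodes touch edge to edge."""
    for i, (label, database, identifier) in enumerate(members):
        add_data_node(pathway, f"member{i}", label, x,
                      first_center_y + i * height, database, identifier,
                      height=height)


pathway = ET.Element(q("Pathway"), Name="Protein family example")

# Option 1: a single node for the whole family, annotated with an InterPro ID.
add_data_node(pathway, "family1", "Ribosomal protein S6 kinase", 200, 100,
              "InterPro", "IPR016238")

# Option 2: individual members stacked off to the side of the pathway.
# HYPOTHETICAL placeholders: replace with real member labels and IDs.
add_member_stack(pathway,
                 [("Member A", "Uniprot-TrEMBL", "MEMBER_A_ID"),
                  ("Member B", "Uniprot-TrEMBL", "MEMBER_B_ID")],
                 x=500, first_center_y=100)

print(ET.tostring(pathway, encoding="unicode"))
```

Only the stacked member nodes carry protein-level identifiers, so they are the nodes that data mapping could in principle reach. The single family node cannot.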
On second thought, I'm not sure this should be a stage in the Academy. Although the idea of using an InterPro ID instead of leaving the xref blank is still valid for individual cases (for example, the original question by Javi), it's potentially counterintuitive and confusing as an Academy stage, since it doesn't enable data mapping at all (at least not in PathVisio, and not in a straightforward way in Cytoscape). We can keep this issue open for discussion, but I'm not going to add the stage to the path for now.
I think that is fine for now. But this is one of the ideas that often comes up in discussions about functionally evaluating sequencing data from multi-species mixtures, e.g. microbiome samples. If we can assign motifs in sequences to functional protein motifs, and through those to pathways, we could in principle evaluate the functionality or functional capacity of such a mixture without assigning the sequences to species or complete genes. Of course, we do not yet have complete methods for that.
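The chain described above (sequence fragments → protein families → pathways) can be illustrated with a toy calculation. This sketch assumes that the hard step of assigning fragments to family accessions has already been done by some motif search and is not shown. It then reports, for each pathway, the fraction of its families detected in the sample, as one crude proxy for functional capacity. All family and pathway names are hypothetical placeholders.

```python
from collections import defaultdict


def pathway_coverage(family_hits, family_to_pathways):
    """Fraction of each pathway's families detected in a sample.

    family_hits: iterable of family accessions already assigned to
        sequence fragments (the assignment step itself is not shown).
    family_to_pathways: dict mapping a family accession to pathway names.
    """
    detected = set(family_hits)
    # Invert the mapping: pathway -> set of families it contains.
    members = defaultdict(set)
    for family, pathways in family_to_pathways.items():
        for pw in pathways:
            members[pw].add(family)
    return {pw: len(fams & detected) / len(fams) for pw, fams in members.items()}


# HYPOTHETICAL toy data: family and pathway names are placeholders.
family_to_pathways = {
    "FAM_A": ["Pathway 1"],
    "FAM_B": ["Pathway 1", "Pathway 2"],
    "FAM_C": ["Pathway 2"],
}
hits = ["FAM_A", "FAM_B", "FAM_B"]
print(pathway_coverage(hits, family_to_pathways))
# prints {'Pathway 1': 1.0, 'Pathway 2': 0.5}
```

Coverage ignores abundance (FAM_B counts once despite two hits). A real analysis would also need to weigh hit counts and handle families shared across many pathways.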
From a curation report issue: how best to model protein families in pathways. Either use a Pfam ID on a single node representing the protein family, or list all members of the protein family (if feasible).