Closed: magnusmanske closed this issue 5 years ago.
This example has only one mRNA in the GFF file:
Pf3D7_07_v3 chado CDS 75918 76844 . - 0 ID=PF3D7_0701800.1:exon:1;Parent=PF3D7_0701800.1
Pf3D7_07_v3 chado gene 75918 77055 . - . ID=PF3D7_0701800;Name=RIF;previous_systematic_id=PF07_0003;synonym=RIF-A
Pf3D7_07_v3 chado mRNA 75918 77055 . - . ID=PF3D7_0701800.1;comment=rif (repetitive interspersed family) genes were originally identified by Weber%3B they encode clonally variant RIFIN proteins%2C which are likely expressed on the infected
Pf3D7_07_v3 chado polypeptide 75918 77055 . - . ID=PF3D7_0701800.1:pep;Derives_from=PF3D7_0701800.1
Pf3D7_07_v3 chado CDS 76987 77055 . - 0 ID=PF3D7_0701800.1:exon:2;Parent=PF3D7_0701800.1
There is a mention of PIR protein in there, but I am not sure how to extract it.
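One way to confirm that this record really has a single transcript is to read the features and group them by the transcript they point at. The sketch below is illustrative only: the names `Feature`, `parse_gff3_line` and `children_by_transcript` are hypothetical, it assumes standard tab-separated GFF3 (the rows above appear with their tabs flattened to spaces), and it assumes each feature carries a single `Parent` or `Derives_from` value.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    ftype: str
    start: int
    end: int
    strand: str
    attrs: dict[str, str] = field(default_factory=dict)


def parse_gff3_line(line: str) -> Feature:
    """Parse one tab-separated GFF3 row into a Feature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs: dict[str, str] = {}
    # Column 9 is 'key=value;key=value'; literal ';' separates attributes.
    # Assumption: values are decoded whole, so multi-valued (comma-separated)
    # attributes are not split here.
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)


def children_by_transcript(features: list[Feature]) -> dict[str, list[Feature]]:
    """Map each mRNA ID to the features that reference it via Parent or Derives_from."""
    mrna_ids = {f.attrs["ID"] for f in features if f.ftype == "mRNA"}
    kids: dict[str, list[Feature]] = defaultdict(list)
    for f in features:
        ref = f.attrs.get("Parent") or f.attrs.get("Derives_from")
        if ref in mrna_ids:
            kids[ref].append(f)
    return dict(kids)
```

Applied to the five rows above (with tabs restored), this should yield one mRNA, `PF3D7_0701800.1`, carrying two CDS features and one polypeptide; the extra product names are not separate transcripts.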
Hi Magnus,
These are several products on the same mRNA.
For PF3D7_0423800, this is how it looks in the GFF file. The preferred product is cysteine-rich protective antigen.
product=term%3Dcysteine-rich protective antigen%3Bdb_xref%3DPMID:22593616%3Bevidence%3DInferred from Direct Assay,rank%3D1%3Bterm%3DRH5-Ripr membrane anchoring protein%3Bdb_xref%3DPMID:25583518%3Bevidence%3DInferred from Direct Assay
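The `product` value above packs several products into one attribute. In GFF3, literal commas separate multiple values and reserved characters are percent-encoded (`%3D` is `=`, `%3B` is `;`), so one plausible way to read it is to split on commas first and decode each piece afterwards. The sketch below takes the raw value after `product=`. It assumes that a product without an explicit `rank` has rank 0 and that the lowest rank is the preferred product; that is inferred from the two examples in this thread (the unranked first entry is the one Uli calls preferred), not from a documented GeneDB rule. The function names are hypothetical.

```python
from urllib.parse import unquote


def parse_products(attr_value: str) -> list[dict[str, str]]:
    """Split a raw (still percent-encoded) GFF3 'product' value into one dict per product."""
    products = []
    # Split on literal commas BEFORE decoding, so an encoded %2C inside a term stays intact.
    for raw in attr_value.split(","):
        decoded = unquote(raw)  # e.g. 'term%3Drifin' -> 'term=rifin'
        fields: dict[str, str] = {}
        for pair in decoded.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            fields[key.strip()] = value.strip()
        # Assumption: an entry without an explicit rank is rank 0.
        fields.setdefault("rank", "0")
        products.append(fields)
    return products


def preferred_product(products: list[dict[str, str]]) -> str:
    """Return the term with the lowest rank (assumed to be the preferred product)."""
    best = min(products, key=lambda p: int(p["rank"]))
    return best["term"]
```

For the PF3D7_0423800 value, `preferred_product` returns "cysteine-rich protective antigen", and "RH5-Ripr membrane anchoring protein" comes back as the rank-1 entry.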
For the rifin example, this is how it is shown in the GFF file. Rifin is the preferred product.
rifin%3Bdb_xref%3DPMID:25751816%3Bevidence%3DInferred from Direct Assay,rank%3D1%3Bterm%3DPIR protein
Please let me know if there are any questions.
Thanks, Uli
I think this is the same as #18. I create one Wikidata item per gene, and one per mRNA (or protein). I can't really subdivide the protein item any further into products, and the products don't seem to have different IDs. Do they both share the mRNA ID?
These are alternative products for the same protein. They share the mRNA ID.
Here is an example of how this is displayed on PlasmoDB: http://plasmodb.org/plasmo/app/record/gene/PF3D7_0423800#AlternateProducts
UniProt shows them as alternative names. Here is an example: https://www.uniprot.org/uniprot/Q8I5R7
It would be great if those alternative products could also be shown on Wikidata.
On 26 Feb 2019, at 15:33, Magnus Manske notifications@github.com wrote:
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
If they are different products (e.g. splice variants), they are different proteins, and different proteins should have different transcript IDs.
Your PlasmoDB example calls them "Alternate Product Descriptions", and UniProt calls them "Alternative names". Both sound as if they describe the same product, which has somehow picked up an alternate name. I am already listing those as "alternative names", e.g. on https://www.genedb.org/#/gene/PF3D7_0423800
If that is insufficient, please let me know what, exactly, these different "products" are, in slightly more scientific terms.
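Treating the extra products as alternative names of a single protein item, as described above, amounts to choosing one name as the label and attaching the rest as aliases. A minimal sketch, assuming the products arrive as (term, rank) pairs with the lowest rank preferred; it emits the `labels`/`aliases` shape of Wikibase entity JSON, but it is not the GeneDB bot's actual code, and `products_to_wikibase_names` is a hypothetical name.

```python
def products_to_wikibase_names(products: list[tuple[str, int]], lang: str = "en") -> dict:
    """Build Wikibase-style 'labels' and 'aliases' from ranked product names."""
    if not products:
        return {}
    # Stable sort: lowest rank first; ties keep their original order.
    ordered = [term for term, _rank in sorted(products, key=lambda p: p[1])]
    preferred, *alternates = ordered
    entity: dict = {"labels": {lang: {"language": lang, "value": preferred}}}
    if alternates:
        entity["aliases"] = {
            lang: [{"language": lang, "value": term} for term in alternates]
        }
    return entity
```

For the rifin example, `products_to_wikibase_names([("rifin", 0), ("PIR protein", 1)])` puts "rifin" in the English label and "PIR protein" in the English aliases, so both names stay on the one protein item without inventing a second ID.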
Hi Magnus,
Thanks for the explanation, and sorry about the confusion. I did not notice that this is listed as an alternative name in the protein section; I had only checked the top of the gene page.
The ticket can definitely be closed. Thanks!
Uli
On 27 Feb 2019, at 12:45, Magnus Manske notifications@github.com wrote:
Here are two examples with multiple products: http://www.genedb.org/gene/PF3D7_0423800
http://www.genedb.org/gene/PF3D7_0701800