magnusmanske / genedb_wd

An interface implementation for GeneDB using Wikidata
Apache License 2.0

Check multiple product handling #5

Closed magnusmanske closed 5 years ago

uliboehme commented 5 years ago

Here are two examples with multiple products:

http://www.genedb.org/gene/PF3D7_0423800

http://www.genedb.org/gene/PF3D7_0701800

magnusmanske commented 5 years ago

The second example (PF3D7_0701800) has only one mRNA in the GFF file:

Pf3D7_07_v3     chado   CDS     75918   76844   .       -       0       ID=PF3D7_0701800.1:exon:1;Parent=PF3D7_0701800.1
Pf3D7_07_v3     chado   gene    75918   77055   .       -       .       ID=PF3D7_0701800;Name=RIF;previous_systematic_id=PF07_0003;synonym=RIF-A
Pf3D7_07_v3     chado   mRNA    75918   77055   .       -       .       ID=PF3D7_0701800.1;comment=rif (repetitive interspersed family) genes were originally identified by Weber%3B they encode clonally variant RIFIN proteins%2C which are likely expressed on the infected
Pf3D7_07_v3     chado   polypeptide     75918   77055   .       -       .       ID=PF3D7_0701800.1:pep;Derives_from=PF3D7_0701800.1
Pf3D7_07_v3     chado   CDS     76987   77055   .       -       0       ID=PF3D7_0701800.1:exon:2;Parent=PF3D7_0701800.1

There is a mention of PIR protein in there, but I'm not sure how to extract it.
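For reference, a minimal Python sketch of reading feature lines like the ones above and counting features per type. The function name `parse_gff3_line` is hypothetical, and this is not how genedb_wd itself reads the file. Real GFF3 is tab-delimited; the sketch splits on any whitespace run so that lines pasted from this thread also work. The mRNA line is reduced to its ID because its comment attribute is truncated above.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into its columns and decode column 9."""
    # At most 9 fields: the attribute column may itself contain spaces.
    cols = line.rstrip("\n").split(None, 8)
    if len(cols) != 9:
        raise ValueError("expected 9 GFF3 columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            # Values are percent-encoded (e.g. %3B for ';', %2C for ',').
            attributes[key] = unquote(value)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


lines = [
    "Pf3D7_07_v3 chado CDS 75918 76844 . - 0 ID=PF3D7_0701800.1:exon:1;Parent=PF3D7_0701800.1",
    "Pf3D7_07_v3 chado gene 75918 77055 . - . ID=PF3D7_0701800;Name=RIF;previous_systematic_id=PF07_0003;synonym=RIF-A",
    # comment attribute omitted here: it is truncated in the thread
    "Pf3D7_07_v3 chado mRNA 75918 77055 . - . ID=PF3D7_0701800.1",
    "Pf3D7_07_v3 chado polypeptide 75918 77055 . - . ID=PF3D7_0701800.1:pep;Derives_from=PF3D7_0701800.1",
    "Pf3D7_07_v3 chado CDS 76987 77055 . - 0 ID=PF3D7_0701800.1:exon:2;Parent=PF3D7_0701800.1",
]
features = [parse_gff3_line(line) for line in lines]
counts = Counter(f["type"] for f in features)
print(counts["mRNA"], counts["CDS"])
```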

uliboehme commented 5 years ago

Hi Magnus,

These are several products on the same mRNA.

For PF3D7_0423800, this is how it looks in the GFF file. The preferred product is cysteine-rich protective antigen.

product=term%3Dcysteine-rich protective antigen%3Bdb_xref%3DPMID:22593616%3Bevidence%3DInferred from Direct Assay,rank%3D1%3Bterm%3DRH5-Ripr membrane anchoring protein%3Bdb_xref%3DPMID:25583518%3Bevidence%3DInferred from Direct Assay
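A small Python sketch of how a `product` value like this could be unpacked. It assumes, based only on the two examples in this thread, that entries are separated by literal commas, that fields inside an entry are separated by `%3B` (an encoded `;`), and that an entry without a `rank` field is the preferred product. The function name `parse_products` is hypothetical.

```python
from urllib.parse import unquote


def parse_products(raw):
    """Return the product entries of a GFF 'product' value, preferred first."""
    products = []
    # Split on literal commas BEFORE decoding, since commas inside a value
    # would be encoded as %2C.
    for entry in raw.split(","):
        fields = {}
        for part in unquote(entry).split(";"):
            key, sep, value = part.partition("=")
            if sep:
                fields[key] = value
        # Assumption: a missing rank means rank 0, i.e. the preferred product.
        fields["rank"] = int(fields.get("rank", 0))
        products.append(fields)
    # sorted() is stable, so entries with equal rank keep their file order.
    return sorted(products, key=lambda p: p["rank"])


raw = (
    "term%3Dcysteine-rich protective antigen%3Bdb_xref%3DPMID:22593616"
    "%3Bevidence%3DInferred from Direct Assay,"
    "rank%3D1%3Bterm%3DRH5-Ripr membrane anchoring protein"
    "%3Bdb_xref%3DPMID:25583518%3Bevidence%3DInferred from Direct Assay"
)
products = parse_products(raw)
for p in products:
    print(p["rank"], p["term"], p["db_xref"])
```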

For the rifin example, this is how it is shown in the GFF file. Rifin is the preferred product.

rifin%3Bdb_xref%3DPMID:25751816%3Bevidence%3DInferred from Direct Assay,rank%3D1%3Bterm%3DPIR protein

Please let me know if there are any questions.

Thanks, Uli

magnusmanske commented 5 years ago

I think this is the same as #18. I create one Wikidata item per gene, and one per mRNA (or protein). I can't really subdivide the protein item any further into products, and the products don't seem to have different IDs; they both share the mRNA ID?

uliboehme commented 5 years ago

These are alternative products for the same protein. They share the mRNA ID.

Here is an example of how this is displayed on PlasmoDB: http://plasmodb.org/plasmo/app/record/gene/PF3D7_0423800#AlternateProducts

UniProt shows them as alternative names. Here is an example: https://www.uniprot.org/uniprot/Q8I5R7

It would be great if those alternative products could also be shown on Wikidata.


-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

magnusmanske commented 5 years ago

If they are different products (eg splicing variants), they are different proteins. If they are different proteins, they should have different transcript IDs.

Your PlasmoDB example calls them "Alternate Product Descriptions". UniProt calls them "Alternative names". Both sound like the same product that has somehow picked up an alternate name. I am already listing those as "alternative names", e.g. on https://www.genedb.org/#/gene/PF3D7_0423800

If that is insufficient, please let me know what, exactly, these different "products" are, in slightly more scientific terms.
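To make the "one item per mRNA/protein, extra product names as alternative names" idea concrete, here is a hedged Python sketch. The `ProteinItem` class and `protein_item_from_products` function are hypothetical illustrations, not the data model genedb_wd actually uses. The example values (transcript `PF3D7_0701800.1`, products "rifin" and "PIR protein") are taken from the thread above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProteinItem:
    """Illustrative stand-in for one protein item keyed by transcript ID."""
    transcript_id: str
    label: str
    aliases: List[str] = field(default_factory=list)


def protein_item_from_products(transcript_id, terms):
    """Use the first (preferred) product as label and the rest as aliases."""
    if not terms:
        raise ValueError("at least one product term is required")
    preferred, *alternatives = terms
    seen = {preferred}
    aliases = []
    for term in alternatives:
        # Skip repeats so a name never appears twice on the same item.
        if term not in seen:
            seen.add(term)
            aliases.append(term)
    return ProteinItem(transcript_id, preferred, aliases)


item = protein_item_from_products("PF3D7_0701800.1", ["rifin", "PIR protein"])
print(item.label, item.aliases)
```

Since all product names share one transcript ID, they end up on a single item rather than on separate ones.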

uliboehme commented 5 years ago

Hi Magnus,

Thanks for the explanation. Sorry about the confusion. I did not notice that this is listed as an alternative name in the protein section. Sorry, I had only checked the top of the gene page.

The ticket can definitely be closed. Thanks!

Uli

