pombase / curation

to do list for alternative transcripts/isoforms #3310

Closed ValWood closed 2 years ago

ValWood commented 2 years ago

HOW: The syntax is to create features with .1, .2 suffixes on the CDS (and on the UTRs, if features with alternative forms are present).

TO DO

ValWood commented 2 years ago
kimrutherford commented 2 years ago

> provide a list of all annotated alternative forms

The easiest way to find them is to look in the contig files and search for "/GO=". The only GO annotation left in contig files is for genes with multiple transcripts.

> Which tasks are outstanding for Canto?

That's done as far as I know, unless we want a fancier interface.

> Which tasks are outstanding for Chado?

No changes are needed there.

> Which tasks are outstanding for website display?

That's done as far as I know, unless changes are needed.

But there may be a problem with GOA annotation taking precedence. For example, https://www.pombase.org/gene/SPCC548.03c should show this transcript-specific annotation:

FT                   /systematic_id="SPCC548.03c.1"
FT                   /GO="aspect=P; term=meiotic drive; GOid=GO:0110134;
FT                   evidence=ISM; db_xref=PMID:28631612;
FT                   with=InterPro:IPR004982; date=20170627"

but it shows one with evidence IMP instead: [image]
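For anyone comparing what the contig file says with what the site displays, a minimal Python sketch of reading such a qualifier block might look like the following. The helper names (`parse_qualifiers`, `parse_go_value`) are hypothetical, and it assumes that wrapped `FT` continuation lines can be rejoined with a single space, which may not hold for every contig file.

```python
import re

# The FT lines quoted above, copied as-is.
FT_BLOCK = '''\
FT                   /systematic_id="SPCC548.03c.1"
FT                   /GO="aspect=P; term=meiotic drive; GOid=GO:0110134;
FT                   evidence=ISM; db_xref=PMID:28631612;
FT                   with=InterPro:IPR004982; date=20170627"
'''


def parse_qualifiers(ft_text):
    """Return (name, value) pairs for each /name="value" qualifier."""
    # Drop the "FT" prefix and padding, then rejoin wrapped lines.
    # Assumption: a wrap always falls on a space, so joining with " " is safe.
    body = " ".join(
        line[2:].strip()
        for line in ft_text.splitlines()
        if line.startswith("FT")
    )
    return re.findall(r'/(\w+)="([^"]*)"', body)


def parse_go_value(value):
    """Split 'aspect=P; term=...; GOid=...' into a dict of fields."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if part:
            key, _, val = part.partition("=")
            fields[key.strip()] = val.strip()
    return fields


qualifiers = parse_qualifiers(FT_BLOCK)
systematic_id = dict(qualifiers)["systematic_id"]
go_annotations = [parse_go_value(v) for name, v in qualifiers if name == "GO"]

print(systematic_id, go_annotations[0]["GOid"], go_annotations[0]["evidence"])
```

This only reads the text of the qualifier; it says nothing about which annotation the website chooses to display.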

kimrutherford commented 2 years ago

> The easiest way to find them is to look in the contig files and search for "/GO=".

Sorry, that's not helpful. I shouldn't comment so soon after waking up. There are annotations from Canto too.

> provide a list of all annotated alternative forms

Which types of annotations are you interested in? Just GO? If so, you can query the genes like this: https://www.pombase.org/results/from/id/6a09a7ea-259c-4b28-a1f4-542dbfe812f1

> But there may be a problem with GOA annotation taking precedence.

Sorry, I got that wrong too. The annotations that override the SPCC548.03c.1 annotation from the contig are from Canto: https://curation.pombase.org/pombe/curs/06d77d7ac3f77bbc/ro/

But I can't work out why the cellular component annotations from that Canto session aren't showing up on the main site. I'm digging into that now.

kimrutherford commented 2 years ago

So this list?: https://www.pombase.org/results/from/id/eb7bb2f5-a170-486e-a9ef-23d7fed991b2

ValWood commented 2 years ago

~So remind me, if the Canto changes are done, why do we still need to do isoform specific annotation in Artemis - I can't remember?~ ignore

ValWood commented 2 years ago

Ignore previous question. There will be NAS, ISS etc. But can't we do these in the legacy GO file now with column 17 extensions for specific isoforms?

kimrutherford commented 2 years ago

> But can't we do these in the legacy GO file now with column 17 extensions for specific isoforms?

We should move to that, but for now the GAF loader doesn't support isoform IDs in column 17. It only supports PRO IDs.

I'm not sure how we'd represent it in official GAF format. It's not clear from here: http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/#gene-product-form-id-column-17

But for PomBase-only use we can put whatever we like in that column.

ValWood commented 2 years ago

I wonder if that is intentional. I have asked on Slack. If we are the provider we should use our IDs.

kimrutherford commented 2 years ago

> I have asked on Slack.

Thanks. We'll need to know what to do for GPAD/GPI files too.

kimrutherford commented 2 years ago

> I'm not sure how we'd represent it in official GAF format. It's not clear from here: http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/#gene-product-form-id-column-17

I misread this to mean we'd have to use UniProt, NP numbers or PRO IDs: "When the Gene Product Form ID is filled with a protein identifier, the value in DB Object Type (column 12) must be protein. Protein identifiers can include UniProtKB accession numbers, NCBI NP identifiers or Protein Ontology (PRO) identifiers."

But I think we could put "protein" in column 12 and "PomBase:SPCC548.03c.1:pep" in column 17.
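As a rough illustration of that idea, here is a Python sketch that builds one 17-column GAF 2.2 line from the SPCC548.03c.1 annotation quoted earlier in this thread. The function name `make_gaf_line` is hypothetical, the `involved_in` qualifier and the use of the systematic ID as the symbol are assumptions made for the example, and whether the GAF loader or GO would accept the column 17 value is exactly the open question here.

```python
def make_gaf_line(gene_id, transcript_id, go_id, reference,
                  evidence, with_from, aspect, date,
                  qualifier="involved_in"):
    """Build a tab-separated GAF 2.2 line for an isoform-specific annotation.

    Assumptions for illustration only: the symbol is filled with the
    systematic ID, and the default qualifier is "involved_in".
    """
    columns = [
        "PomBase",                       # 1  DB
        gene_id,                         # 2  DB Object ID
        gene_id,                         # 3  DB Object Symbol (placeholder)
        qualifier,                       # 4  Qualifier (assumed)
        go_id,                           # 5  GO ID
        reference,                       # 6  DB:Reference
        evidence,                        # 7  Evidence Code
        with_from,                       # 8  With (or) From
        aspect,                          # 9  Aspect
        "",                              # 10 DB Object Name
        "",                              # 11 DB Object Synonym
        "protein",                       # 12 DB Object Type
        "taxon:4896",                    # 13 Taxon (S. pombe)
        date,                            # 14 Date
        "PomBase",                       # 15 Assigned By
        "",                              # 16 Annotation Extension
        f"PomBase:{transcript_id}:pep",  # 17 Gene Product Form ID
    ]
    return "\t".join(columns)


line = make_gaf_line(
    gene_id="SPCC548.03c",
    transcript_id="SPCC548.03c.1",
    go_id="GO:0110134",
    reference="PMID:28631612",
    evidence="ISM",
    with_from="InterPro:IPR004982",
    aspect="P",
    date="20170627",
)
fields = line.split("\t")
print(fields[11], fields[16])
```

Passing the gene ID and transcript ID separately avoids guessing one from the other, since a trailing `.1` cannot be reliably distinguished from the numeric part of a systematic ID by pattern alone.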

ValWood commented 2 years ago

Yes, I think we can use our own IDs.

kimrutherford commented 2 years ago

> Yes, I think we can use our own IDs.

I guess that's not super urgent?

ValWood commented 2 years ago

Nope, not at all.

ValWood commented 2 years ago

replaced by https://github.com/pombase/curation/issues/3341