to do list for alternative transcripts/isoforms

ValWood commented 2 years ago

HOW: The syntax is to create features with .1, .2 suffixes on the CDS (and on the UTRs, if alternative forms of those features are present).
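As a rough illustration of the suffix convention, here is a minimal Python sketch. The helper names `transcript_ids` and `split_transcript_id` are hypothetical and are not part of any PomBase tool; the only thing taken from this ticket is the `.1`, `.2` pattern, as in `SPCC548.03c.1`.

```python
def transcript_ids(gene_id: str, n_forms: int) -> list[str]:
    """Return the IDs for n_forms alternative forms: gene_id.1, gene_id.2, ..."""
    return [f"{gene_id}.{i}" for i in range(1, n_forms + 1)]


def split_transcript_id(transcript_id: str) -> tuple[str, int]:
    """Split 'SPCC548.03c.1' into ('SPCC548.03c', 1).

    Assumes the input really is a transcript ID. Gene IDs contain dots
    themselves, so only the last dot-separated part is treated as the
    isoform number.
    """
    gene_id, _, suffix = transcript_id.rpartition(".")
    if not gene_id or not suffix.isdigit():
        raise ValueError(f"not a transcript ID: {transcript_id!r}")
    return gene_id, int(suffix)
```

Splitting on the last dot matters because systematic IDs such as `SPCC548.03c` already contain one.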

TO DO

[ ] Locate all of the existing tickets and link them here.
[ ] We need rules for when we curate alternative forms.
We don't want entries that differ only by, for example, a slightly longer or shorter UTR, because transcripts are so heterogeneous. So probably only when there is functional data to attach to the different forms (i.e. expression at a different time, different activity or phenotype, etc.).
Some things look like alternative transcripts but are described as long non-coding RNAs because no protein is produced, and we have these curated differently. Examples include: -- tco1, curated as an alternative transcript https://github.com/pombase/curation/issues/3386 -- pho1 https://www.pombase.org/gene/SPNCRNA.1712 , curated as "translationally silent transcript from pho1 locus"
[ ] We need to decide if alleles should always be described in reference to the canonical form. This might be sensible, since most alleles will apply to both isoforms with different coordinates (to discuss; it might not always be possible).
[x] Which tasks are outstanding for Canto? (Can we add isoforms to GO and phenotype annotations?) We still capture GO annotation for alternative isoforms in Artemis. I am not sure if we have ever needed to curate phenotypes on alternative transcripts. Found: https://github.com/pombase/canto/issues/2351
[x] Which tasks are outstanding for Chado? (all done)
[ ] Which tasks are outstanding for website display? i.e. how did we decide to display alternative GO annotations, alternative translations, etc.? This is the only website-related ticket I can find: https://github.com/pombase/curation/issues/3616 (and this is more general)
[x] List of all known alternatively spliced genes, so we can assess if there are any issues. https://www.pombase.org/results/from/id/eb7bb2f5-a170-486e-a9ef-23d7fed991b2 (check against https://github.com/pombase/curation/issues/61)
[ ] Documentation, how to curate isoforms (and their alleles and associated annotations).
[ ] Outreach, communicate how we curate alternative transcripts, how to access all alternative transcripts etc
[x] @kimrutherford provide a list of all annotated alternative forms

ValWood commented 2 years ago

[ ] @kimrutherford provide a list of all annotated alternative forms

kimrutherford commented 2 years ago

provide a list of all annotated alternative forms

The easiest way to find them is to look in the contig files and search for "/GO=". The only GO annotation left in contig files is for genes with multiple transcripts.
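A minimal Python sketch of that search, assuming the contig files use the `FT` feature-table layout shown in the SPCC548.03c.1 example later in this thread. The function names are hypothetical, the parsing is deliberately simplified (it does not handle a continuation line that happens to start with `/`), and it only sees annotations stored in contig files.

```python
SAMPLE = '''FT                   /systematic_id="SPCC548.03c.1"
FT                   /GO="aspect=P; term=meiotic drive; GOid=GO:0110134;
FT                   evidence=ISM; db_xref=PMID:28631612;
FT                   with=InterPro:IPR004982; date=20170627"
'''


def iter_qualifiers(text):
    """Yield (feature_number, qualifier), joining continuation lines."""
    feature_number = 0
    current = None
    for line in text.splitlines():
        if not line.startswith("FT"):
            continue
        # A feature key (CDS, 5'UTR, ...) starts at the sixth character.
        if len(line) > 5 and line[5] != " ":
            if current is not None:
                yield feature_number, current
            current = None
            feature_number += 1
            continue
        body = line[2:].strip()
        if body.startswith("/"):
            if current is not None:
                yield feature_number, current
            current = body
        elif current is not None:
            current += " " + body
    if current is not None:
        yield feature_number, current


def go_by_transcript(text):
    """Return [(systematic_id, {field: value})] for every /GO= qualifier."""
    ids, annotations = {}, []
    for number, qualifier in iter_qualifiers(text):
        name, _, value = qualifier.partition("=")
        value = value.strip('"')
        if name == "/systematic_id":
            ids[number] = value
        elif name == "/GO":
            fields = dict(
                part.strip().split("=", 1)
                for part in value.split(";")
                if "=" in part
            )
            annotations.append((number, fields))
    return [(ids.get(number), fields) for number, fields in annotations]


for systematic_id, fields in go_by_transcript(SAMPLE):
    print(systematic_id, fields["GOid"], fields["evidence"])
```

Pairing each `/GO=` qualifier with the `/systematic_id` of the same feature is what makes the annotation transcript-specific rather than gene-level.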

Which tasks are outstanding for Canto?

That's done as far as I know, unless we want a fancier interface.

Which tasks are outstanding for Chado?

No changes are needed there.

Which tasks are outstanding for website display?

That's done as far as I know, unless changes are needed.

But there may be a problem with GOA annotation taking precedence. For example: https://www.pombase.org/gene/SPCC548.03c should show this transcript specific annotation:

FT                   /systematic_id="SPCC548.03c.1"
FT                   /GO="aspect=P; term=meiotic drive; GOid=GO:0110134;
FT                   evidence=ISM; db_xref=PMID:28631612;
FT                   with=InterPro:IPR004982; date=20170627"

but it shows one with evidence IMP instead:

kimrutherford commented 2 years ago

The easiest way to find them is to look in the contig files and search for "/GO=".

Sorry, that's not helpful. I shouldn't comment so soon after waking up. There are annotations from Canto too.

provide a list of all annotated alternative forms

Which types of annotations are you interested in? Just GO? If so, you can query the genes like this: https://www.pombase.org/results/from/id/6a09a7ea-259c-4b28-a1f4-542dbfe812f1

But there may be a problem with GOA annotation taking precedence.

Sorry, I got that wrong too. The annotations that override the SPCC548.03c.1 annotation from the contig are from Canto: https://curation.pombase.org/pombe/curs/06d77d7ac3f77bbc/ro/

But I can't work out why the cellular component annotations from that Canto session aren't showing up on the main site. I'm digging into that now.

kimrutherford commented 2 years ago

So this list?: https://www.pombase.org/results/from/id/eb7bb2f5-a170-486e-a9ef-23d7fed991b2

ValWood commented 2 years ago

~So remind me, if the Canto changes are done, why do we still need to do isoform specific annotation in Artemis - I can't remember?~ ignore

ValWood commented 2 years ago

Ignore previous question. There will be NAS, ISS etc. But can't we do these in the legacy GO file now, with column 17 extensions for specific isoforms?

kimrutherford commented 2 years ago

But can't we do these in the legacy GO file now, with column 17 extensions for specific isoforms?

We should move to that, but for now the GAF loader doesn't support isoform IDs in column 17. It only supports PRO IDs.

I'm not sure how we'd represent it in official GAF format. It's not clear from here: http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/#gene-product-form-id-column-17

But for PomBase only use we can put whatever we like in that column.

ValWood commented 2 years ago

I wonder if that is intentional. I have asked on Slack. If we are the provider we should use our IDs.

kimrutherford commented 2 years ago

I have asked on Slack.

Thanks. We'll need to know what to do for GPAD/GPI files too.

kimrutherford commented 2 years ago

I'm not sure how we'd represent it in official GAF format. It's not clear from here: http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/#gene-product-form-id-column-17

I misread this to mean we'd have to use UniProt, NP numbers or PRO IDs: "When the Gene Product Form ID is filled with a protein identifier, the value in DB Object Type (column 12) must be protein. Protein identifiers can include UniProtKB accession numbers, NCBI NP identifiers or Protein Ontology (PRO) identifiers."

But I think we could put "protein" in column 12 and "PomBase:SPCC548.03c.1:pep" in column 17.
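To make the proposal concrete, here is a minimal Python sketch that formats one GAF 2.2 row with "protein" in column 12 and the isoform ID in column 17. The values reuse the SPCC548.03c.1 contig annotation quoted earlier in this thread. Several things are assumptions for illustration only: the `involved_in` qualifier, the `taxon:4896` value, the use of the systematic ID as a placeholder symbol, and the helper name `format_gaf_row`. This is not the PomBase GAF exporter or loader.

```python
# The 17 columns of GAF 2.2, in order.
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]

ROW = {
    "DB": "PomBase",
    "DB Object ID": "SPCC548.03c",
    "DB Object Symbol": "SPCC548.03c",   # placeholder, not the real symbol
    "Qualifier": "involved_in",           # assumed relation
    "GO ID": "GO:0110134",
    "DB:Reference": "PMID:28631612",
    "Evidence Code": "ISM",
    "With (or) From": "InterPro:IPR004982",
    "Aspect": "P",
    "DB Object Type": "protein",          # column 12
    "Taxon": "taxon:4896",                # assumed taxon value
    "Date": "20170627",
    "Assigned By": "PomBase",
    "Gene Product Form ID": "PomBase:SPCC548.03c.1:pep",  # column 17
}


def format_gaf_row(row):
    """Return one tab-separated GAF line; missing columns are left empty."""
    unknown = set(row) - set(GAF_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Simplified check: treat any column 17 value as a protein identifier,
    # which per the spec text quoted above requires "protein" in column 12.
    if row.get("Gene Product Form ID") and row.get("DB Object Type") != "protein":
        raise ValueError("column 17 is filled but column 12 is not 'protein'")
    return "\t".join(row.get(column, "") for column in GAF_COLUMNS)


print(format_gaf_row(ROW))
```

The gene-level ID stays in column 2, so tools that ignore column 17 still see an ordinary annotation on SPCC548.03c.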

ValWood commented 2 years ago

Yes, I think we can use our own IDs.

kimrutherford commented 2 years ago

Yes, I think we can use our own IDs.

I guess that's not super urgent?

ValWood commented 2 years ago

Nope, not at all.

ValWood commented 2 years ago

replaced by https://github.com/pombase/curation/issues/3341

to do list for alternative transcripts/isoforms #3310