
PomBase curation

Move modifications annotations out of the contig files #3602

Open: kimrutherford opened this issue 1 year ago

kimrutherford commented 1 year ago

The modification annotations are in the modifications file here:

https://curation.pombase.org/dumps/latest_build/

We should copy those to a TSV file, then remove them from the contig files.
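A minimal Python sketch of the removal step. It assumes the contig files use the standard EMBL feature-table layout seen in the `FT` example in this thread (qualifiers start at column 22 after `FT`, and continuation lines don't start with `/`), and that the modification annotations are exactly the `/controlled_curation` qualifiers containing `cv=pt_mod`. The function names are hypothetical; this is not the script that was actually used.

```python
# Hypothetical sketch: drop /controlled_curation qualifiers tagged cv=pt_mod
# from EMBL-style feature-table lines. Assumes qualifiers start at column 22
# ("FT" + 19 spaces + "/") and continuation lines have no leading "/".
QUALIFIER_INDENT = "FT" + " " * 19


def is_continuation(line):
    """True for an FT line that continues the previous qualifier's value."""
    return (line.startswith(QUALIFIER_INDENT)
            and not line[len(QUALIFIER_INDENT):].startswith("/"))


def strip_modification_qualifiers(lines):
    """Return `lines` without /controlled_curation qualifiers containing cv=pt_mod."""
    kept = []
    pending = []  # lines of the /controlled_curation qualifier being collected

    def flush():
        text = " ".join(l[len(QUALIFIER_INDENT):].strip() for l in pending)
        if "cv=pt_mod" not in text:
            kept.extend(pending)  # not a modification: keep it unchanged
        pending.clear()

    for line in lines:
        if pending and is_continuation(line):
            pending.append(line)  # still inside the wrapped qualifier value
            continue
        if pending:
            flush()  # the collected qualifier has ended
        if line.startswith(QUALIFIER_INDENT + "/controlled_curation="):
            pending.append(line)
        else:
            kept.append(line)
    if pending:
        flush()
    return kept
```

Collecting the whole qualifier before deciding matters because `cv=pt_mod` usually appears on a continuation line, not on the line that opens the qualifier.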

Mostly they will be "Inferred from Sequence or Structural Similarity".

kimrutherford commented 1 year ago

> Mostly they will be "Inferred from Sequence or Structural Similarity".

They seem to be mostly IDA?

FT                   /controlled_curation="term=modification, phosphorylated;
FT                   db_xref=PMID:19547744; evidence=IDA; cv=pt_mod;
FT                   date=20100606"
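Each qualifier like the one above is a `;`-separated list of `key=value` items wrapped over several `FT` lines. A minimal Python sketch of flattening one into a TSV row, assuming wrapped lines can be rejoined with a single space. The column order in `HYPOTHETICAL_COLUMNS` is made up for illustration and is not the actual layout of `legacy_modifications_from_contigs.tsv`.

```python
import re

# Hypothetical column order, for illustration only.
HYPOTHETICAL_COLUMNS = ["term", "db_xref", "evidence", "date"]


def parse_controlled_curation(ft_lines):
    """Join a multi-line /controlled_curation qualifier and split it into fields."""
    # Drop the "FT" line code and padding, then rejoin the wrapped value with spaces.
    text = " ".join(line[2:].strip() for line in ft_lines)
    match = re.fullmatch(r'/controlled_curation="(.*)"', text)
    if match is None:
        raise ValueError(f"not a /controlled_curation qualifier: {text!r}")
    fields = {}
    for item in match.group(1).split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip empty items such as the one after a trailing ";"
            fields[key.strip()] = value.strip()
    return fields


def to_tsv_row(gene_id, fields, columns=HYPOTHETICAL_COLUMNS):
    """Format one annotation as a TSV row; missing fields become empty cells."""
    return "\t".join([gene_id] + [fields.get(c, "") for c in columns])
```

A qualifier with no `evidence=` item yields an empty evidence cell, which is the kind of row that fails to load later in this thread.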

kimrutherford commented 1 year ago

I've created a file containing the legacy modifications:

pombe-embl/supporting_files/legacy_modifications_from_contigs.tsv

Should I go ahead and remove the modifications from the contig files?

ValWood commented 1 year ago

Yes, go for it. I rarely edit those files anyway. (As you could see, I didn't even know the source. Of course they are IDA; we rarely predict modifications.)

kimrutherford commented 1 year ago

Done!

I've added the legacy modifications file to the nightly load script. I'll double check tomorrow that it's all OK.

kimrutherford commented 1 year ago

Hi Val.

Some modifications aren't loading because they don't have an evidence code. What evidence code can we use for them? Maybe IDA? Here's an example: https://www.pombase.org/gene/SPAC977.07c

kimrutherford commented 1 year ago

> What evidence code can we use for them?

We have "Not Recorded" and "Unknown" as options if that makes sense.

ValWood commented 1 year ago

For PMID:17870620 this was some sort of computational method, so let's use TAS. Then let me know what is left (most are probably these glycosylation sites).

kimrutherford commented 1 year ago

> for PMID:17870620 this was some sort of computational, let's use TAS.

Now that I've looked closer there are some modifications from PMID:17870620 that are "Inferred from Sequence or Structural Similarity" so maybe that's the evidence code to use?

> Then let me know what is left (most are probably these glycosylation sites)

I've changed the missing evidence codes to "Unknown" for now so they can load and checked in the file:

pombe-embl/supporting_files/legacy_modifications_from_contigs.tsv

So you can find them by searching for "Unknown".
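A minimal Python sketch of that "Unknown" fallback, assuming the TSV has a header row with a column named `evidence`. That column name is hypothetical; the thread doesn't show the file's real header.

```python
def fill_missing_evidence(tsv_text, evidence_column="evidence", default="Unknown"):
    """Return tsv_text with blank cells in `evidence_column` set to `default`."""
    lines = tsv_text.splitlines()
    header = lines[0].split("\t")
    idx = header.index(evidence_column)  # raises ValueError if the column is absent
    out = [lines[0]]
    for line in lines[1:]:
        if not line.strip():
            out.append(line)  # keep blank lines untouched
            continue
        cells = line.split("\t")
        # Pad short rows so the evidence cell exists.
        cells += [""] * (idx + 1 - len(cells))
        if not cells[idx].strip():
            cells[idx] = default
        out.append("\t".join(cells))
    return "\n".join(out) + "\n"
```

Using a searchable placeholder rather than guessing a real code (such as IDA) keeps these rows easy to find for later review.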

ValWood commented 1 year ago

OK, we can use ISS for the glycosylation sites. This isn't GO, so it won't cause a violation for the missing "with" field.

kimrutherford commented 1 year ago

> OK, we can use ISS for the glycosylation sites.

I've changed the annotations for PMID:17870620 to ISS.

> Some modifications aren't loading because they don't have an evidence code.

Now that the missing evidence codes have been changed to Unknown, all the legacy modifications load OK. There are some warnings about dates: https://curation.pombase.org/dumps/builds/pombase-build-2023-10-28/logs/log.2023-10-27-22-09-48.legacy_modifications_from_contigs

kimrutherford commented 1 year ago

> Now that the missing evidence codes have been changed to Unknown, all the legacy modifications load OK.

I closed this issue then had second thoughts. :-)

Do the modifications with unknown/missing evidence need reviewing? They are loading and displaying without a problem (examples: https://www.pombase.org/term/MOD:00689).

ValWood commented 1 year ago

Yes, they should be reviewed eventually. I'll move this to the curation tracker.