Open kimrutherford opened 1 year ago
Mostly they will be "Inferred from Sequence or Structural Similarity".
They seem to be mostly IDA?
FT /controlled_curation="term=modification, phosphorylated;
FT db_xref=PMID:19547744; evidence=IDA; cv=pt_mod;
FT date=20100606"
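For anyone scripting the move out of the contig files, here is a minimal sketch of splitting a `/controlled_curation` value like the one above into its fields. The function name `parse_controlled_curation` is hypothetical, and the sketch assumes that fields are separated by `;` and that no field value itself contains a `;`. It is not necessarily how the existing PomBase loader does it.

```python
def parse_controlled_curation(value):
    """Split a /controlled_curation qualifier value into a dict of fields.

    Assumption: fields are separated by ';' and each has the form key=value.
    """
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, val = part.partition("=")
        if not sep:
            continue  # skip fragments that have no '=' in them
        fields[key.strip()] = val.strip()
    return fields


# The qualifier value from the example above, joined onto one line.
example = (
    "term=modification, phosphorylated; "
    "db_xref=PMID:19547744; evidence=IDA; cv=pt_mod; "
    "date=20100606"
)
parsed = parse_controlled_curation(example)
print(parsed["evidence"])  # prints: IDA
```

Splitting on `;` rather than `,` keeps the term name "modification, phosphorylated" in one piece.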
I've created a file containing the legacy modifications:
pombe-embl/supporting_files/legacy_modifications_from_contigs.tsv
Should I go ahead and remove the modifications from the contig files?
Yes, go for it. I rarely edit those files anyway (as you could see, I didn't even know the source; of course they are IDA, we rarely predict modifications).
Done!
I've added the legacy modifications file to the nightly load script. I'll double check tomorrow that it's all OK.
Hi Val.
Some modifications aren't loading because they don't have an evidence code. What evidence code can we use for them? Maybe IDA? Here's an example: https://www.pombase.org/gene/SPAC977.07c
What evidence code can we use for them?
We have "Not Recorded" and "Unknown" as options if that makes sense.
For PMID:17870620, this was some sort of computational analysis, so let's use TAS. Then let me know what is left (most are probably these glycosylation sites).
For PMID:17870620, this was some sort of computational analysis, so let's use TAS.
Now that I've looked closer there are some modifications from PMID:17870620 that are "Inferred from Sequence or Structural Similarity" so maybe that's the evidence code to use?
Then let me know what is left (most are probably these glycosylation sites)
I've changed the missing evidence codes to "Unknown" for now so that they can load, and I've checked in the file:
pombe-embl/supporting_files/legacy_modifications_from_contigs.tsv
So you can find them by searching for "Unknown".
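If a script is handier than a text search, something like the sketch below would list the affected rows. The column layout of the TSV isn't described in this thread, so the sketch checks every column for an exact match rather than assuming which one holds the evidence code. The function name `rows_with_value` is made up, and the sample rows are placeholders, not real annotations.

```python
import csv
import io


def rows_with_value(handle, wanted="Unknown"):
    """Yield (line_number, row) for each TSV row where any column equals `wanted`."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if wanted in row:
            yield line_number, row


# Placeholder data standing in for the real file (made-up rows).
sample = io.StringIO("gene1\tterm1\tIDA\ngene2\tterm2\tUnknown\n")
for line_number, row in rows_with_value(sample):
    print(line_number, "\t".join(row))
```

To run it on the real file, pass an open handle for `pombe-embl/supporting_files/legacy_modifications_from_contigs.tsv` (opened with `newline=""`) instead of the `io.StringIO` sample.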
OK, we can use ISS for the glycosylation sites. This isn't GO, so the missing "with" field won't cause a violation.
OK, we can use ISS for the glycosylation sites.
I've changed the annotations for PMID:17870620 to ISS.
Some modifications aren't loading because they don't have an evidence code.
Now that the missing evidence codes have been changed to Unknown, all the legacy modifications load OK. There are some warnings about dates:
https://curation.pombase.org/dumps/builds/pombase-build-2023-10-28/logs/log.2023-10-27-22-09-48.legacy_modifications_from_contigs
Now that the missing evidence codes have been changed to Unknown, all the legacy modifications load OK.
I closed this issue then had second thoughts. :-)
Do the modifications with unknown/missing evidence need reviewing? They are loading and displaying without a problem (examples: https://www.pombase.org/term/MOD:00689).
Yes, they should be reviewed eventually. I will move this to the curation tracker.
They will be in the modifications file here: https://curation.pombase.org/dumps/latest_build/ We should copy those to a TSV file then remove them from the contig files.
Mostly they will be "Inferred from Sequence or Structural Similarity".