pombase / pombase-chado

PomBase code for accessing Chado
MIT License

use universal modification syntax for histones. #1108

Closed by ValWood 1 year ago

ValWood commented 1 year ago

This is more problematic than alleles, because for alleles we can at least use the universal name while keeping the description standardized. What should we do for modifications, though?

Everyone refers to histone modifications by their methionine-removed position.

For example see this new paper:

Screenshot 2023-08-26 at 21 03 17

At https://www.pombase.org/gene/SPAC1834.04, however, we have S58.

Screenshot 2023-08-26 at 21 03 10

ValWood commented 1 year ago

I also wonder whether all of the errors are being reported here: https://curation.pombase.org/dumps/latest_build/logs/log.2023-08-26-21-47-51.chado_checks.modification_on_wrong_residue

because hht1 has K5 reported as both K4 and K5. Whichever system we are using, one of them should be reported as incorrect.

@manulera @kimrutherford

(Apologies for conflating 2 issues but we can discuss next week and split onto the correct trackers)
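For reference, a minimal sketch of the kind of residue check being discussed: given a protein sequence and a label such as K4 or K5, test whether the named residue really sits at that position under a chosen numbering convention. This is not the actual chado_checks code. The function name `residue_matches`, the `met_counted` flag, and the short sequence fragment are all assumptions made up for illustration.

```python
def residue_matches(sequence: str, label: str, met_counted: bool = True) -> bool:
    """Return True if `label` (e.g. 'K5') names the residue found at that position.

    Hypothetical helper for illustration only.
    met_counted=True  -> position 1 is the initiator methionine.
    met_counted=False -> position 1 is the residue after the methionine.
    """
    residue, position = label[0], int(label[1:])
    # Convert the 1-based position into a 0-based index into the full sequence.
    index = position - 1 if met_counted else position
    return 0 <= index < len(sequence) and sequence[index] == residue


# Illustrative N-terminal fragment (assumed for the example, not read from Chado).
fragment = "MARTKQTARKS"

for label in ("K4", "K5"):
    print(label,
          "met counted:", residue_matches(fragment, label, met_counted=True),
          "met removed:", residue_matches(fragment, label, met_counted=False))
```

Under either convention only one of the two labels passes for this fragment, which is the point above: if both K4 and K5 are annotated, the check should flag one of them.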

manulera commented 1 year ago

> because hht1 has K5 reported as both K4 and K5

Hi @ValWood, regarding K4 / K5: this comes from the annotations with reference PB_REF:0000001, and was therefore not fixed by Kim's script that applied my changes. I am not sure we ever worked out where these annotations come from or how to fix them. We should report K5, I think.

manulera commented 1 year ago

> This is more problematic than alleles

Not sure what to say. SGD uses the "normal" coordinates (counting the methionine). https://www.yeastgenome.org/locus/S000000214/protein#phosphorylation_strain

I think histone researchers will probably realise that we are counting the methionine, while people who are not working on histones might not notice if we don't, and be confused by it. For the export dataset, I think we should definitely use the normal index. On the website, maybe we can put a warning at the top of the modification section for histone genes?
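To make the two conventions concrete, here is a small sketch of converting a modification label between the methionine-removed numbering used in the histone literature and the "normal" methionine-counted index. The function names and the simple `<residue letter><position>` label format are assumptions for illustration, not an existing PomBase API.

```python
import re

# Assumed label format: one uppercase residue letter followed by a position.
LABEL = re.compile(r"([A-Z])(\d+)")


def to_met_counted(label: str) -> str:
    """Shift a methionine-removed label (e.g. 'K4') to the methionine-counted index ('K5')."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised modification label: {label!r}")
    residue, position = match.group(1), int(match.group(2))
    return f"{residue}{position + 1}"


def to_met_removed(label: str) -> str:
    """Shift a methionine-counted label (e.g. 'S58') to the methionine-removed index ('S57')."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised modification label: {label!r}")
    residue, position = match.group(1), int(match.group(2))
    if position < 2:
        # Position 1 is the initiator methionine, which has no methionine-removed number.
        raise ValueError(f"{label!r} has no methionine-removed equivalent")
    return f"{residue}{position - 1}"


print(to_met_counted("K4"))   # K5
print(to_met_removed("S58"))  # S57
```

A warning on the histone gene pages could show both forms side by side, so that readers used to either convention can find the residue they expect.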

ValWood commented 1 year ago

OK, I agree to use the normal index.

I think a warning at the top of the modification section for histones is a good idea.

The inferred modifications are in the contig files. There are not many (60), and most are GPI anchors or farnesylation sites. Most do not have residues reported. We are slowly moving things out of the contigs into CSV files, so @kimrutherford could move these out at some point. In the meantime, I have fixed the problematic ones that I am aware of (on two of the histones). Let me know if there are others.

ValWood commented 1 year ago

Reopen if there are more for me to fix.