ValWood closed 11 months ago
Applies to
| Systematic ID | Gene name | Product description |
| --- | --- | --- |
| SPCC622.08c | hta1 | histone H2A alpha |
| SPAC19G12.06c | hta2 | histone H2A beta |
| SPCC622.09 | htb1 | histone H2B Htb1 |
| SPAC1834.04 | hht1 | histone H3 h3.1 |
| SPBC8D2.04 | hht2 | histone H3 h3.2 |
| SPAC1834.03c | hhf1 | histone H4 h4.1 |
| SPBC8D2.03c | hhf2 | histone H4 h4.2 |
| SPBC1105.12 | hhf3 | histone H4 h4.3 |
| SPBC1105.17 | cnp1 | centromere-specific histone H3 CENP-A |
| SPBC11B10.10c | pht1 | histone H2A variant H2A.Z Pht1 |
| SPBC1105.11c | hht3 | histone H3 h3.3 |
Do these have something like a GO term in common, so we don't need to maintain a gene list in the configuration file?
extended the "applies to" list with Jo's additions
https://www.pombase.org/results/from/id/593d5a9e-c876-467e-a531-336791ef7e8b
Unfortunately not. These are described variously as adapters or structural molecules. There is not even a protein family that is specific for the group, because this is only a subset of all histone folds.
Could we add some sort of /controlled_curation to annotate these genes? That way the annotation will end up in Chado and the web code will be able to identify the genes.
Yes, I can add /controlled_curation="histone"
or does it also need to have a "type"?
It's been so long, I can't remember. I'll refresh my memory and let you know.
But now I think about it, I wonder if /SO=SO:0000418
would be better?
This is for signal peptide, not modified histone? But PRO might have a grouping, will check.
nope PRO does not have a grouping (it would be a bit odd anyway)
Sorry! Please ignore me. :-) I was thinking about this at the same time: pombase/website#2115
I thought you were. They are both removal of N-terminal regions, but a bit different ;)
I could find, or request the PRO_ID for each one and add controlled_curation=display(PRO:xxx)
/controlled_curation="histone"
I think something like this:
/controlled_curation="term=warning, histone"
K14(K15) processed/preprocessed.
Where does that need to be shown on the web site? Sorry, I don't understand histones.
I could find, or request the PRO_ID for each one and add controlled_curation=display(PRO:xxx)
I'm not sure about that but I don't really understand what's needed.
Action for kmr: make a Chado check that we always have 11 histones - 11 annotations to GO:0000786 nucleosome
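A minimal sketch of what such a check could look like, assuming the annotations have already been exported from Chado as (systematic ID, term ID) pairs. The function name and the input shape are hypothetical; the real check would presumably be a query run against the database itself.

```python
EXPECTED_HISTONE_COUNT = 11
NUCLEOSOME_TERM = "GO:0000786"


def check_histone_count(annotations, term_id=NUCLEOSOME_TERM,
                        expected=EXPECTED_HISTONE_COUNT):
    """Return (ok, genes) for a list of (systematic_id, term_id) pairs.

    genes is the sorted list of distinct genes annotated to term_id;
    ok is True only when exactly `expected` genes were found.
    """
    genes = sorted({gene for gene, term in annotations if term == term_id})
    return len(genes) == expected, genes
```

Returning the gene list along with the flag makes it easy to report which histone went missing (or which extra gene appeared) when the count is not 11.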
Good, but I think it needs to be visible in the full view as well as show details.
It will be. That's what the two images show. The code isn't finished yet so my desktop version is a bit dodgy. I took the screenshot while it was briefly doing the right thing for hht1. :-)
Hi Val.
On the documentation page for the modifications section (https://www.pombase.org/documentation/gene-page-modifications) it says:
Note: for histones, residue numbering assumes that the initiator methionine is removed.
Is that correct? If so, I think I'm misunderstanding things.
I think that is because the documentation predates Manu's script, and previously the instructions were to represent histone modifications using the histone code standard. I'm not sure whether everybody adhered to that guideline, because presumably they would need to manually edit their mass spec output to make the histones match (unless the mass spec processing does this automatically, which seems unlikely because even UniProt reports the modifications on the unprocessed version: https://www.uniprot.org/uniprotkb/P09988/entry#ptm_processing).
It might be useful to know how many Manu's script 'fixed' to the non-modified form. @manu do you know this?
But I think we probably just need to update the documentation now to say that histones should be reported in the unprocessed form. Basically, we changed the way we do this to make it the same for every protein, but weren't aware of the existing instructions for histones.
We will also need to correct
The Residue column indicates the position modified. For protein modifications, use one-letter amino acid code. Multiple entries are allowed, but only for cases where two or more of the same modification are known to be present at the same time. Separate entries with commas (e.g. S72,T85). Position numbering should reflect the current sequence data in PomBase. Please refer to the Gene Coordinate Changes page to ensure that your residue position entries are up to date. Also note that histones are conventionally numbered assuming the initiator methionine is removed (i.e. every position in the mature protein is numbered, and is 1 less than the apparent numbering predicted by translating the ORF).
This section isn't correct. We don't conjoin, because this would make modifications impossible to collapse, and it isn't very informative since it's biased towards close residues, which are likely to be on the same peptide fragment. Also, we report the phase when known, so we can get modifications that co-occur that way.
I can rewrite a shorter version tomorrow.
The residues are displayed correctly for histones now. I had some bugs to fix when displaying on pages that have a mixture of histones and non-histones like: https://www.pombase.org/term/MOD:00408
That's now fixed but let me know if you see any problems.
We talked about being more explicit in the detailed display (e.g. "K56(K57) processed(preprocessed)"). Should we do that? Here's how it looks in the test version on my desktop:
For now I've removed "Note: for histones, residue numbering assumes that the initiator methionine is removed." from the docs while we work on it.
We talked about being more explicit in the detailed display (e.g. "K56(K57) processed(preprocessed)"). Should we do that?
It doesn't hurt, but I think most biologists will understand without. Put it in for now as there is space...
OK, I've added that. The main site will have the change in a little while.
Add text at the top in the "Notes" section.
We have a QC pipeline in place to check that the modified residues match the current protein sequence coordinates. For most proteins where the protein sequence coordinates have changed, we will be able to automatically "lift over" to the current sequence residue numbering.
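As an illustration of the kind of check described, here is a minimal sketch that tests whether a residue such as K5 agrees with the unprocessed protein sequence. The function name is hypothetical and the sequence in the usage note is a short illustrative H3-like N-terminus, not data taken from PomBase; the actual pipeline may work quite differently.

```python
import re

# One-letter amino acid code followed by a 1-based position, e.g. "K5".
RESIDUE_RE = re.compile(r"^([A-Z])(\d+)$")


def residue_matches(sequence, residue):
    """Return True if `residue` names the amino acid found at that
    1-based position of the unprocessed `sequence`."""
    match = RESIDUE_RE.match(residue)
    if match is None:
        raise ValueError(f"unrecognised residue: {residue!r}")
    amino_acid, position = match.group(1), int(match.group(2))
    if not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == amino_acid
```

With the illustrative sequence "MARTKQTARK", `residue_matches(seq, "K5")` is True, while the conventional histone numbering "K4" fails because position 4 of the unprocessed sequence is T. This is why the check needs unprocessed coordinates if histones are not to be a special case.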
Change text
"Also note that histones are conventionally numbered assuming the initiator methionine is removed (i.e. every position in the mature protein is numbered, and is 1 less than the apparent numbering predicted by translating the ORF)."
to
"Histones should be represented using the unprocessed protein sequence coordinates, not the processed coordinates conventionally used to describe histones. Histone modifications will be represented on the gene pages as K4(K5) processed(preprocessed), but our checking pipeline will expect unmodified forms."
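The K4(K5) display form can be derived from the stored unprocessed coordinate by subtracting one for the removed initiator methionine. A minimal sketch, with a hypothetical function name; this is not the website's actual code:

```python
import re

# One-letter amino acid code followed by a 1-based position, e.g. "K5".
RESIDUE_RE = re.compile(r"^([A-Z])(\d+)$")


def histone_display(preprocessed_residue):
    """Turn an unprocessed histone residue such as "K5" into the
    display form "K4(K5)", i.e. processed(preprocessed)."""
    match = RESIDUE_RE.match(preprocessed_residue)
    if match is None:
        raise ValueError(f"unrecognised residue: {preprocessed_residue!r}")
    amino_acid, position = match.group(1), int(match.group(2))
    if position < 2:
        # Position 1 is the initiator methionine, which has no
        # counterpart in the processed numbering.
        raise ValueError("position 1 has no processed coordinate")
    return f"{amino_acid}{position - 1}({amino_acid}{position})"
```

For example, `histone_display("K5")` gives "K4(K5)". Such a conversion should only be applied to the genes in the histone list, since other proteins are displayed with their unprocessed numbering.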
Edited text above. @kimrutherford can you make this change, and then this ticket can close.
Excellent, thanks. I've made that text change.
Histone alleles and modifications are universally described using the processed (- initMet) form of the histones.
For alleles we do this:
hht3-K56R(K57R aa)
The name follows the community usage, and the description matches the underlying protein sequence. We use the underlying protein sequence to check that the alleles and modifications conform with the sequence in our QC pipeline without adding a special case for histones.
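The relationship between the allele name and its description can be sketched as below: the description is the community-usage substitution shifted by one position to account for the initiator methionine. The function name is hypothetical and only simple single-residue substitutions are handled.

```python
import re

# Reference amino acid, processed position, replacement amino acid: "K56R".
SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def histone_allele_label(gene, community_change):
    """Build a label such as "hht3-K56R(K57R aa)" from a gene name and a
    substitution given in the processed (community) numbering."""
    match = SUBSTITUTION_RE.match(community_change)
    if match is None:
        raise ValueError(f"unrecognised substitution: {community_change!r}")
    reference, position, replacement = match.groups()
    # The description uses the unprocessed sequence: one position higher.
    description = f"{reference}{int(position) + 1}{replacement}"
    return f"{gene}-{community_change}({description} aa)"
```

Calling `histone_allele_label("hht3", "K56R")` reproduces the example above, "hht3-K56R(K57R aa)".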
At the moment we display modifications using only the preprocessed form, which will be very confusing for end users. We would like to change the display so that the universal nomenclature is presented with the preprocessed form in parentheses:
K14(K15) processed/preprocessed.
I will add a list of the histones that this applies to.