ValWood closed this 1 year ago
Progress so far:
https://desktop.kmr.nz/gene/SPBC19C2.09
Looking good!
Modifications!
Should we put the allele/variants and modifications at the top since they're important?:
Also are "Variants" and "Modifications" OK as labels?
RCSB use capitals for the track labels. Should we do the same to make them stand out?:
We should cite: Joan Segura, Yana Rose, John Westbrook, Stephen K Burley, Jose M Duarte. RCSB Protein Data Bank 1D tools and services, Bioinformatics, 2020; https://doi.org/10.1093/bioinformatics/btaa1012
I don't think it is working on your desktop right now. Hopefully you went to bed. I'm leaning towards the capitalization. @manulera what do you think?
Good progress!
The example for multiple modification types for a single residue was hht1. I will try to find one with more...
I also agree to switch the order.
I don't think it is working on your desktop right now.
While I'm changing things it often breaks or goes offline. It should be working in the morning. I'm off to bed now so I'll be turning the desktop off.
I've added Pfam families, but since there are no mouse-over details, it's not very useful. I plan to add mouse-overs tomorrow.
I also agree to switch the order.
Done.
The example for multiple modification types for a single residue was hht1
OK, thanks. That will be good for testing tomorrow:
Very nice!
I've added mouse overs on my desktop version: https://desktop.kmr.nz/gene/SPAC1834.04
The text is very minimal at the moment. What should be displayed for variants and modifications? I think it will look best if we don't have more than two lines of text, if possible.
I've now added the protein feature view to the dev site so you don't need to rely on my machine being online: http://dev.pombase.kmr.nz/gene/SPBC19C2.09
It looks very nice. We can discuss the text. This is soooo useful.
Some random thoughts, which should probably be in new tickets: i) we will need to deal with the histone special case for modifications; histone residues are always referred to (numbered) after the methionine has been removed in the mature form (check out hht1 lysine K14). ii) special case 2 CTD domain residues in rpb1 and spt5 (will affect variants, but for rpb1 I see we also have modified residue CTD_S5 removed by etc
Existing page section: it will be odd to have 2 sections named "protein features", so I suggest that before release we: i) rename the 'other' protein feature section to 'protein domains' and ii) split out the protein properties into their own section (we can discuss this; we might be able to ditch it and just add charge back to the top matter).
Linking to the structure viewer: bear in mind that many of the PDB entries are fragments (e.g. dcr1), so for these we will need to use AlphaFold when mapping residues.
This is all so good. Especially with all the corrected alleles and modifications @manulera is doing we can be confident about the displays being correct.
I think we should rename the "Variants" row as "mutants" (otherwise for non pombe people they might assume natural variation)
Some suggestions for the "click through" version.
Wow! Looks great! I would have loved it back in the day. Here are some comments:
i) we will need to deal with the histone special case for modifications; histone residues are always referred to (numbered) after the methionine has been removed in the mature form (check out hht1 lysine K14).
This should be OK if using the allele descriptions, since they are corrected for histones, even if the name is different.
Should we put the allele/variants and modifications at the top since they're important?:
I also agree with Val that the order is better with variants and modifs on top
ii) special case 2 CTD domain residues in rpb1 and spt5 (will affect variants, but for rpb1 I see we also have modified residue CTD_S5 removed by etc
For the CTD, let's wait a little bit until I finish with the allele pipeline, and I think we should get a solution
This should be OK if using the allele descriptions, since they are corrected for histones, even if the name is different.
I should have been clearer. The alleles already render OK. But the modifications use actual sequence coordinates and so are 'off by one'. A simple shift in the affected genes should fix it. This is the K14 modification on hht1, which displays on residue 13:
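A minimal sketch of the "simple shift" idea, assuming the annotated position uses mature numbering (initiator methionine removed) while the viewer indexes the full translated sequence. The gene set and the function name are hypothetical and only for illustration; the real list of affected genes would need to be curated.

```python
# Hypothetical set of genes whose modification positions are annotated
# in mature numbering (initiator methionine removed). Illustrative only.
MATURE_NUMBERED_GENES = {"hht1"}


def display_position(gene: str, annotated_position: int) -> int:
    """Return the 1-based position in the full translated sequence.

    Assumption: for genes in MATURE_NUMBERED_GENES the annotation counts
    from the residue after the initiator methionine, so the viewer
    position is one higher than the annotated position.
    """
    if gene in MATURE_NUMBERED_GENES:
        return annotated_position + 1
    return annotated_position
```

With this shift, K14 on hht1 would be drawn at sequence position 15 rather than 14, while other genes are left untouched.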
@ValWood I see, but those should be amended at some point as well (I think). For now we have only fixed the ones in the HTP files, not the ones stored in Canto. We can discuss this at the next meeting, but I think it would be better to store them right in Canto and have them displayed differently on the "modification" section of the gene page, than to keep storing the offset coordinates.
I'm leaning towards the capitalization.
I've done that so we can see how it looks.
I also agree with Val that the order is better with variants and modifs on top
That's done now.
I think we should rename the Variants row as mutants
Done.
Single sequence region mutants (i.e A123G, OR A234G,A235G OR 123-223 delta) with associated phenotypes
How should the phenotypes look? Some alleles have multiple associated phenotypes.
123-223 delta
Are those the alleles with type "partial_amino_acid_deletion" and descriptions like "ccq1(131-441)"?
How should the phenotypes look? Some alleles have multiple associated phenotypes.
I think that's going to be a bit tricky, especially for famous alleles like cdc25-22, which will have many, many phenotypes. If there is an ontological way to restrict to the high-order terms, that would be the best, I think (some kind of slim, but I'm not sure that's possible).
However, I am not sure variant -> phenotype is the most meaningful link. Probably the user would like to see which alleles give a certain phenotype, rather than what a particular sequence modification does when hovering over it. This does not seem possible/easy on the gene page. However, if we are still thinking of a separate view like the one in the PDB, in which the sequence opens in a different window and you also have the structure, then we could have a scrollable list of all phenotypes like in the gene page, where the user can pick some and only the alleles that give those phenotypes can be displayed. In the same way, it could be that when you click on a particular modification, a list with all phenotypes associated with the modification is rendered below the graph (clearly this cannot be a tooltip when mouseover).
I thought what would be really nice instead of a phenotype list is to have some ontology tree with only the FYPO terms of that gene so you can also pick high-order terms, but then I had a look at the cut phenotype tree in OLS and realised that it would look atrocious (https://www.ebi.ac.uk/ols4/ontologies/fypo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FFYPO_0000229). That tree is for one term only! Admittedly, perhaps one of the most connected and famous phenotypes in pombe, but still...
Are those the alleles with type "partial_amino_acid_deletion" and descriptions like "ccq1(131-441)"?
Yes, in principle the descriptions of partial_amino_acid_deletion alleles always contain the missing residues, while their names may contain the missing or kept amino acids, depending on the authors. Say protein ase1 has 100 amino acids: they may name an allele missing aa 60-100 as ase1(1-59) (what's kept) or ase1delta(60-100) (what's missing). In theory, the description should only contain the missing ones, independently of what the name is, but I am sure there are some errors out there.
Thinking of a way to show 'conjoined' alleles (and possibly deleted regions), I noticed that some of the Pfam domains had 'connectors'. Maybe this can be used?
I tried that but unfortunately it doesn't work well in every case. It's good on the cdc10 page:
but not so friendly on the mrc1 page:
because of allele descriptions like "T121A,T327A,T513A,S572A,S599A,S604A,S614A,T634A,S637A,T645A,T653A,S938A,T965A,S1000A"
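To see why connectors get crowded, here is a rough sketch of pulling the residue positions out of a comma-separated substitution description like the one above. The function name is hypothetical, and it assumes every part has the simple form of one letter, a number, then one letter.

```python
import re
from typing import List


def substitution_positions(description: str) -> List[int]:
    """Return the residue positions named in a description such as
    "T121A,T327A". Assumes each comma-separated part looks like
    <one-letter residue><position><one-letter residue>."""
    positions = []
    for part in description.split(","):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", part.strip())
        if match is None:
            raise ValueError(f"unrecognised substitution: {part!r}")
        positions.append(int(match.group(2)))
    return positions
```

Every pair of neighbouring positions returned here would need its own connector, which is what makes the 14-substitution mrc1 allele hard to read.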
Are those the alleles with type "partial_amino_acid_deletion" and descriptions like "ccq1(131-441)"?
Yes, in principle the descriptions of partial_amino_acid_deletion alleles always contain the missing residues,
Thanks Manu.
What does it mean if a partial_amino_acid_deletion has a description like "W316*"?
Also I found one that seems a bit strange. It's a partial_amino_acid_deletion for an RNA gene?: nc-tco1-LΔ::ura4+(-395-+146)
Are those the alleles with type "partial_amino_acid_deletion" and descriptions like "ccq1(131-441)"?
I tried adding those alleles and it didn't work out too well in some cases. Below is the diagram for mrc1.
Gene page: https://desktop.kmr.nz/gene/SPAC694.06c Full diagram: https://desktop.kmr.nz/protein_feature_view/widget/SPAC694.06c It will be at https://dev.pombase.kmr.nz/gene/SPAC694.06c on Monday morning.
What does it mean if a partial_amino_acid_deletion has a description like "W316*"?
The nomenclature paper says that the * is a stop codon, which makes sense. But in that case why doesn't W316* have the type "amino_acid_mutation"? Is it because the protein gets truncated?
The nomenclature paper says that the * is a stop codon, which makes sense. But in that case why doesn't W316* have the type "amino_acid_mutation"? Is it because the protein gets truncated?
Yes, this is a truncated protein. In the past we used to record them as "nonsense_mutation", but we decided to merge them with partial amino acid deletion, since at the product level there is no difference in principle.
So, for the sake of the feature viewer, W316* is equivalent to 316-XXX, where XXX is the protein length.
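As a sketch of that equivalence, the following turns a description into the deleted residue range. It assumes the description is either a bare range like "131-441" or a nonsense form like "W316*"; the function name is hypothetical, and other description formats are deliberately rejected.

```python
import re
from typing import Tuple


def deleted_range(description: str, protein_length: int) -> Tuple[int, int]:
    """Return the (start, end) of the missing residues, 1-based inclusive.

    Assumption: a nonsense description such as "W316*" removes everything
    from that residue to the end of the protein.
    """
    nonsense = re.fullmatch(r"[A-Z](\d+)\*", description)
    if nonsense:
        return int(nonsense.group(1)), protein_length
    span = re.fullmatch(r"(\d+)-(\d+)", description)
    if span:
        return int(span.group(1)), int(span.group(2))
    raise ValueError(f"unrecognised deletion description: {description!r}")
```

For a hypothetical protein of 500 residues, "W316*" would map to the range 316-500, the same range a description of "316-500" would give.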
Are those the alleles with type "partial_amino_acid_deletion" and descriptions like "ccq1(131-441)"?
I tried adding those alleles and it didn't work out too well in some cases
On some pages it looks very nice though. This is sre1 on the dev server as of this morning: http://dev.pombase.kmr.nz/gene/SPBC19C2.09
Looks great!
I noticed that in the truncation section the "chunks" that are on the same lane do not necessarily belong to the same construct. For example:
All chunks may get distributed to minimise the rows or something, perhaps that's why there are many rows in that example where ends meet perfectly. If there is a way to link two fragments to indicate they belong to the same construct and force them to appear in the same lane, that would be the best (with a thin line in the middle like the disulfide bonds in the PDB display?) not sure if that's possible / documented.
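One possible way to keep the fragments of a construct together, sketched under the assumption that the viewer lets us choose the lane per fragment (not confirmed here). The function and the data layout are hypothetical: each construct is placed as a unit into the first lane where none of its fragments overlaps anything already there.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, int]  # (start, end), 1-based inclusive


def assign_lanes(constructs: Dict[str, List[Fragment]]) -> Dict[str, int]:
    """Greedily assign each construct (all its fragments) to one lane."""
    lanes: List[List[Fragment]] = []
    lane_of: Dict[str, int] = {}
    for name, fragments in constructs.items():
        for index, occupied in enumerate(lanes):
            # The construct fits if every fragment is clear of every
            # fragment already placed in this lane.
            fits = all(
                end < o_start or start > o_end
                for start, end in fragments
                for o_start, o_end in occupied
            )
            if fits:
                occupied.extend(fragments)
                lane_of[name] = index
                break
        else:
            # No existing lane has room, so open a new one.
            lanes.append(list(fragments))
            lane_of[name] = len(lanes) - 1
    return lane_of
```

This still minimises rows where it can, but two fragments never share a lane by accident of packing alone; a thin connecting line could then be drawn between the fragments of each construct.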
However, I am not sure variant -> phenotype is the most meaningful link. Probably the user would like to see which alleles give a certain phenotype, rather than what a particular sequence modification does when hovering over it. This does not seem possible/easy on the gene page. However, if we are still thinking of a separate view like the one in the PDB, in which the sequence opens in a different window and you also have the structure, then we could have a scrollable list of all phenotypes like in the gene page, where the user can pick some and only the alleles that give those phenotypes can be displayed. In the same way, it could be that when you click on a particular modification, a list with all phenotypes associated with the modification is rendered below the graph (clearly this cannot be a tooltip when mouseover).
This would be very useful. I wonder if it could operate like the filters so you would check specific phenotypes and modifications and the display would reduce to show only those.
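A minimal sketch of that filter behaviour, assuming we have a mapping from each allele to its set of phenotype terms. The function name and the labels in the usage note are made up for illustration.

```python
from typing import Dict, List, Set


def alleles_for_phenotypes(
    allele_phenotypes: Dict[str, Set[str]], selected: Set[str]
) -> List[str]:
    """Return the alleles annotated with at least one selected phenotype,
    in the order they appear in the mapping."""
    return [
        allele
        for allele, phenotypes in allele_phenotypes.items()
        if phenotypes & selected
    ]
```

For example, with `{"allele-1": {"phenotype A"}, "allele-2": {"phenotype B"}}` and `{"phenotype A"}` checked, only `allele-1` would stay on the display.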
I suggest we open new tickets for each outstanding task, or feature request, as this ticket is in danger of becoming difficult to navigate.
Hi @kim, the stop codon change was documented in the news item "Curation update - “nonsense mutation” merged into “partial amino acid deletion”". I'll close off those comments, but let us know if it doesn't make sense.
All chunks may get distributed to minimise the rows or something, perhaps that's why there are many rows in that example where ends meet perfectly. If there is a way to link two fragments to indicate they belong to the same construct and force them to appear in the same lane, that would be the best
Here's what that would look like: https://desktop.kmr.nz/protein_feature_view/widget/SPAC694.06c
(Note that the zooming and scrolling don't work correctly on that page. I've created an issue about that.)
Here's what that would look like:
Yes! That's perfect! That was fast!
Tasks in new tickets:
Protein feature viewer: decide text for mouse overs pombase/website#2068
Protein feature viewer: "partial_amino_acid_deletion" pombase/website#2067
Protein sequence feature viewer show 'conjoined' alleles (and possibly deleted regions) I noticed that some of the Pfam domains had 'connectors" maybe this can be used? pombase/website#2066
Protein sequence feature viewer: Find external active site sources pombase/pombase-chado#1126
protein sequence feature viewer: linking to structures pombase/website#2064
Protein sequence feature viewer: existing page section pombase/website#2063
Protein sequence feature viewer: enable phenotype-> residue in full view pombase/website#2062
The only thing remaining in this ticket is:
Also I found one that seems a bit strange. It's a partial_amino_acid_deletion for an RNA gene?: nc-tco1-LΔ::ura4+(-395-+146)
I will fix that one.
Replaces https://github.com/pombase/website/issues/667
Features
Essential
Configurable
Browser features essential
Desirable
Ability for the user to toggle active tracks on and off (Calipho has this)
Download SVG image? (Calipho has this)
Ability to link to the equivalent residue in human (I'm not sure how that could happen; it would need to be precomputed from alignments, and possibly Panther families could be used for this). This isn't super urgent.
Tracks we would like to display initially
1. Single sequence region mutants (i.e A123G, OR A234G,A235G OR 123-223 delta) with associated phenotypes
Ideally, these regions would display in structure viewer and give a mouse-over pop-up with “details”
Later datatypes
Secondary structure regions
Hydropathy?
Others?
Examples
RCSB https://www.rcsb.org/3d-sequence/4OFB?asymId=A
Calipho https://www.nextprot.org/entry/NX_P38398/sequence
ProtVista https://www.uniprot.org/uniprotkb/P38398/feature-viewer