Closed ValWood closed 8 months ago
@manulera tagging
Following up on the discussion with @ValWood today. Perhaps for the phenotype annotations it is not so important, but when querying for alleles in Intermine I think it would be good to add a category "promoter" to alleles (NULL if not changed) in addition to "Allele expression".
Overexpression or knockdown are often induced in certain conditions, such as removal/addition of thiamine for nmt1 promoters, but I think the allele description should represent the changes in the DNA / protein. Same is true for ectopic expression (e.g. for inducing expression of meiotic proteins during another phase, or vice versa).
(Maybe this belongs in another issue, but somewhat related to this)
I think it is good to brainstorm it in this ticket. I can make a clearer ticket once we decide.
So it would be, for example
gene | allele | allele type | allele description | allele expression | promoter |
---|---|---|---|---|---|
abc1 | abc1-delta | deletion | - | NULL | N/A |
abc1 | abc1+ | wild-type | - | WT | N/A |
abc1 | abc1+ | wild-type | - | overexpression | nmt-blah (4X) |
abc1 | abc1-H34F | amino acid mutation | H34F | overexpression | nmt-blah (20X) |
where the promoter field is a new field associated with overexpression or knockdown (for all other allele types the option will not be required).
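As a rough sketch of the proposed table, the rows could be modelled with the promoter as its own optional field. The class name `AlleleRow`, the field names, and the use of `None` for NULL / N/A are assumptions for illustration only, not Canto or Chado schema names; the `H34F` description is inferred from the allele name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlleleRow:
    """One row of the example table (hypothetical model, not the real schema)."""
    gene: str
    allele: str
    allele_type: str
    description: Optional[str]  # None stands in for "-"
    expression: Optional[str]   # None stands in for NULL
    promoter: Optional[str]     # None stands in for N/A


ROWS = [
    AlleleRow("abc1", "abc1-delta", "deletion", None, None, None),
    AlleleRow("abc1", "abc1+", "wild-type", None, "WT", None),
    AlleleRow("abc1", "abc1+", "wild-type", None, "overexpression", "nmt-blah (4X)"),
    # Description assumed from the allele name abc1-H34F.
    AlleleRow("abc1", "abc1-H34F", "amino acid mutation", "H34F",
              "overexpression", "nmt-blah (20X)"),
]


def with_promoter(rows: List[AlleleRow]) -> List[AlleleRow]:
    """Return only the rows where a promoter has been recorded."""
    return [row for row in rows if row.promoter is not None]
```

With this shape, a query for alleles driven by a given promoter becomes a filter on one field rather than a search through backgrounds or allele names.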
We could default existing over/under expression to "unknown", but most of the time I think the information is captured in the background or allele names.
I would migrate these over to the new field; they are mainly the ones in these 2 log files:
https://curation.pombase.org/dumps/latest_build/logs/log.2022-05-10-21-26-54.chado_checks i) Running: PomBase::Check::AlleleNotStartingWithGeneName or ii) Running: PomBase::Check::GenotypeBackgrounds (these are the backgrounds which do not currently match a known mutant)
Hello,
Yes, mostly that, but I think the (4x) or (20x) should go in the allele expression column, not in the promoter, since the expression levels will depend on experimental conditions for a given promoter.
For example, it is not very common, but sometimes you see that people use nmt41-GFP-gene constructs just to tag proteins at their N-terminus. They then do the experiments in the presence of thiamine (overexpression not induced). These nmt1 promoters are quite leaky, so even in the presence of thiamine they can produce equal or higher levels of protein than the wild type. Whenever I have used nmt1, it has produced overexpression even when repressed; for nmt41 I am not so sure.
If used for a localisation experiment, people probably will not test whether the expression levels are higher than if GFP-gene was expressed from the normal promoter. I think this is mostly done to quickly tag a protein in the N-term when a C-term tag would be lethal or perturb the behaviour of the protein.
@manulera says:
It was in my todo list to make an issue about this exactly. I think the problem is the ambiguous use of expression levels and alleles. We discussed this briefly before. I think allele description should be only concerned with describing the DNA sequence, and expression level should be a different field since it is conditional.
We could make promoter_changed a type of allele, and we could indicate which promoter has been used when it is known. Also, a change in promoter alone may not lead to a change in expression (it may need to be induced or repressed), but typically if a phenotype has been described it is safe to assume that the experiment has been conducted at conditions where the expression levels change.
You can see what I mean with the following query: all the alleles that come up as wild_type are there because of changes in expression levels (note that the only values are knockdown and overexpression):
@manulera says And for deletion, I think it would be ok to say 'deletion' in the description.
Let's discuss this after @manulera's vacation.
There are 2 possible solutions. We could either extend the expression section to capture the plasmid, or follow Manu's suggestion, which is more radical (in terms of changes and retrofitting).
Maybe we can think of a better way. Some comments:
We definitely need somewhere sensible to store all of the promoter annotations that are currently in the background, and in allele type "other" but we haven't always recorded the promoter for WT expression changes.
This isn't super urgent (but the longer it is put off, the harder the retrofitting will be...); a few months won't make too much difference.
So we have a call on this when I come back?
Yep.
We could still change the description field from wild_type to promoter_change.
I think this might be possible. However, I think we have also used it when an alteration to the allele is made in a second copy, but the WT copy remains. They did not show up in the query because I think, from memory, that although they are displayed as single-gene genotypes, they are modelled behind the scenes as multi-gene. We would need to check this.
We don't need https://github.com/pombase/canto/issues/2629 because of this proposal.
Another nice example of how annotations are split. This should be high priority in the New Year because lots of annotations get split up:
https://www.pombase.org/genotype/asp1-H397A-H397A-amino_acid_mutation-expression-not_assayed https://www.pombase.org/genotype/asp1-H397A-H397A-amino_acid_mutation-expression-wild_type_product_level
Here are some examples of expression differences that need differentiating: https://curation.pombase.org/dumps/latest_build/logs/log.2023-04-04-21-42-39.chado_checks.duplicate_allele_descriptions
(For now we won't worry about the copy-number type of overexpression; we can get these into Chado somehow, they are edge cases.)
DECISION:
i) ~An extra option for ectopic expression~ - moved to pombase/canto#2723
ii) Add an additional dialogue if you select Overexpression, Knockdown or Ectopic:
  - Pombe gene promoter [text box] -> pick a gene
  - Exogenous promoter [free text and autocomplete from existing]
iii) ~Store promoter info in Chado~ - moved to pombase/pombase-chado#1087
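A minimal sketch of the dialogue rule in item ii), assuming the expression levels are plain strings spelled as in the decision. The function names and the prefix-matching behaviour of the autocomplete are illustrative assumptions, not the Canto implementation.

```python
from typing import Iterable, List

# Expression levels that trigger the extra promoter dialogue (from item ii).
PROMOTER_EXPRESSION_LEVELS = {"Overexpression", "Knockdown", "Ectopic"}


def promoter_dialogue_shown(expression: str) -> bool:
    """Return True if the promoter dialogue should appear for this level."""
    return expression in PROMOTER_EXPRESSION_LEVELS


def suggest_exogenous(prefix: str, existing: Iterable[str]) -> List[str]:
    """Case-insensitive prefix autocomplete over already-used promoter names."""
    wanted = prefix.casefold()
    return sorted({name for name in existing if name.casefold().startswith(wanted)})
```

For example, `suggest_exogenous("nmt", ["ADH1", "nmt1", "nmt41"])` would offer the two nmt promoters, while free text outside the existing list is still allowed by the dialogue.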
mock up; )
I put this at high priority, since it will be useful to finish up all of the allele QC work.
Impressive mockup
> An extra option for ectopic expression
Which allele types is that ectopic allowed for?
@manulera All except deletion?
What about disruption?
Probably the case will not arise, but I think it's OK to allow the user to select it. I don't think it will ever happen though.
> An extra option for ectopic expression

I've made a separate issue for that: pombase/canto#2723

> iii) Store promoter info in Chado
Moved to:
I've been working on this today. I've got a basic prototype working. Let's talk about it next time we're on a call.
It's now in the test Canto if you'd like to try it: https://curation.pombase.org/test/curs/4666975359de04dd/genotype_manage
@jseager7 @CuzickA
Are these two new fields of use to PHI-base? If not, I'll add a configuration item to hide them.
Also we added Ectopic expression a while ago and forgot to discuss it with you. Does it make sense for PHI-base?
PHI-base is already using ectopic expression in curation sessions.
I haven't seen the ectopic expression option in PHI-Canto.
I think that both of the proposed new fields would be useful in PHI-Canto. This type of information is often recorded in our comments section.
@CuzickA One of our curators said that they selected the ectopic expression level for PMID:12543662: https://github.com/PHI-base/curation/issues/135#issuecomment-1582852691
Was that on the demo server?
Ok, thanks for spotting that. I haven't managed to work through some of those tickets yet.
I've just tried and it looks like ectopic expression is available on the main PHI-Canto server but not the demo server.
I tested, but I don't see where the information goes. I don't see it showing up alongside the genotype in genotype management. Maybe we can discuss tomorrow.
I'm hoping this can resolve duplicate allele descriptions like:

SPCC24B10.14c xlf1.AA c23b5043b024b0a5 T180A,S192A nmt41-xlf1.AA c23b5043b024b0a5
SPCC24B10.14c xlf1.AA c23b5043b024b0a5 T180A,S192A nmt1-xlf1.AA c23b5043b024b0a5

but how will we deal with expression levels altered by different copy number:

SPBC12D12.02c cdm1+(1 copy) b5089e77a1ba9a37 wild type cdm1+(3 copies) b5089e77a1ba9a37
SPBC1734.02c cdc27+(1 copy) b5089e77a1ba9a37 wild type cdc27+(3 copies) b5089e77a1ba9a37
SPBC336.04 cdc6+(1 copy) b5089e77a1ba9a37 wild type cdc6+(3 copies) b5089e77a1ba9a37
This is a question for when we chat tomorrow...
I don't see the option for entering promoter information yet (in the main PHI-Canto).
> I've just tried and it looks like ectopic expression is available on the main PHI-Canto server but not the demo server.

That's because I haven't updated the demo server in a while. I'll hide this comment thread now.

> I don't see the option for entering promoter information yet (in the main PHI-Canto).

I haven't updated PHI-Canto with these fields yet because PomBase isn't done with testing them.

> I tested, but I don't see where the information goes. I don't see it showing up alongside the genotype in genotype management. Maybe we can discuss tomorrow.

Yep, that's one of the things to discuss. Currently you can only see the information if you edit the allele.

> I haven't updated PHI-Canto with these fields yet because PomBase isn't done with testing them.
Hi James. Yep, it's still on a branch. Unfortunately, it will need a schema update when deployed.
Hello, I had a look at this and I think the interface makes sense. However, it should not be possible to add a value for both "gene promoter" and "exogenous promoter". It should be one or the other.
For now, display in square brackets after the expression level.
Ideally, exogenous promoters should be auto-completed
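The "one or the other" rule could be checked along these lines. This is only a sketch: `validate_promoter` and its return shape are hypothetical, and the tag names are borrowed from the genotype display strings used later in this thread.

```python
from typing import Optional, Tuple


def validate_promoter(
    gene_promoter: Optional[str] = None,
    exogenous_promoter: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Accept at most one promoter kind and return it as a (kind, value) pair.

    Returns None when no promoter was given, since the field is optional.
    """
    if gene_promoter and exogenous_promoter:
        raise ValueError(
            "Specify either a gene promoter or an exogenous promoter, not both"
        )
    if gene_promoter:
        return ("promoter", gene_promoter)
    if exogenous_promoter:
        return ("exogenous_promoter", exogenous_promoter)
    return None
```

In an interface, the same constraint is usually easier to enforce with a single choice control (radio buttons) than with two independent text boxes plus a check like this one.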
> However, it should not be possible to add a value for both "gene promoter" and "exogenous promoter".
What do you think of this interface instead? It's in the test Canto if you'd like to try it.
Hi @kimrutherford. It's perfect like that, I think. The only thing is I would remove the first option ("None"), so the default is that no option is ticked. The title could then be "Used promoter (optional)". Alternatively, we can also replace "None" by "Unspecified".
There are cases where overexpression or knockdown can be driven without changing the promoter, so you would not want to say "None":
Nice
An extra possibility could be multi-copy
+ number of copies (should be bigger than 1)
Pinging myself for discussion when Kim is back.
Hi Val. Did you mention on a call that this was an issue that you think needs doing soon?
The prototype is still available in the test Canto: https://curation.pombase.org/test/curs/4666975359de04dd/genotype_manage
> we can also replace "None" by "Unspecified".

I've done that since it was quick. I've also fixed a few bugs.
Thanks, I'll check this out before we next chat.
I've started work on pombase/pombase-chado#1087 but I noticed that the promoter details were being written to the JSON export file incorrectly. I'm fixing that now.
Note to self, the changes for this issue are on this branch: https://github.com/pombase/canto/tree/issue-2544-allele-promoters
> I noticed that the promoter details were being written to the JSON export file incorrectly. I'm fixing that now.
That's fixed now.
I've also fixed the Canto code for editing alleles to understand that alleles/genotypes with different promoters are distinct. So there's now no problem having multiple genotypes that are identical except for the promoter of an allele.
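One way to picture "distinct if the promoter differs" is to include the promoter in whatever key is used to compare alleles. The sketch below is not the Canto code; `AlleleKey` and its fields are hypothetical, the gene and description come from the xlf1 duplicates mentioned earlier in this thread, and the "Overexpression" level and the promoter values "nmt41" / "nmt1" are assumed from the allele names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlleleKey:
    """Fields that together decide whether two alleles are the same."""
    gene: str
    description: str
    expression: Optional[str] = None
    promoter: Optional[str] = None


# Same gene, description and expression; only the promoter differs.
nmt41_allele = AlleleKey("SPCC24B10.14c", "T180A,S192A", "Overexpression", "nmt41")
nmt1_allele = AlleleKey("SPCC24B10.14c", "T180A,S192A", "Overexpression", "nmt1")
nmt1_again = AlleleKey("SPCC24B10.14c", "T180A,S192A", "Overexpression", "nmt1")

# A frozen dataclass is hashable, so a set keeps one entry per distinct key.
distinct_alleles = {nmt41_allele, nmt1_allele, nmt1_again}
```

Here `distinct_alleles` holds two entries: the two nmt1 keys collapse into one, while the nmt41 key stays separate.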
For now, alleles with promoters are shown like this in the genotype list:
abc1-a(F124D)[Overexpression]{promoter:SPBC15D4.03}
or
abc1-a(F124D)[Ectopic]{exogenous_promoter:ADH1}
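The two display strings above follow a regular pattern, which can be sketched as a small formatter. The function name and parameters are hypothetical; only the output format is taken from the examples.

```python
from typing import Optional


def display_name(
    allele: str,
    description: str,
    expression: str,
    promoter: Optional[str] = None,
    exogenous_promoter: Optional[str] = None,
) -> str:
    """Build a genotype-list label: name(description)[expression]{promoter}."""
    text = f"{allele}({description})[{expression}]"
    if promoter:
        # Doubled braces in an f-string produce literal { and }.
        text += f"{{promoter:{promoter}}}"
    elif exogenous_promoter:
        text += f"{{exogenous_promoter:{exogenous_promoter}}}"
    return text


print(display_name("abc1-a", "F124D", "Overexpression", promoter="SPBC15D4.03"))
print(display_name("abc1-a", "F124D", "Ectopic", exogenous_promoter="ADH1"))
```

The two calls reproduce the example strings shown above; an allele without a promoter simply omits the braces.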
~2. If overexpression is selected, the ability to specify the "fold" overexpression (I.e 2X, 10X, 100X) We probably need to discuss the details. We need this to a) be able to distinguish different expression levels (and hence phenotypes) for overexpression alleles (and knockdowns) b) to clean up backgrounds and comments c) To fix some allele names which do not follow standard naming conventions d) also ectopic expression~
Example: check cut12-s11 https://www.pombase.org/genotype/cut12.s11-G71V-amino_acid_mutation-expression-not_assayed https://www.pombase.org/genotype/cut12.s11-G71V-amino_acid_mutation-expression-wild_type_product_level