pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Capturing expression in Canto (with implications for website allele/genotype) #2544

Closed ValWood closed 2 months ago

ValWood commented 2 years ago
  1. The ability to specify the ~plasmid~ promoter (most of these are stored in names, background and comments)

~2. If overexpression is selected, the ability to specify the "fold" overexpression (i.e. 2X, 10X, 100X). We probably need to discuss the details. We need this to a) distinguish different expression levels (and hence phenotypes) for overexpression alleles (and knockdowns), b) clean up backgrounds and comments, c) fix some allele names which do not follow standard naming conventions, and d) cover ectopic expression.~

Example: check cut12-s11:
https://www.pombase.org/genotype/cut12.s11-G71V-amino_acid_mutation-expression-not_assayed
https://www.pombase.org/genotype/cut12.s11-G71V-amino_acid_mutation-expression-wild_type_product_level

ValWood commented 2 years ago

@manulera tagging

manulera commented 2 years ago

Following up on the discussion with @ValWood today. Perhaps for the phenotype annotations it is not so important, but when querying for alleles in Intermine I think it would be good to add a category "promoter" to alleles (NULL if not changed) in addition to "Allele expression".

Overexpression or knockdown are often induced in certain conditions, such as removal/addition of thiamine for nmt1 promoters, but I think the allele description should represent the changes in the DNA / protein. Same is true for ectopic expression (e.g. for inducing expression of meiotic proteins during another phase, or vice versa).

(Maybe this belongs in another issue, but somewhat related to this)

ValWood commented 2 years ago

(Maybe this belongs in another issue, but somewhat related to this)

I think it is good to brainstorm it in this ticket. I can make a clearer ticket once we decide.

So it would be, for example

| gene | allele | allele type | allele description | allele expression | promoter |
|---|---|---|---|---|---|
| abc1 | abc1-delta | deletion | - | NULL | N/A |
| abc1 | abc1+ | wild-type | - | WT | N/A |
| abc1 | abc1+ | wild-type | - | overexpression | nmt-blah (4X) |
| abc1 | abc1-H34F | amino acid mutation | | overexpression | nmt-blah (20X) |

where the promoter field is added as a new field associated with overexpression or knockdown (for all other allele types the option will not be required).

We could default existing over/under expression to "unknown", but I think most of the time the information is captured in the background or allele names.

I would migrate these over to the new field. They are mainly the ones in these 2 log files:

https://curation.pombase.org/dumps/latest_build/logs/log.2022-05-10-21-26-54.chado_checks

i) Running: PomBase::Check::AlleleNotStartingWithGeneName, or
ii) Running: PomBase::Check::GenotypeBackgrounds (these are the backgrounds which do not currently match a known mutant)

manulera commented 2 years ago

Hello,

Yes, mostly that, but I think the (4x) or (20x) should go in the allele expression column, not in the promoter, since the expression levels will depend on experimental conditions for a given promoter.

For example, it is not very common, but sometimes people use nmt41-GFP-gene constructs just to tag proteins at their N-term. They then do the experiments in the presence of thiamine (overexpression not induced). These nmt1 promoters are quite leaky, so even in the presence of thiamine they can produce equal or higher levels of protein than the wild type. Whenever I have used nmt1, it has produced overexpression even when repressed; for nmt41 I am not so sure.

If used for a localisation experiment, people probably will not test whether the expression levels are higher than if GFP-gene was expressed from the normal promoter. I think this is mostly done to quickly tag a protein in the N-term when a C-term tag would be lethal or perturb the behaviour of the protein.

ValWood commented 2 years ago

@manulera says:

It was in my todo list to make an issue about this exactly. I think the problem is the ambiguous use of expression levels and alleles. We discussed this briefly before. I think allele description should be only concerned with describing the DNA sequence, and expression level should be a different field since it is conditional.

We could make promoter_changed a type of allele, and we could indicate which promoter has been used when it is known. Also, a change in promoter alone may not lead to a change in expression (it may need to be induced or repressed), but typically if a phenotype has been described it is safe to assume that the experiment was conducted under conditions where the expression levels change.

You can see what I mean with the following query. All alleles that come up as wild_type are ones that come from changes in expression levels (note that the only values are knockdown and overexpression):

ValWood commented 2 years ago

@manulera says:

And for deletion, I think it would be ok to say 'deletion' in the description.

ValWood commented 2 years ago

Let's discuss this after @manulera's vacation.

There are 2 possible solutions. We could extend the expression section to capture the plasmid. Manu's suggestion is more radical (in terms of changes and retrofitting).

Maybe we can think of a better way. Some comments:

manulera commented 2 years ago

So we have a call on this when I come back?

ValWood commented 2 years ago

Yep.

ValWood commented 2 years ago

We could still change the description field from wild_type to promoter_change.

I think this might be possible. However, I think we have also used it when an alteration to the allele is made in a second copy but the WT copy remains. They did not show up in the query because, I think (from memory), although they are displayed as single-gene genotypes, they are modelled behind the scenes as multi-gene. We would need to check this.

ValWood commented 1 year ago

We don't need https://github.com/pombase/canto/issues/2629 because of this proposal.

ValWood commented 1 year ago

Another nice example of how annotations are split. This should be high priority in the New Year because lots of annotations get split up:

https://www.pombase.org/genotype/asp1-H397A-H397A-amino_acid_mutation-expression-not_assayed https://www.pombase.org/genotype/asp1-H397A-H397A-amino_acid_mutation-expression-wild_type_product_level

ValWood commented 1 year ago

Here are some examples of expression differences that need differentiating: https://curation.pombase.org/dumps/latest_build/logs/log.2023-04-04-21-42-39.chado_checks.duplicate_allele_descriptions

(For now we won't worry about copy-number-type overexpression; we can get these into Chado somehow, they are edge cases.)

ValWood commented 1 year ago

DECISION:

i) ~An extra option for ectopic expression~ - moved to pombase/canto#2723

ii) Add additional dialogue if you select Overexpression, Knockdown or Ectopic:
   - Pombe gene promoter [text box] -> pick a gene
   - Exogenous promoter [free text and autocomplete from existing]

iii) ~Store promoter info in Chado~ - moved to pombase/pombase-chado#1087
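
The dialogue rule in (ii) could be sketched roughly as below. This is only an illustration in Python, not Canto's actual code: the names `promoter_fields_enabled` and `suggest_exogenous`, the lower-cased expression labels, and the prefix-matching behaviour of the autocomplete are all assumptions.

```python
# Hypothetical sketch, not Canto code: all names and labels here are assumptions.

# Expression levels for which the extra promoter dialogue would be shown,
# taken from the decision above.
PROMOTER_EXPRESSION_LEVELS = {"overexpression", "knockdown", "ectopic"}


def promoter_fields_enabled(expression: str) -> bool:
    """Return True if the promoter inputs should be offered for this expression level."""
    return expression.strip().lower() in PROMOTER_EXPRESSION_LEVELS


def suggest_exogenous(prefix: str, existing: list[str]) -> list[str]:
    """Autocomplete exogenous promoter names from those already entered.

    Uses a case-insensitive prefix match and returns de-duplicated, sorted names.
    """
    wanted = prefix.strip().lower()
    return sorted({name for name in existing if name.lower().startswith(wanted)})
```

For example, `promoter_fields_enabled("Knockdown")` would be true, while any other expression level would leave the promoter inputs hidden.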

Mock-up ;)

Screenshot 2023-04-05 at 12 36 12

ValWood commented 1 year ago

I put this at high priority, since it will be useful to finish up all of the allele QC work.

manulera commented 1 year ago

Impressive mockup

kimrutherford commented 1 year ago

An extra option for ectopic expression

Which allele types is that ectopic allowed for?

ValWood commented 1 year ago

@manulera All except deletion?

kimrutherford commented 1 year ago

Which allele types is that ectopic allowed for?

All except deletion?

What about disruption?

manulera commented 1 year ago

What about disruption?

Probably the case will not arise, but I think it's OK to allow the user to select it. I don't think it will ever happen though.

kimrutherford commented 1 year ago

An extra option for ectopic expression

I've made a separate issue for that: pombase/canto#2723

kimrutherford commented 1 year ago

iii) Store promoter info in Chado

Moved to:

kimrutherford commented 11 months ago

I've been working on this today. I've got a basic prototype working. Let's talk about it next time we're on a call.

image

kimrutherford commented 11 months ago

It's now in the test Canto if you'd like to try it: https://curation.pombase.org/test/curs/4666975359de04dd/genotype_manage

kimrutherford commented 11 months ago

@jseager7 @CuzickA

Are these two new fields of use to PHI-base? If not, I'll add a configuration item to hide them.

Also we added Ectopic expression a while ago and forgot to discuss it with you. Does it make sense for PHI-base?

jseager7 commented 11 months ago

Also we added Ectopic expression a while ago and forgot to discuss it with you. Does it make sense for PHI-base?

PHI-base is already using ectopic expression in curation sessions.

CuzickA commented 11 months ago

Also we added Ectopic expression a while ago and forgot to discuss it with you. Does it make sense for PHI-base?

PHI-base is already using ectopic expression in curation sessions.

I haven't seen the ectopic expression option in PHI-Canto.

CuzickA commented 11 months ago

@jseager7 @CuzickA

Are these two new fields of use to PHI-base? If not, I'll add a configuration item to hide them.

Also we added Ectopic expression a while ago and forgot to discuss it with you. Does it make sense for PHI-base?

I think that both of the proposed new fields would be useful in PHI-Canto. This type of information is often recorded in our comments section.

jseager7 commented 11 months ago

I haven't seen the ectopic expression option in PHI-Canto.

@CuzickA One of our curators said that they selected the ectopic expression level for PMID:12543662: https://github.com/PHI-base/curation/issues/135#issuecomment-1582852691

Was that on the demo server?

CuzickA commented 11 months ago

I haven't seen the ectopic expression option in PHI-Canto.

@CuzickA One of our curators said that they selected the ectopic expression level for PMID:12543662: PHI-base/curation#135 (comment)

Was that on the demo server?

Ok, thanks for spotting that. I haven't managed to work through some of those tickets yet.

I've just tried and it looks like ectopic expression is available on the main PHI-Canto server but not the demo server.

ValWood commented 11 months ago

I tested, but I don't see where the information goes. I don't see it showing up alongside the genotype in genotype management. Maybe we can discuss tomorrow.

I'm hoping this can resolve duplicate allele descriptions like:

SPCC24B10.14c xlf1.AA c23b5043b024b0a5 T180A,S192A nmt41-xlf1.AA c23b5043b024b0a5
SPCC24B10.14c xlf1.AA c23b5043b024b0a5 T180A,S192A nmt1-xlf1.AA c23b5043b024b0a5

but how will we deal with expression levels altered by different copy number:

SPBC12D12.02c cdm1+(1 copy) b5089e77a1ba9a37 wild type cdm1+(3 copies) b5089e77a1ba9a37
SPBC1734.02c cdc27+(1 copy) b5089e77a1ba9a37 wild type cdc27+(3 copies) b5089e77a1ba9a37
SPBC336.04 cdc6+(1 copy) b5089e77a1ba9a37 wild type cdc6+(3 copies) b5089e77a1ba9a37

ValWood commented 11 months ago

but how will we deal with expression levels altered by different copy number:

This is a question for when we chat tomorrow...

CuzickA commented 11 months ago

I don't see the option for entering promoter information yet (in the main PHI-Canto).

jseager7 commented 11 months ago

I've just tried and it looks like ectopic expression is available on the main PHI-Canto server but not the demo server.

That's because I haven't updated the demo server in a while. I'll hide this comment thread now.

jseager7 commented 11 months ago

I don't see the option for entering promoter information yet (in the main PHI-Canto).

I haven't updated PHI-Canto with these fields yet because PomBase isn't done with testing them.

kimrutherford commented 11 months ago

I tested, but I don't see where the information goes. I don't see it showing up alongside the genotype in genotype management. Maybe we can discuss tomorrow.

Yep, that's one of the things to discuss. Currently you can only see the information if you edit the allele.

kimrutherford commented 11 months ago

I haven't updated PHI-Canto with these fields yet because PomBase isn't done with testing them.

Hi James. Yep, it's still on a branch. Unfortunately, it will need a schema update when deployed.

manulera commented 11 months ago

Hello, I had a look at this and I think the interface makes sense. However, it should not be possible to add a value for both "gene promoter" and "exogenous promoter". It should be one or the other.
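
The "one or the other" constraint could be checked along these lines. This is a minimal sketch under assumptions: `PromoterChoice` and `validate_promoter_choice` are hypothetical names, and the real Canto interface and schema may represent the two fields quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromoterChoice:
    """Hypothetical container for the two proposed fields; not Canto's schema."""
    gene_promoter: Optional[str] = None       # e.g. a pombe gene identifier
    exogenous_promoter: Optional[str] = None  # e.g. free text such as "ADH1"


def validate_promoter_choice(choice: PromoterChoice) -> None:
    """Raise ValueError if both promoter fields are filled in.

    Leaving both empty is allowed, since the promoter is optional.
    """
    if choice.gene_promoter and choice.exogenous_promoter:
        raise ValueError(
            "Specify either a gene promoter or an exogenous promoter, not both"
        )
```

In an interface, the same rule is probably easier to enforce by construction, for instance with a single choice control, so that the invalid state cannot be entered at all.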

manulera commented 11 months ago

For now, display in square brackets after the expression level.

manulera commented 11 months ago

Ideally, exogenous promoters should be auto-completed

kimrutherford commented 11 months ago

However, it should not be possible to add a value for both "gene promoter" and "exogenous promoter".

What do you think of this interface instead? It's in the test Canto if you'd like to try it.

image


image


image

manulera commented 11 months ago

Hi @kimrutherford. It's perfect like that, I think. The only thing is I would remove the first option ("None"), so the default is that no option is ticked. The title could then be "Used promoter (optional)". Alternatively, we can also replace "None" by "Unspecified".

There are cases where overexpression or knockdown can be driven without changing the promoter, so you would not want to say "None":

ValWood commented 11 months ago

Nice

manulera commented 9 months ago

An extra possibility could be multi-copy + number of copies (should be bigger than 1)

ValWood commented 8 months ago

Pinging myself for discussion when Kim is back.

kimrutherford commented 4 months ago

Hi Val. Did you mention on a call that this was an issue that you think needs doing soon?

The prototype is still available in the test Canto: https://curation.pombase.org/test/curs/4666975359de04dd/genotype_manage

kimrutherford commented 4 months ago

we can also replace "None" by "Unspecified".

I've done that since it was quick. I've also fixed a few bugs.

ValWood commented 4 months ago

Thanks, I'll check this out before we next chat.

kimrutherford commented 4 months ago

I've started work on pombase/pombase-chado#1087 but I noticed that the promoter details were being written to the JSON export file incorrectly. I'm fixing that now.

Note to self, the changes for this issue are on this branch: https://github.com/pombase/canto/tree/issue-2544-allele-promoters

kimrutherford commented 4 months ago

I noticed that the promoter details were being written to the JSON export file incorrectly. I'm fixing that now.

That's fixed now.

I've also fixed the Canto code for editing alleles to understand that alleles/genotypes with different promoters are distinct. So there's now no problem having multiple genotypes that are identical except for the promoter of an allele.

For now, alleles with promoters are shown like this in the genotype list:

abc1-a(F124D)[Overexpression]{promoter:SPBC15D4.03}

or

abc1-a(F124D)[Ectopic]{exogenous_promoter:ADH1}
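
As a rough illustration of the display format above, and of treating alleles with different promoters as distinct, here is a small Python sketch. It is not the Canto implementation: the function names are hypothetical, and splitting `abc1-a(F124D)` into a name (`abc1-a`) and a description (`F124D`) is an assumption based only on the two examples shown.

```python
from typing import Optional, Tuple


def allele_display_name(name: str, description: str, expression: str,
                        gene_promoter: Optional[str] = None,
                        exogenous_promoter: Optional[str] = None) -> str:
    """Build a genotype-list label in the format shown above (hypothetical helper)."""
    label = f"{name}({description})[{expression}]"
    if gene_promoter:
        label += f"{{promoter:{gene_promoter}}}"
    elif exogenous_promoter:
        label += f"{{exogenous_promoter:{exogenous_promoter}}}"
    return label


def allele_key(name: str, description: str, expression: str,
               gene_promoter: Optional[str] = None,
               exogenous_promoter: Optional[str] = None) -> Tuple:
    """Identity key: alleles that differ only in promoter get different keys."""
    return (name, description, expression, gene_promoter, exogenous_promoter)
```

Including the promoter fields in the key is what would let two genotypes that are identical except for an allele's promoter coexist without being merged.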