pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Rpb1 CTD alleles #2805

Closed ValWood closed 2 months ago

ValWood commented 2 months ago

In a new session, I want to change alleles of type ‘other’ from the description: CTD Thr4 mutated to Ala from r5-r18 (r for repeats), ∆ r19-r29

To: Partial deletion and amino acid mutation:

CTD-T4A(r5-r18),1683-1752

But this syntax isn't allowed.

CTD-T4A(r5-r18) works for a single amino acid substitution, so I guess it is just the combination that doesn’t work?

ValWood commented 2 months ago

I will also need to convert these types

rpb1-Y1F(CTD-18repeats) (CTD Tyr1 mutated to Phe from r5-r18 (r for repeats), ∆ r19-r29) [Not assayed]. These will combine 2 different CTD mutations with deletions and will look something like:

CTD-Y1F(r1-r18), CTD-Y1F(r1-r18),1683-1752

ValWood commented 2 months ago

these are the allowed syntax for CTD allele descriptions

_CTD_RULES ASK KIM to get old

I'm not sure if this is implemented in Canto?

kimrutherford commented 2 months ago

Partial deletion and amino acid mutation: CTD-T4A(r5-r18),1683-1752

Hi Val.

It works for me in the other order: 1683-1752,CTD-T4A(r5-r18)

kimrutherford commented 2 months ago

these are the allowed syntax for CTD allele descriptions

Are those allele descriptions or allele names?

kimrutherford commented 2 months ago

Partial deletion and amino acid mutation: CTD-T4A(r5-r18),1683-1752

@manulera

For rpb1 / SPBC28F2.12, the description "CTD-T4A(r5-r18),1683-1752" fails the allele_qc checks, but it works if we put the parts in the other order: "1683-1752,CTD-T4A(r5-r18)". The message is:

The following parts of the allele description do not follow the existing syntax: CTD-,(r5-r18),

I've tried to fix the problem myself but I couldn't work it out. Can you help?

ValWood commented 2 months ago

Are those allele descriptions or allele names?

These are the recommendations for standardised names but the description would be the same (everything after the hyphen). It is the standardized description that is critical to us (the names can vary especially for legacy ones).

Currently many of the legacy names are not in this form, but some of the more straightforward ones are.

I will probably need to tweak some of the existing names where they have the potential to cause future conflicts (for example, where people mutated alternate residues but this isn't clear from the name). I made a mess of these yesterday and I need to review them.

I will also add allele comments, especially to the more obscure ones like alternate repeats.

kimrutherford commented 2 months ago

I'm not sure if this is implemented in Canto?

The allele_qc code covers those cases.

So things like CTD-T4A(r5-r18), CTD-T4A and CTD-delta(r5-r18) are all OK.
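As a rough illustration of the three forms listed above, here is a minimal Python sketch. The pattern and the function name `looks_like_ctd_part` are made up for this example and are not the actual allele_qc grammar, which may accept more (or different) forms.

```python
import re

# Hypothetical pattern covering only the three forms mentioned in this thread:
#   CTD-T4A            substitution on all repeats
#   CTD-T4A(r5-r18)    substitution on a range of repeats
#   CTD-delta(r5-r18)  deletion of a range of repeats
# This is an assumption for illustration, not the allele_qc implementation.
CTD_PART = re.compile(
    r"^CTD-(?:"
    r"[A-Z]\d+[A-Z](?:\(r\d+-r\d+\))?"  # substitution, optional repeat range
    r"|delta\(r\d+-r\d+\)"              # deletion, repeat range required here
    r")$"
)


def looks_like_ctd_part(text):
    """Return True if text has the shape of one of the three forms above."""
    return CTD_PART.match(text) is not None


for example in ["CTD-T4A(r5-r18)", "CTD-T4A", "CTD-delta(r5-r18)", "CTD-T4A,1-2"]:
    print(example, looks_like_ctd_part(example))
```

The sketch only checks the shape of a single CTD part; it does not check residue identities against the protein sequence or handle several comma-separated parts.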

ValWood commented 2 months ago

I have a ticket for this https://github.com/pombase/curation/issues/3671

manulera commented 2 months ago

Hi @kimrutherford, sorry for the late response. The order matters in CTD alleles because you only write CTD once. So, for instance, if you have some repeats deleted and a mutation on all remaining repeats, you write:

CTD-T4A(r5-r18),Y1F

The Y1F belongs to the CTD as well. If you want to refer to a mutation in the "body" of the protein, then it should come before the CTD part, e.g. K90A,CTD-T4A(r5-r18). The same is true for the example above: 1683-1752,CTD-T4A(r5-r18).
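The ordering rule can be sketched in a few lines of Python. This is only an illustration under assumptions: `split_ctd_description` is a hypothetical helper, not a function from allele_qc, and it assumes parts are separated by commas with no commas inside parentheses.

```python
from typing import List, Tuple


def split_ctd_description(description: str) -> Tuple[List[str], List[str]]:
    """Split a description into (body_parts, ctd_parts).

    Hypothetical helper for illustration only. Everything from the first
    part starting with "CTD-" onwards is treated as belonging to the CTD,
    because "CTD" is only written once.
    """
    body_parts: List[str] = []
    ctd_parts: List[str] = []
    in_ctd = False
    for part in description.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("CTD-"):
            in_ctd = True
            part = part[len("CTD-"):]  # drop the prefix, keep the mutation
        if in_ctd:
            ctd_parts.append(part)
        else:
            body_parts.append(part)
    return body_parts, ctd_parts


for description in ["1683-1752,CTD-T4A(r5-r18)", "CTD-T4A(r5-r18),1683-1752"]:
    print(description, split_ctd_description(description))
```

With the body part first, `1683-1752` is read as a partial deletion in the body of the protein. With the CTD part first, `1683-1752` ends up among the CTD parts, which is why that order fails the check.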

kimrutherford commented 2 months ago

Ah! Thanks for the explanation.

manulera commented 2 months ago

Also, partial deletions are not implemented for the CTD syntax because there have been no occurrences so far, so something like CTD-T4A,1-2 will not pass.

ValWood commented 2 months ago

I have quite a few things to fix which do not match the consensus: https://github.com/pombase/curation/issues/3671