pombase / canto

The PomBase community curation tool
https://curation.pombase.org

distinguish "disruption" type allele #681

Closed pombase-admin closed 8 years ago

pombase-admin commented 10 years ago

At present we annotate disruptions as other, and partial deletions as deletions (I think?)

We would like another allele type for these "disruption" alleles, written with the syntax yfg1::ymg1+ (where ymg = your marker gene).

Maybe we should also have a "partial deletion" allele type for when we know that the deletion is not complete (we don't know the exact range of bases, but the allele is loosely called a deletion because, say, 80% of the gene is deleted). I think there are quite a few like this, and in the longer term it will be useful to have a way to distinguish them from a complete deletant.

Original comment by: ValWood

pombase-admin commented 10 years ago

There are partial deletion allele types for both amino acid and nucleotide.

Disruption should still be added, though, because it's possible to just plunk some interrupting sequence into your gene of interest without deleting anything.

Original comment by: mah11

pombase-admin commented 10 years ago

this sounds like a time-sink! Can we have a parent term for when we don't know or the information would require a literature search for whoever characterized the strain first?

Original comment by: Antonialock

pombase-admin commented 10 years ago

I agree - it's often hard if not impossible to tell from one paper whether it's reporting on a true deletion, a "near enough that we'd still call it a deletion" deletion, or just something they're calling a deletion ...

Original comment by: mah11

pombase-admin commented 10 years ago

OK, if they say it's a deletion we should capture it as a deletion.

We probably will only become aware of these odd cases at later times, when we have 2 deletion alleles with different phenotypes (similar to the cdc8 case).

If it becomes clear that a particular "deletion" is only partial, we can keep it as a deletion but change the description to "cdc2Δ-partial" (or even tag the allele name with the strain to disambiguate?)

This will help to make it clearer why the phenotypes are different but we don't need to change anything.

Original comment by: ValWood

pombase-admin commented 10 years ago

to be honest, if I was god I think I'd just call it a deletion (if the author does) until there is evidence to the contrary...

For instance, that deletion allele a while back where the full deletion was viable but a 60% deletion was inviable (this was in the deletion collection paper).

I think we ended up deleting the row from the deletion paper, but we could have changed the description to "other - 60% deletion"

Writing detailed descriptions for all these alleles, when they are often sub-optimally described in the papers, just seems like a difficult thing to do, and I don't know how much value it will add. (On the other hand, maybe it won't be so difficult and will add lots of value?) (Or maybe we could have the option of adding it if the disambiguating information is easily accessible.)

Original comment by: Antonialock

pombase-admin commented 10 years ago

or allow additional syntax for the "partial deletion" eg not just residue ranges but also percentages?

(again, we could probably work out the residue ranges from the primers but that would make me cry)
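A minimal Python sketch of what such extra syntax might accept, assuming a hypothetical description format in which a partial deletion is written either as a residue range such as "1-120" or as a percentage such as "60%". Neither the format nor the names (`PartialDeletion`, `parse_partial_deletion`) come from Canto; they are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDeletion:
    """A partial deletion described by a residue range or by a percentage (hypothetical model)."""
    start: Optional[int] = None
    end: Optional[int] = None
    percent: Optional[float] = None


# Hypothetical syntaxes: "start-end" residue range, or "N%" of the gene deleted.
RANGE_RE = re.compile(r"^(\d+)-(\d+)$")
PERCENT_RE = re.compile(r"^(\d+(?:\.\d+)?)%$")


def parse_partial_deletion(text: str) -> PartialDeletion:
    text = text.strip()
    m = RANGE_RE.match(text)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        if start < 1 or end < start:
            raise ValueError(f"invalid residue range: {text!r}")
        return PartialDeletion(start=start, end=end)
    m = PERCENT_RE.match(text)
    if m:
        percent = float(m.group(1))
        # 0% is no deletion and 100% is a full deletion, so neither is "partial".
        if not 0 < percent < 100:
            raise ValueError(f"percentage must be between 0 and 100: {text!r}")
        return PartialDeletion(percent=percent)
    raise ValueError(f"unrecognised partial deletion syntax: {text!r}")
```

Keeping both forms in one type lets a vague "60% deletion" be recorded now and refined to a residue range later, if someone does work it out from the primers.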

Original comment by: Antonialock

pombase-admin commented 10 years ago

I find myself basically agreeing with everything Antonia says ...

I'd like to have the "disruption" option available for the cases where the paper makes it clear that's what they've got. But we don't always have enough details, so I also like Antonia's suggestion to take authors' word about deletions until and unless conflicts arise.

Original comment by: mah11

pombase-admin commented 10 years ago

Ok I think we are all saying the same thing. We only need the disambiguation if there is a difference.

Original comment by: ValWood

pombase-admin commented 10 years ago

the summary of this ticket is "Disruption should still be added, though, because it's possible to just plunk some interrupting sequence into your gene of interest without deleting anything."

We would like "disruption" as an allele type

the description will be in yfg1::ymg1+ format
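As a sketch only, assuming a hypothetical helper named `parse_disruption`, this is how a tool could split a description in the yfg1::ymg1+ format into the disrupted gene and the inserted marker, rejecting strings that lack either half:

```python
from typing import NamedTuple


class Disruption(NamedTuple):
    disrupted_gene: str  # the gene of interest, e.g. "yfg1"
    marker: str          # the inserted marker, e.g. "ura4+"


def parse_disruption(description: str) -> Disruption:
    """Split a "gene::marker" description into its two parts (illustrative only)."""
    gene, sep, marker = description.strip().partition("::")
    # Require exactly one "::" with something on both sides of it.
    if not sep or not gene or not marker or "::" in marker:
        raise ValueError(f"expected 'gene::marker', got {description!r}")
    return Disruption(gene, marker)
```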

Original comment by: ValWood

ValWood commented 8 years ago

I'm going to make a new ticket for this later...moving to curation tracker for now

kimrutherford commented 8 years ago

Can this be closed? Did you make a curation tracker issue?

kimrutherford commented 8 years ago

These allele names are causing problems for Mark. Can we fix them for V55?

mah11 commented 8 years ago

I'm not sure why Val suggested moving this to the curation tracker -- there's a Canto request at the heart of it, and not a huge one at that: add "disruption" to the allele type options.

What sort of allele names are causing problems for Mark? If it's the "yfg1::ura4+" thing, he's gotta deal with it because that's the decades-old standard nomenclature for gene disruptions in yeasts. If he means something else, we'll need a list of which names need fixing (and can open a curation ticket for them).

kimrutherford commented 8 years ago

Mark says:

The issue is with them ending in ":". Colons and semicolons can be in the middle of the names, but at the end they cause an issue and so get flagged up by the healthchecks in Ensembl and Ensembl Genomes.

The two that Mark noticed were:

SPBC1A4.03c:allele-9 Sma::
SPBC1A4.03c:allele-8 Xba::

both from session 6f1818d4183cd160
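The rule Mark describes (colons and semicolons are fine inside a name but not at the end) can be expressed as a simple pre-load check. This Python sketch is not the actual Ensembl healthcheck code; `ends_with_separator` and `flag_bad_names` are hypothetical names used for illustration.

```python
from typing import Iterable, List


def ends_with_separator(name: str) -> bool:
    """True if an allele name ends in a colon or semicolon."""
    return name.endswith((":", ";"))


def flag_bad_names(names: Iterable[str]) -> List[str]:
    """Return the names that would trip the trailing-separator check."""
    return [name for name in names if ends_with_separator(name)]
```

Usage: `flag_bad_names(["Sma::", "Xba::", "cdc2::ura4+"])` returns `["Sma::", "Xba::"]`, because the "::" inside "cdc2::ura4+" is not at the end.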

mah11 commented 8 years ago

SPBC1A4.03c:allele-9 Sma::
SPBC1A4.03c:allele-8 Xba::

Oh, those. They're special. They've annoyed us before; I remember looking at the paper last time they made something choke. The authors really did use those names ... and they're not even disruptions!!! They're just truncations (aka partial deletions), and therefore unrelated to the issue of adding "disruption" as an allele type in Chado, Canto, etc.

I am now making the executive decision to delete the sodding colons, so the names will load for v55. So there.

kimrutherford commented 8 years ago

I am now making the executive decision to delete the sodding colons, so the names will load for v55. So there.

Excellent!

kimrutherford commented 8 years ago

there's a Canto request at the heart of it, and not a huge one at that: add "disruption" to the allele type options.

I'll need to do that before closing this issue.

Is a disruption allele much like a deletion? Should disruption alleles get an automatic name like deletions do? If so, what should it be?

Is a description needed or required on those alleles?

I assume (as with deletions) it doesn't make sense to allow an expression?

mah11 commented 8 years ago

Is a disruption allele much like a deletion?

In a disruption, another gene -- almost always a selectable marker -- is plunked into the gene of interest (hereafter yfg1). Sometimes some of the yfg1 sequence is deleted, but not all (otherwise we'd call it a deletion). Usually stuffing in the extraneous sequence results in no Yfg1 product being made, so disruptions most often give null phenotypes. But of course there are occasionally exceptions, 'cos this is biology and it's complicated.

Should disruption alleles get an automatic name like deletions do?

It might be easier not to. If we were to auto-generate names, I think the interface would have to collect the identity of the disrupting gene (i.e. the selectable marker that gets stuffed in, e.g. ura4+). Then the name would be in the "yfg1::ura4+" format.

Is a description needed or required on those alleles?

It will certainly be a good thing to have. I think the "yfg1" part and the disrupting marker (e.g. "ura4+") will always be known, so we can demand them. The insertion site and whether any yfg1 sequence is deleted should be known, too, but may be a pain to determine, so probably should be optional.
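One way to model the fields discussed here, sketched in Python under stated assumptions: the disrupted gene and the marker are mandatory, while the insertion site and any deleted region are optional. The class name and the parenthesised rendering (for example `cdc2::ura4+(nt343)`) are hypothetical, not an agreed PomBase format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisruptionDescription:
    gene: str                             # required: the disrupted gene, e.g. "cdc2"
    marker: str                           # required: the inserted marker, e.g. "ura4+"
    insertion_site: Optional[str] = None  # optional: where the marker went, e.g. "nt343"
    deleted_region: Optional[str] = None  # optional: any yfg1 sequence removed

    def __post_init__(self) -> None:
        # The gene and marker should always be known, so refuse blanks.
        if not self.gene.strip() or not self.marker.strip():
            raise ValueError("both the disrupted gene and the marker are required")

    def render(self) -> str:
        text = f"{self.gene}::{self.marker}"
        # Append whichever optional details are known, in a hypothetical "(a, b)" style.
        extras = [x for x in (self.insertion_site, self.deleted_region) if x]
        if extras:
            text += "(" + ", ".join(extras) + ")"
        return text
```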

I assume (as with deletions) it doesn't make sense to allow an expression?

Hmm. Expression is almost always null for disruptions, but yes, you guessed it, there are rare exceptions. I recall curating one a while back (tho of course I can't remember which paper :P ). So I guess we'd better leave the ability to support the exceptions ...

ValWood commented 8 years ago

I only moved it to the curation tracker because I was hoping we could standardize the other 'others' and include them in the drop-down at the same time (before I went on holiday, but I did not get around to it).

Here is the ticket: https://github.com/pombase/curation/issues/657. It's no problem if the disruption one is done earlier; we can address the others later.

ValWood commented 8 years ago

note that there will still be some work to do on the curation side for disruptions, fixing the "non-standard" ones:
SPBC530.14c:allele-2 | dsk1+::ura4
SPBC4F6.06:allele-3 | kinl::LEU2

Questions:
should markers always be lower case?
should markers always have "+" appended? (otherwise we will have the same allele listed in 2 ways)
should the wt gene allele omit the "+"?

(I think the convention is to always have lower case in alleles and only have + on the marker, but there are anomalies in the file in the dropbox which need fixing)

mah11 commented 8 years ago

should markers always be lower case?

Not necessarily. Not all markers are pombe genes. In the example above, LEU2 is an S. cerevisiae gene, for which upper case is correct (I suspect the disruption description should be kin1::LEU2, but the "LEU2" part is fine). Other markers, such as KanMX, also get used, although more often where the target gene is also deleted, so we would call those deletions.

should markers always have "+" appended?

Again, not necessarily, because it's so often omitted from S.c. gene names, and because other markers would never have the "+". If the marker is a pombe gene, it should have the "+", but we may have to maintain that manually.

should the wt gene allele omit the "+"

If it's a pombe gene, no (so the first example should be dsk1::ura4+), but these days it's seldom included for S.c. wt alleles.
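A minimal sketch of these conventions in Python, assuming a hypothetical and deliberately incomplete set of S. pombe marker genes: the disrupted gene loses any "+", a pombe marker gains one, and any other marker (such as LEU2 or KanMX) is left exactly as written. Under these rules dsk1+::ura4 becomes dsk1::ura4+.

```python
# Illustrative, not exhaustive: S. pombe genes often used as selectable markers.
# Lower-case matching is deliberate, so S. cerevisiae names like "LEU2" do not match.
POMBE_MARKERS = {"ura4", "his3", "leu1", "ade6", "lys1"}


def normalise_disruption(gene: str, marker: str) -> str:
    # The disrupted gene is not wild type, so it never carries "+".
    gene = gene.rstrip("+")
    # A pombe marker gene is wild type and gets "+"; other markers keep their spelling.
    bare_marker = marker.rstrip("+")
    if bare_marker in POMBE_MARKERS:
        marker = bare_marker + "+"
    return f"{gene}::{marker}"
```

Case is never forced to lower, which matches the point above that markers from other species keep their own capitalisation.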

ValWood commented 8 years ago

should the wt gene allele omit the "+"

In the majority of cases this is not included in the allele description for a disruption. I think we could standardize on exclusion (as that is how the bulk have been done so far).

mah11 commented 8 years ago

If by "the wt gene allele" you mean the selectable marker that's inserted, it should have the "+" if it's a pombe gene.

mah11 commented 8 years ago

The gene being disrupted shouldn't get the "+" because disrupting it makes it not wild type.

</bleedin' obvious>

kimrutherford commented 8 years ago

I've added a disruption allele type. Currently the allele name is required, the description and expression are optional. Let me know if I need to tweak that.
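To make the behaviour described here concrete, a Python sketch of per-type field rules in which a disruption needs a name while its description and expression may be left blank. The dictionary layout and the `missing_fields` helper are hypothetical; Canto's own configuration format may differ.

```python
from typing import Dict, List

REQUIRED, OPTIONAL = "required", "optional"

# Hypothetical rules, mirroring only what is stated above for disruptions.
ALLELE_TYPE_FIELDS: Dict[str, Dict[str, str]] = {
    "disruption": {"name": REQUIRED, "description": OPTIONAL, "expression": OPTIONAL},
}


def missing_fields(allele_type: str, allele: Dict[str, str]) -> List[str]:
    """List required fields that are absent or blank for the given allele type."""
    rules = ALLELE_TYPE_FIELDS[allele_type]
    return [field for field, rule in rules.items()
            if rule == REQUIRED and not allele.get(field)]
```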

ValWood commented 8 years ago

</bleedin' obvious>

did you fix the anomalies? I'm sure there were some, but I could not see any in the current file. Maybe there was only one...

ValWood commented 8 years ago

I think the description can be optional but we should have some text to appear under the allele description box suggesting a recommended format e.g.

cdc2+::marker, cdc2+::marker(nt343)

mah11 commented 8 years ago

Comments on the description content and format should go on the curation ticket (https://github.com/pombase/curation/issues/664). The disrupted gene should NOT have the "+" because it is not wild type if it is disrupted.

kimrutherford commented 8 years ago

did you fix the anomalies? I

It wasn't me. There are none now, so it should be fine in the next release.

@markmcdowall

ValWood commented 8 years ago

I think we are done on this ticket. If there is anything else we should put it in a new ticket. There will be new tickets spawned from the curation tracker ticket eventually.