pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Curating disruption alleles #2842

Closed ValWood closed 2 months ago

ValWood commented 3 months ago

We have a couple of disruptions with different names. Historically we have only allowed a single instance of disruption per gene. Maybe we should change this to allow for different markers or deleted regions.

What do you think?

@PCarme @manulera

Oddly we only have 2 conflicts right now!

SPAC24H6.05 cdc25::ura4 890568af6939d0d6,abf32a065b08bcd5 disruption cdc25- 27524319ad84c25c,c599c5b1b32b1bc9,c641c32144a9fceb

SPAC8C9.14 prr1::ura4 9878a4d6e46a6189 disruption prr1::his7 95043240cd8d761e

ValWood commented 3 months ago

3 actually

SPCC330.05c ura4-D18 0bd070642c31ebd4,2697c460cee0ce6a,5c744b7dd350f431,641dfb8a512e9e23,64464dca7a13a142,86746c3b2bce0444,d616690bf410cea4,d976e4a7ff421b27,e09ae4029661cf1c disruption ura4::lacZ e09ae4029661cf1c

ValWood commented 3 months ago

4, that really is all of them

SPAC24H6.05 cdc25::ura4 890568af6939d0d6,abf32a065b08bcd5 disruption cdc25- 27524319ad84c25c,c599c5b1b32b1bc9,c641c32144a9fceb

ValWood commented 3 months ago

The decision here is to remove the requirement for disruption ~names~ descriptions to be unique.

I guess this will create different pages for the different disruption alleles, which is OK as they may behave slightly differently depending on where the disruption is?

We need to be mindful that we describe the same disruption in the same way. For example, ura4-D18 and ura4::lacZ, and likewise cdc25- and cdc25::ura4, might be the same thing (I don't know).

It might be worth checking if

kimrutherford commented 3 months ago

> The decision here is to remove the requirement for disruption names to be unique

Do you mean disruption descriptions rather than names?

ValWood commented 3 months ago

yes! edited

PCarme commented 3 months ago

Well, looking at the paper which introduced the ura4-D18 allele, it looks like it is a deletion rather than a disruption (from DOI: 10.1007/BF00331307):

[image]

ValWood commented 3 months ago

Hi @kimrutherford, could you automatically update all single and multi alleles of ura4-D18 in these sessions to be type "deletion"?

0bd070642c31ebd4 2697c460cee0ce6a 5c744b7dd350f431 641dfb8a512e9e23 64464dca7a13a142 86746c3b2bce0444 d616690bf410cea4 d976e4a7ff421b27 e09ae4029661cf1c

(Or, if there are not so many multi alleles, we can fix them manually; let us know how many there are first.) Also, how long does this type of query and replace take you? Then we can decide if it is better for us to just suck it up.

ValWood commented 3 months ago

Actually, there are not so many (https://www.pombase.org/gene_alleles/SPCC330.05c), so we can do this manually.

PCarme commented 3 months ago

And only 2 papers have annotations to the cdc25- allele in PomBase. One of them (DOI: 10.1038/nature08074) definitely used the cdc25-22 allele, based on the list of strains presented in the Supplementary data. I have already gone into the corresponding session and replaced the allele manually. The other, older paper (https://doi.org/10.1002/j.1460-2075.1986.tb04594.x) is unclear about the mutation used, only stating that they used ts alleles of various cdc genes. So it could be cdc25-22, but it might not be.

ValWood commented 3 months ago

This was likely to be the only cdc25- allele available back in 1986! I will check if Jacky knows.

kimrutherford commented 3 months ago

> The decision here is to remove the requirement for disruption descriptions to be unique.

I've changed that in time for the load. I'll check the log when the load finishes.

Seems OK: https://curation.pombase.org/dumps/builds/pombase-build-2024-07-03/logs/log.2024-07-02-21-02-12.chado_checks.duplicate_allele_descriptions
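For anyone following along, here is a minimal sketch of the kind of duplicate-description check discussed above, with disruptions exempted. It is not the actual Canto or Chado check: the `Allele` record, the `find_duplicate_descriptions` function, the grouping key and the `GENE1` / `hyp-1` / `hyp-2` entries are all hypothetical, and only the two prr1 disruption names come from this thread.

```python
from collections import defaultdict
from typing import NamedTuple


class Allele(NamedTuple):
    """Hypothetical flat allele record; not the real Canto/Chado schema."""
    gene: str
    name: str
    allele_type: str
    description: str


def find_duplicate_descriptions(alleles, exempt_types=frozenset({"disruption"})):
    """Group alleles by (gene, type, description) and report groups that
    contain more than one distinct allele name.

    Allele types listed in exempt_types are skipped, which mirrors the
    decision to stop requiring unique descriptions for disruptions.
    """
    groups = defaultdict(set)
    for allele in alleles:
        if allele.allele_type in exempt_types:
            continue
        key = (allele.gene, allele.allele_type, allele.description)
        groups[key].add(allele.name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


EXAMPLE = [
    # Two disruptions of the same gene, named as in this thread.
    Allele("SPAC8C9.14", "prr1::ura4", "disruption", "disruption"),
    Allele("SPAC8C9.14", "prr1::his7", "disruption", "disruption"),
    # Hypothetical records, included only to show a reported duplicate.
    Allele("GENE1", "hyp-1", "amino_acid_mutation", "A10T"),
    Allele("GENE1", "hyp-2", "amino_acid_mutation", "A10T"),
]

if __name__ == "__main__":
    for key, names in find_duplicate_descriptions(EXAMPLE).items():
        print(key, names)
```

With the default exemption, the two prr1 disruptions are not reported and only the hypothetical `GENE1` pair is. Passing `exempt_types=frozenset()` restores the old behaviour, where the prr1 pair would be flagged as a conflict.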

ValWood commented 3 months ago

The duplicate allele file should be empty tomorrow for the first time. I fixed the remaining ones and changed the ???? to unknown.

ValWood commented 3 months ago

So the allele in the Yanagida publication (https://doi.org/10.1002/j.1460-2075.1986.tb04594.x) came from Peter Fantes (PMID:7217015):

Title | Isolation of cell size mutants of a fission yeast by a new selective method: characterization of mutants and implications for division control mechanisms.
Authors | Fantes PA
Publication date | 1981 05

but this publication has 3 cdc25 ts alleles, one of which is cdc25-22

cdc25-22 is the one they went on to use in double mutants in the Fantes paper, so it may be the one they shared, but it is difficult to know. I think we can ignore this one...

PCarme commented 2 months ago

> Hi @kimrutherford, could you automatically update all single and multi alleles of ura4-D18 in these sessions to be type "deletion"?
>
> 0bd070642c31ebd4 2697c460cee0ce6a 5c744b7dd350f431 641dfb8a512e9e23 64464dca7a13a142 86746c3b2bce0444 d616690bf410cea4 d976e4a7ff421b27 e09ae4029661cf1c
>
> (Or, if there are not so many multi alleles, we can fix them manually; let us know how many there are first.) Also, how long does this type of query and replace take you? Then we can decide if it is better for us to just suck it up.

That's fixed now

kimrutherford commented 2 months ago

Thanks Pascal! Is there anything left for me to do on this issue or have you fixed it all? :-)

PCarme commented 2 months ago

No, I think we've fixed everything that needed fixing in this issue. @ValWood, can you confirm?

ValWood commented 2 months ago

I think so. If anything slips through it's easy to fix