pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Make Canto alleles match what's in Chado #2776

Closed kimrutherford closed 7 months ago

kimrutherford commented 8 months ago

While working on pombase/canto#2770 I noticed a bunch of allele names in Canto sessions that are inconsistent with what's in Chado. Mostly these aren't a surprise.

The most common case is an allele where the type or description is "unknown" in Canto but the allele is merged with a gene with the same name in Chado. For example, this allele in Canto is merged and ends up as amino_acid_mutation / G345R in Chado:

09b5fcafa2826d4a-1  myo2-E1 

In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?

There are also some nonsense mutations that need fixing, for example:

155dba22ce11dc4f:  SPAC17G6.05c nonsense mutation E644->stop 

I think now would be a good time to tidy up as many of these alleles as possible, before assigning unique identifiers.

kimrutherford commented 8 months ago

I started writing a script to match up Chado alleles and Canto alleles by comparing allele names, types and descriptions, but then I thought of a more reliable plan.

There are internal IDs for alleles in Canto that uniquely identify alleles within sessions. We can store those internal IDs in Chado, which will make it easy to match up Chado and Canto IDs later. It will also make it easier to assign stable IDs to the alleles in Canto.
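As a rough illustration of the plan, here is a minimal Python sketch of matching by stored internal ID instead of by name, type and description. It is not the actual Canto or Chado code: the `CantoAllele` and `ChadoAllele` records, their field names and the `match_alleles` helper are all hypothetical, and it assumes each Chado allele carries the list of Canto internal IDs recorded at load time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CantoAllele:
    # Hypothetical record; the field names are illustrative, not Canto's schema.
    internal_id: str   # unique within a curation session
    name: str
    allele_type: str
    description: str


@dataclass(frozen=True)
class ChadoAllele:
    # Hypothetical record; assumes the loader stored the Canto internal IDs.
    uniquename: str
    canto_internal_ids: tuple
    name: str
    allele_type: str
    description: str


def index_chado_by_canto_id(chado_alleles):
    """Map each stored Canto internal ID to the Chado allele it was loaded into."""
    index = {}
    for allele in chado_alleles:
        for internal_id in allele.canto_internal_ids:
            index[internal_id] = allele
    return index


def match_alleles(canto_alleles, chado_alleles):
    """Pair Canto alleles with Chado alleles using only the internal ID.

    Returns (matched, unmatched): matched is a list of (canto, chado) pairs,
    unmatched is the list of Canto alleles with no stored ID in Chado.
    """
    index = index_chado_by_canto_id(chado_alleles)
    matched, unmatched = [], []
    for canto in canto_alleles:
        chado = index.get(canto.internal_id)
        if chado is None:
            unmatched.append(canto)
        else:
            matched.append((canto, chado))
    return matched, unmatched
```

Because the lookup never compares names or descriptions, an allele that is "unknown" in Canto but merged into a fully described allele in Chado still pairs up correctly.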

kimrutherford commented 8 months ago

> We can store those internal IDs in Chado which will make it easy to match up Chado and Canto IDs later.

That's implemented now and committed. I'm running a local load which seems fine so far and I'll check the main load tomorrow.

ValWood commented 8 months ago

> In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?

We discussed this and the consensus was 'no problem' because the descriptions are already added on Chado loading. It therefore makes sense to correct them in Canto.

kimrutherford commented 8 months ago

It will help to do this issue first:

kimrutherford commented 8 months ago

After careful testing, I now have a script ready to go that will set allele details in Canto that are missing, using the details from Chado. I hope to apply this fix over the weekend after applying: pombase/canto#2642
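For readers following along, the core of such a fix might look something like the Python sketch below. This is not the script itself: the dictionary layout, the `type` and `description` keys, and the `is_missing` and `fill_missing_details` helpers are assumptions made for illustration. It only fills in values that are missing or "unknown" in Canto and never overwrites details a curator has already set.

```python
UNKNOWN = "unknown"


def is_missing(value):
    """Treat None, empty strings and "unknown" (any case) as missing."""
    return value is None or value.strip().lower() in ("", UNKNOWN)


def fill_missing_details(canto_allele, chado_allele):
    """Return (updated_allele, changed_fields).

    Copies the type and description from the Chado allele only where the
    Canto allele has no usable value. The input dictionaries are not modified.
    """
    updated = dict(canto_allele)
    changed = []
    for field in ("type", "description"):
        if is_missing(updated.get(field)) and not is_missing(chado_allele.get(field)):
            updated[field] = chado_allele[field]
            changed.append(field)
    return updated, changed


# Example using the myo2-E1 case from the first comment.
canto = {"name": "myo2-E1", "type": "unknown", "description": "unknown"}
chado = {"name": "myo2-E1", "type": "amino_acid_mutation", "description": "G345R"}
fixed, changed = fill_missing_details(canto, chado)
```

Running the same function on an already fixed allele changes nothing, so a fix written this way can safely be applied more than once.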

ValWood commented 8 months ago

fingers x'd.

kimrutherford commented 7 months ago

> After careful testing, I now have a script ready to go that will set allele details in Canto that are missing, using the details from Chado.

I've applied that script now. We'll probably need to run it again just before the final switch over to stable allele IDs to catch any alleles that have been added between now and then. See: pombase/canto#2770

ValWood commented 7 months ago

Great! Hopefully no new problem alleles will be added.