Closed by kimrutherford 7 months ago
I started writing a script to match up Chado alleles and Canto alleles by comparing allele names, types and descriptions but then I thought of a more reliable plan.
There are internal IDs for alleles in Canto that uniquely identify alleles within sessions. We can store those internal IDs in Chado which will make it easy to match up Chado and Canto IDs later. It will also make it easier to assign stable IDs to the alleles in Canto.
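As a rough illustration of why a stored ID makes the matching simpler than comparing names, types and descriptions, here is a minimal Python sketch. It is not the actual loader code. The class names, the `canto_id` field and the ID strings are all hypothetical, and how the ID is really stored in Chado is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CantoAllele:
    canto_id: str  # hypothetical: the internal ID Canto uses within a session
    name: str


@dataclass
class ChadoAllele:
    uniquename: str
    name: str
    canto_id: Optional[str] = None  # hypothetical: Canto ID stored at load time


def match_by_canto_id(chado_alleles, canto_alleles):
    """Pair each Chado allele with the Canto allele sharing its stored ID."""
    canto_by_id = {allele.canto_id: allele for allele in canto_alleles}
    matched, unmatched = [], []
    for chado in chado_alleles:
        canto = canto_by_id.get(chado.canto_id) if chado.canto_id else None
        if canto is None:
            # No stored ID, or the ID is not present in the Canto data
            unmatched.append(chado)
        else:
            matched.append((chado, canto))
    return matched, unmatched
```

With the ID stored, the match is a dictionary lookup and does not depend on the allele name or description agreeing between the two databases.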
That's implemented now and committed. I'm running a local load which seems fine so far and I'll check the main load tomorrow.
In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?
We discussed this and the consensus was 'no problem' because the descriptions are already added on Chado loading. It therefore makes sense to correct them in Canto.
It will help to do this issue first:
After careful testing, I now have a script ready to go that will set allele details in Canto that are missing, using the details from Chado. I hope to apply this fix over the weekend after applying: pombase/canto#2642
fingers x'd.
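For readers following along, the kind of fix described above might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the script that was run. The `AlleleDetails` class and both function names are hypothetical, and treating None, an empty string and "unknown" as missing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleDetails:
    allele_type: Optional[str]
    description: Optional[str]


def is_missing(value):
    """Treat None, empty strings and "unknown" as missing (an assumption)."""
    return value is None or value.strip().lower() in ("", "unknown")


def fill_missing_details(canto, chado):
    """Return a copy of the Canto details with gaps filled from Chado.

    Values already set in Canto are never overwritten.
    """
    allele_type = canto.allele_type
    description = canto.description
    if is_missing(allele_type) and not is_missing(chado.allele_type):
        allele_type = chado.allele_type
    if is_missing(description) and not is_missing(chado.description):
        description = chado.description
    return AlleleDetails(allele_type, description)


# Example using the type and description mentioned in this thread
canto = AlleleDetails("unknown", "unknown")
chado = AlleleDetails("amino_acid_mutation", "G345R")
print(fill_missing_details(canto, chado))
```

Only gaps are filled, so re-running the same logic later should leave already-corrected alleles unchanged.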
I've applied that script now. We'll probably need to run it again just before the final switch over to stable allele IDs to catch any alleles that have been added between now and then. See: pombase/canto#2770
Great! Hopefully no new problem alleles will be added.
While working on pombase/canto#2770 I noticed a bunch of allele names in Canto sessions that are inconsistent with what's in Chado. Mostly these aren't a surprise.
The most common case is an allele where the type or description is "unknown" in Canto but the allele is merged with a gene with the same name in Chado. For example this allele in Canto is merged and ends up as amino_acid_mutation / G345R in Chado:
In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?
There are also some nonsense mutations that need fixing, for example:
I think now would be a good time to tidy up as many of these alleles as possible, before assigning unique identifiers.