pombase / pombase-chado

PomBase code for accessing Chado
MIT License

Create permanent identifiers for alleles #986

Open kimrutherford opened 2 years ago

kimrutherford commented 2 years ago

See also:

kimrutherford commented 2 years ago

I've had a think and this might not be too tricky to implement once this issue is done:

For that issue we'll need to process the allele data into a more usable form for the website, to make the allele pages. Once that's done, it will be straightforward to write the current allele IDs, names etc. to a file. Then at the start of the next load we would read that allele file into Chado.

The idea would be that once an allele had an ID, it would always be passed along to next night's load. We would be initialising Chado with the previous alleles before we start loading from Canto and PHAF files.

There will still be a few changes to make in the loading code but mostly we'd just need to add a new loader for the existing allele data.
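A rough sketch of the carry-forward idea, not the actual loader code. It assumes an allele is identified by (gene, name, description) and that new IDs follow the `GENE:allele-N` pattern; the function name and the choice of key are hypothetical.

```python
import re

def assign_allele_ids(previous, incoming):
    """Reuse IDs from the previous load and only mint IDs for unseen alleles.

    previous: dict mapping (gene, name, description) -> allele ID (assumed key).
    incoming: iterable of (gene, name, description) tuples from this load.
    """
    ids = dict(previous)  # pre-load: existing IDs are never changed

    # Find the highest allele number already used for each gene.
    highest = {}
    for (gene, _name, _desc), allele_id in previous.items():
        match = re.search(r":allele-(\d+)$", allele_id)
        if match:
            highest[gene] = max(highest.get(gene, 0), int(match.group(1)))

    for key in incoming:
        if key in ids:
            continue  # already has a permanent ID
        gene = key[0]
        highest[gene] = highest.get(gene, 0) + 1
        ids[key] = f"{gene}:allele-{highest[gene]}"
    return ids
```

Because the previous alleles are loaded first, an allele seen in an earlier load keeps its ID no matter what order Canto and PHAF data arrive in.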


Note to self: the alleles will need to be loaded (just) after the contig files because the alleles reference genes.

ValWood commented 2 years ago

OK great. Don't hurry, I can say in progress if we mention it at all.

kimrutherford commented 2 years ago

Once that's done, it will be straightforward to write the current allele IDs, names etc. to a file.

Turns out that writing the allele details to a file was straightforward. The allele information was already collected in the correct format in memory during the Chado-to-website processing step. It took just a few lines of extra code to produce a useful file: https://curation.pombase.org/dumps/latest_build/misc/allele_summaries.json
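For illustration only, a sketch of what dumping the in-memory allele details could look like. The real layout of allele_summaries.json isn't shown in this thread, so the record fields (`uniquename`, `name`, etc.) and the function name are assumptions.

```python
import json
import os
import tempfile

def write_allele_summaries(alleles, path):
    """Write allele records (a list of dicts, assumed shape) as JSON.

    Records are sorted by ID so that files from two loads diff cleanly.
    The file is written to a temporary name and then renamed, so a
    failed load never leaves a half-written file for the next load.
    """
    ordered = sorted(alleles, key=lambda allele: allele["uniquename"])
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(ordered, handle, indent=2, sort_keys=True)
    os.replace(tmp_path, path)
```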

kimrutherford commented 2 years ago

I've had a think and this might not be too tricky to implement once this issue is done:

  • pombase/website#1294

I've thought some more. It doesn't make sense to have allele pages until we have permanent identifiers so these two issues need to be completed in parallel.

kimrutherford commented 2 years ago

The idea would be that once an allele had an ID, it would always be passed along to next night's load. We would be initialising Chado with the previous alleles before we start loading from Canto and PHAF files.

I think this is mostly solved. I've written a new loading step that reads the allele IDs and details from the JSON file written by the previous load. Since the IDs and details will be pre-loaded, the allele IDs should stay the same from load to load.

I'm going to wait until Friday night before I activate the new step in case something goes wrong. My plan then is to check the load and compare with Thursday night's to make sure it worked OK.
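The comparison between two nights could be sketched like this, assuming each load's alleles are available as a list of dicts with a `uniquename` field (an assumed shape; the function name is hypothetical).

```python
def compare_allele_ids(before, after):
    """Compare allele records from two loads, keyed by allele ID.

    Returns sorted lists of IDs that disappeared, IDs that are new,
    and IDs whose details differ between the two loads.
    """
    old = {allele["uniquename"]: allele for allele in before}
    new = {allele["uniquename"]: allele for allele in after}
    return {
        "missing": sorted(old.keys() - new.keys()),
        "new": sorted(new.keys() - old.keys()),
        "changed": sorted(
            key for key in old.keys() & new.keys() if old[key] != new[key]
        ),
    }
```

If the new step works, "missing" should be empty apart from alleles that were deliberately deleted.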

kimrutherford commented 2 years ago

I'm going to wait until Friday night before I activate the new step in case something goes wrong.

This is on hold until I fix: pombase/pombase-chado#992

kimrutherford commented 1 year ago

Note to self: don't forget to change the JaponicusDB build script to match any pombe changes.

kimrutherford commented 1 year ago

(From: https://github.com/pombase/website/issues/1294#issuecomment-1218509827)

We need to think about what happens when an allele is renamed, or its description changes. We talked on Zoom about implementing a new tool (maybe within Canto) that will allow alleles to be renamed in all sessions at once. I've made an issue:

We also need to think about what happens when an allele is deleted, but there is a plan for that: Chado has an "is_obsolete" field for each feature. If we set that to true for deleted alleles, I think everything will work out.
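A minimal sketch of the obsoleting idea, working on plain dicts rather than on Chado itself. It assumes "deleted" means "pre-loaded from the previous file but not referenced by anything in this load"; the record shape and function name are hypothetical.

```python
def mark_obsolete(previous, seen_ids):
    """Flag pre-loaded alleles that this load no longer references.

    previous: allele dicts from the previous load (assumed shape).
    seen_ids: set of allele IDs referenced by Canto/PHAF data this load.
    Returns new dicts; obsolete alleles keep their ID so it is never reused.
    """
    return [
        dict(allele, is_obsolete=allele["uniquename"] not in seen_ids)
        for allele in previous
    ]
```

Keeping the obsolete record around, rather than dropping it, is what stops a later allele from being given the same ID.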

kimrutherford commented 1 year ago

Now waiting on:

kimrutherford commented 7 months ago

I've done some test runs today and it's all in a better state than I remember. :-)

It will probably take quite a few nightly loads to get things right so I suggest that we disable Canto and public pombase.org updates on Thursday night. That will give me Friday to commit and check the code and config changes. And then I'll have the weekend to run lots of test loads until things are working.

Does that sound OK?

kimrutherford commented 7 months ago

I think this is the highest priority: https://curation.pombase.org/dumps/builds/pombase-build-2024-02-19/logs/log.2024-02-19-00-56-47.chado_checks.duplicated_allele_names

Could you look at this too when you get a chance?: https://github.com/pombase/curation/issues/3561

I'll double check the other log files before Friday to see if there is anything else that needs fixing urgently.

ValWood commented 7 months ago

These should be fixed for tomorrow

https://curation.pombase.org/dumps/builds/pombase-build-2024-02-19/logs/log.2024-02-19-00-56-47.chado_checks.duplicated_allele_names

kimrutherford commented 7 months ago

These should be fixed for tomorrow

Thanks!

kimrutherford commented 7 months ago

Hi Val.

There is still a duplicate allele name:

Check for two or more alleles with the same name - CHECK FAILURE: expected 0 but got 2
name    uniquename             description  allele_type  canto_session
prp3-3  SPAC29E6.02:allele-5   unknown      unknown      e8547aef6b97c8ef
prp3-3  SPAC3A12.11c:allele-3  unknown      unknown      7d409d497eb075ca
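The check above presumably runs as a query against Chado; the same logic can be sketched in Python, assuming alleles are dicts with the `name` and `uniquename` columns from the log. The function name is hypothetical.

```python
from collections import defaultdict

def duplicated_allele_names(alleles):
    """Return {name: [uniquenames]} for names shared by two or more alleles."""
    ids_by_name = defaultdict(set)
    for allele in alleles:
        ids_by_name[allele["name"]].add(allele["uniquename"])
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }

# The two rows reported in the log above.
rows = [
    {"name": "prp3-3", "uniquename": "SPAC29E6.02:allele-5"},
    {"name": "prp3-3", "uniquename": "SPAC3A12.11c:allele-3"},
]
duplicates = duplicated_allele_names(rows)
# duplicates == {"prp3-3": ["SPAC29E6.02:allele-5", "SPAC3A12.11c:allele-3"]}
```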

Over the weekend I did a couple of full test loads on my desktop with the new Canto code and the new loading code. Everything seemed to work as planned.

kimrutherford commented 7 months ago

Checklist for putting the new allele system into production:

kimrutherford commented 7 months ago

Hi Val.

There are some Canto load errors that might be easier to fix before the allele systematic ID changes. Are these hard to fix?

https://curation.pombase.org/dumps/builds/pombase-build-2024-02-26/logs/log.2024-02-25-22-16-33.curation-tool-data-load-output

ValWood commented 7 months ago

On my to do list for this week.

ValWood commented 7 months ago

I think prp3-1 is an allele of cwf2 but need to double check this

[Screenshot 2024-02-26 at 12.12.39]

ValWood commented 7 months ago

fixed prp3-1

ValWood commented 7 months ago

I think I have eventually cleared this log...will check tomorrow.....

ValWood commented 7 months ago

another go, hopefully tomorrow...

ValWood commented 7 months ago

it worked, but I have new ones, will fix today!

kimrutherford commented 6 months ago

I've now merged this code into the master branch (locally). It was a bit painful because there were conflicts with the code changes from pombase/canto#2544.

I've also merged the changes into the test Canto so we can test things: https://curation.pombase.org/test/curs/4666975359de04dd/genotype_manage

Note to self: branch issue-2758-disable-edits-for-existing-alleles-merged

kimrutherford commented 6 months ago

I've now merged this code into the master branch (locally).

Sorry, that wasn't clear. I've merged the code for preventing allele changes in Canto when you select an existing allele: pombase/canto#2758

ValWood commented 6 months ago

We found that we could not edit any annotation for an existing allele in any annotation where the allele was used in a genetic interaction, even if that annotation was not used in the interaction.

e.g.

[Screenshot 2024-03-18 at 11.36.50]

Let me know if you want to discuss.

CC @PCarme

ValWood commented 6 months ago

For example, in the same session I want to change this evidence code to microscopy but I can't. I can't do it by copy/edit then delete either, because delete is also greyed out.

[Screenshot 2024-03-18 at 11.37.58]

kimrutherford commented 6 months ago

We found that we could not edit any annotation for an existing allele in any annotation

I don't think this is due to the allele identifier changes. It's because of: pombase/canto#2740

I've pasted your comment into that issue.