pombase / canto

The PomBase community curation tool
https://curation.pombase.org

sequence change has not filtered through to Canto #2790

Closed ValWood closed 5 months ago

ValWood commented 6 months ago

I reverted trm1 to the longer version: even though the short version is the major isoform, we now have functional information for both transcripts, so I needed to create 2 isoforms.

However, even though the trm1.1 transcript now has D201A, I cannot add this allele to Canto.

Screenshot 2023-10-23 at 18 47 05

@manulera does Kim need to re-run the pipeline? I guess we need to re-run every time a gene structure is altered if:

a) it has associated alleles or modifications, or b) is needed to annotate new ones

and, in the case of a), I will need to fix them (but these should appear in one of the logs)?

Is that correct?

kimrutherford commented 6 months ago

However, even though the trm1.1 transcript now has D201A, I cannot add this allele to Canto.

How did you describe the allele? Did you include the transcript ID in the description? I don't think I tested any multi-transcript genes.

does Kim need to re-run the pipeline?

The allele_qc server gets re-deployed each night at the end of the load so it should be up to date.

manulera commented 6 months ago

Hi @ValWood, the allele name should be trm1.1-D201A. @kimrutherford, is Canto sending the name to the API? I don't remember.

ValWood commented 6 months ago

Just tested this. The correct description for trm1.1 (D201A) doesn't work, but the old description D177A (which now refers to transcript trm1.2) is accepted.

https://curation.pombase.org/pombe/curs/269782b0ebb602b9

trm1.1-D177A(aaD177A) is incorrect, but works (now refers to trm1.2)
trm1.1-D201A(aaD201A) is correct, but doesn't work

ValWood commented 6 months ago
Screenshot 2023-10-24 at 11 34 15
kimrutherford commented 6 months ago

the allele name should be trm1.1-D201A. @kimrutherford, is Canto sending the name to the API? I don't remember.

Hi @manulera

I checked the Canto code and it is sending the allele_name.

Val, you were right about the sequence being out of date. I forgot that the allele QC pipeline only runs twice a week, currently Sunday and Wednesday. I'll check in the morning to see if things are working.
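
(For reference, a twice-weekly schedule like that in a GitHub Actions workflow looks roughly like the snippet below; the exact time of day here is an assumption, not what run_pipeline.yml actually uses.)

    on:
      schedule:
        # Hypothetical schedule: 02:00 UTC on Sundays and Wednesdays
        - cron: "0 2 * * 0,3"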

kimrutherford commented 6 months ago

I checked the Canto code and it is sending the allele_name.

Hi @manulera

Sorry to bother you with questions.

I investigated the call to check_allele and I think maybe the problem is that data/genome.pickle doesn't have the latest changes from the contig files. Does that make sense?

As a test I changed get_data.sh to use the latest contig files: https://github.com/pombase/allele_qc/commit/4ec936e8911a8d2fbd6235c6fa07b52e24b9806e#diff-b71ccacdb5628394a3b31a561f07a9875acff80cf66098e1fb06e8c04bb3c697

I checked in the results as a branch: https://github.com/pombase/allele_qc/tree/use-latest-contigs

And now the API works as expected when I deploy on my desktop. Do you think we should make that change to get_data.sh?
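
For the record, the kind of local check I mean looks roughly like this. Only the check_allele endpoint name comes from this thread; the port, the query parameter names and the placeholder systematic ID are assumptions, not the actual allele_qc API signature:

    # Hypothetical queries against a locally deployed allele_qc API
    # (port and parameter names are assumptions; replace <trm1_systematic_id>
    # with the real gene ID).

    # Corrected description on the long isoform -- should now validate:
    curl "http://localhost:8000/check_allele?systematic_id=<trm1_systematic_id>&allele_name=trm1.1-D201A&allele_description=D201A&allele_type=amino_acid_mutation"

    # Stale description (D177A, which now belongs to trm1.2) -- should be flagged as out of date:
    curl "http://localhost:8000/check_allele?systematic_id=<trm1_systematic_id>&allele_name=trm1.1-D177A&allele_description=D177A&allele_type=amino_acid_mutation"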

manulera commented 6 months ago

Yes! good catch

kimrutherford commented 6 months ago

Great. I've made a PR.

I have another question. In the GitHub Actions config run_pipeline.yml, the changes are committed with:

       - name: Commit changes
         uses: stefanzweifel/git-auto-commit-action@v4
         with:
           commit_message: updated to last revision
           file_pattern: "*.tsv"

Does the file_pattern exclude data/genome.pickle from the commit? Does that need changing?

manulera commented 6 months ago

Yes, good point. This is not a problem for the run_pipeline action itself, since it runs get_data.sh, which generates a new pickle, but it is a problem for the Docker container, which relies on those files being updated. The API also uses data/coordinate_changes_dict.json.

There are three options I think:

  1. If we are sure that run_pipeline always runs before the Docker container is built, we could add the right extensions to the file_pattern (.contig, .tsv, .pickle, .json; not sure if anything else is needed).
  2. We could uncomment the lines in docker_start.sh, so that it fetches fresh data every time it's mounted anyway.
  3. Step 2 + removing all those genome files from source control.
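
For 3, a rough sketch of what dropping those generated files from source control might look like (the file names are the ones mentioned in this thread; whether anything else should be excluded is an open question):

    # Hypothetical .gitignore entries if the generated genome data were no
    # longer committed (option 3); docker_start.sh would then need to
    # regenerate them on every start.
    data/genome.pickle
    data/coordinate_changes_dict.json
    *.contig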

I think we discussed this at some point and you were saying that 2 or 3 would not be great because it would mean downloading everything into oliver1.

Another thing, for either solution: the latest_build folder from the nightly load should be used here, so whichever option we choose, the Docker container should be mounted last.

kimrutherford commented 6 months ago

Thanks Manu. I decided to go for option 1 for now and we'll see if that works for us. We'll keep 2 and 3 as backup plans.

I've changed the file_pattern to "*.tsv *.json *.contig *genome.pickle"
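
For reference, the commit step in run_pipeline.yml now looks roughly like this (the same step quoted above, with only file_pattern changed):

       - name: Commit changes
         uses: stefanzweifel/git-auto-commit-action@v4
         with:
           commit_message: updated to last revision
           file_pattern: "*.tsv *.json *.contig *genome.pickle"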

manulera commented 6 months ago

Alright!

ValWood commented 5 months ago

Do we need to keep this open?

kimrutherford commented 5 months ago

I think this is done.