However, even though trm1.1 transcript now has D201A I cannot add this allele to Canto
How did you describe the allele? Did you include the transcript ID in the description? I don't think I tested any multi-transcript genes.
does Kim need to re run the pipeline.
The allele_qc server gets re-deployed each night at the end of the load so it should be up to date.
Hi @ValWood, the allele name should be trm1.1-D201A. @kimrutherford is Canto sending the name to the api? I don't remember.
Just tested this. The correct description for trm1.1 (D201A) doesn't work, but the old description D177A (which now refers to transcript trm1.2) is accepted.
https://curation.pombase.org/pombe/curs/269782b0ebb602b9
trm1.1-D177A(aaD177A) is incorrect, but works (it now refers to trm1.2)
trm1.1-D201A(aaD201A) is correct, but doesn't work
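To illustrate why the same residue gets a different number on each transcript, here is a minimal sketch of a position check. It is not the allele_qc implementation: `check_aa_substitution` is a hypothetical helper, and the sequences are toy strings, not the real trm1 proteins. The only assumption taken from the thread is that the longer isoform shifts the position from 177 to 201.

```python
import re

# Matches simple amino-acid substitutions such as "D201A":
# reference residue, 1-based position, new residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def check_aa_substitution(description: str, protein_seq: str) -> bool:
    """Return True if the reference residue named in `description`
    is found at the stated 1-based position of `protein_seq`."""
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    ref_residue, position = match.group(1), int(match.group(2))
    if position < 1 or position > len(protein_seq):
        return False
    return protein_seq[position - 1] == ref_residue


# Toy sequences (NOT the real trm1 proteins). The long isoform carries
# 24 extra N-terminal residues, so the aspartate moves from 177 to 201.
short_isoform = "M" + "G" * 175 + "D" + "G" * 10
long_isoform = "M" + "S" * 24 + short_isoform[1:]

print(check_aa_substitution("D177A", short_isoform))
print(check_aa_substitution("D201A", long_isoform))
print(check_aa_substitution("D177A", long_isoform))
```

If the checker is run against a stale sequence for trm1.1 (the short one), D177A passes and D201A fails, which matches the behaviour reported above.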
Hi @manulera
I checked the Canto code and it is sending the allele_name.
Val, you were right about the sequence being out of date. I forgot that the allele QC pipeline only runs twice a week, currently Sunday and Wednesday. I'll check in the morning to see if things are working.
Hi @manulera
Sorry to bother you with questions.
I investigated the call to check_allele and I think maybe the problem is that data/genome.pickle doesn't have the latest changes from the contig files. Does that make sense?
As a test I changed get_data.sh to use the latest contig files:
https://github.com/pombase/allele_qc/commit/4ec936e8911a8d2fbd6235c6fa07b52e24b9806e#diff-b71ccacdb5628394a3b31a561f07a9875acff80cf66098e1fb06e8c04bb3c697
I checked in the results as a branch: https://github.com/pombase/allele_qc/tree/use-latest-contigs
And now the API works as expected when I deploy on my desktop. Do you think we should make that change to get_data.sh?
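A quick way to spot this kind of mismatch is to compare modification times. The sketch below is only a heuristic and not part of allele_qc: `pickle_is_stale` is a hypothetical helper, and in a fresh git checkout file timestamps reflect checkout time rather than content age, so it is mainly useful on a machine where get_data.sh has been run locally.

```python
import os


def pickle_is_stale(pickle_path, source_paths):
    """Return True if the pickle is missing or older than any source file."""
    if not os.path.exists(pickle_path):
        return True
    pickle_mtime = os.path.getmtime(pickle_path)
    # Stale as soon as one source file was modified after the pickle.
    return any(os.path.getmtime(src) > pickle_mtime for src in source_paths)
```

It could be called as `pickle_is_stale("data/genome.pickle", contig_files)`, where `contig_files` is whatever list of contig paths get_data.sh downloads.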
Yes! good catch
Great. I've made a PR.
I have another question. In the Action config run_pipeline.yml the changes are committed with:
- name: Commit changes
  uses: stefanzweifel/git-auto-commit-action@v4
  with:
    commit_message: updated to last revision
    file_pattern: "*.tsv"
Does the file_pattern exclude data/genome.pickle from the commit? Does that need changing?
Yes, good point. This is not a problem for the run_pipeline action itself, since it runs get_data.sh, which generates a new pickle, but it is a problem for the docker container, which relies on those files being updated. The API also uses data/coordinate_changes_dict.json.
There are three options I think:
I think we discussed this at some point and you were saying that 2 or 3 would not be great because it would mean downloading everything into oliver1.
Another thing, whichever option we choose: the latest_build folder from the nightly load should be used here, so in any case the docker container should be mounted last.
Thanks Manu. I decided to go for option 1 for now and we'll see if that works for us. We'll keep 2 and 3 as backup plans.
I've changed the file_pattern to "*.tsv *.json *.contig *genome.pickle"
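For anyone checking what the old and new patterns pick up, here is a rough sketch. It assumes the action hands the space-separated patterns to git, and it uses Python's `fnmatchcase` as an approximation of git's pathspec matching (in both, `*` can match across `/`). `files_to_commit` is a hypothetical helper, and the `.tsv` and `.contig` paths are made-up examples; only data/genome.pickle and data/coordinate_changes_dict.json come from this thread.

```python
from fnmatch import fnmatchcase


def files_to_commit(paths, file_pattern):
    """Return the paths matched by any of the space-separated glob patterns."""
    patterns = file_pattern.split()
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]


changed_files = [
    "results/allele_results.tsv",         # hypothetical path
    "data/genome.pickle",
    "data/coordinate_changes_dict.json",
    "data/chromosome1.contig",            # hypothetical path
]

old_pattern = "*.tsv"
new_pattern = "*.tsv *.json *.contig *genome.pickle"

print(files_to_commit(changed_files, old_pattern))
print(files_to_commit(changed_files, new_pattern))
```

With the old pattern only the `.tsv` file is selected; with the new one all four are, including the pickle and the JSON file the API relies on.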
Alright!
Do we need to keep this open?
I think this is done.
I reverted trm1 to the longer version because, even though the short version is the major isoform, we now have functional information for both transcripts, so I needed to create 2 isoforms.
However, even though trm1.1 transcript now has D201A I cannot add this allele to Canto
@manulera does Kim need to rerun the pipeline? I guess we need to rerun every time a gene structure is altered, if the gene
a) has associated alleles or modifications, or b) is needed to annotate new ones,
and, in the case of a), I will need to fix them (but these should appear in one of the logs)?
Is that correct?