Status: Closed (ZarulHanifah closed this issue 9 months ago)
I also got this error with a run on Bakta v1.9.1.
predict CRISPR arrays...
len(reps)=5, int(copies)=6
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/py310/bin/bakta", line 10, in
Hi @ZarulHanifah / @marade, thanks for reporting. Could you provide me with a genome sequence to reproduce & potentially debug this error? I'd like to take a deeper look into this.
Thank you @oschwengers . Here you go. GCA_025196405.1_ASM2519640v1.fasta.txt
Thanks for your work on this project! Just commenting to say that I am experiencing this issue as well, with a similar backtrace. Happy to provide more example genomes if that would be helpful.
Edit: rerunning bakta with the --skip-crispr flag circumvents this issue.
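For anyone scripting this workaround, here is a minimal sketch of how the command could be assembled from Python. `--db`, `--output`, `--threads` and `--skip-crispr` are Bakta command-line options; the helper name `build_bakta_cmd` and the example paths are placeholders made up for illustration.

```python
import shlex


def build_bakta_cmd(genome, db, output, skip_crispr=False, threads=2):
    """Assemble a bakta command line as an argument list (hypothetical helper)."""
    cmd = ["bakta", "--db", str(db), "--output", str(output),
           "--threads", str(threads)]
    if skip_crispr:
        # Workaround from this thread: skip CRISPR array prediction entirely.
        cmd.append("--skip-crispr")
    cmd.append(str(genome))  # the genome FASTA is the positional argument
    return cmd


if __name__ == "__main__":
    # Placeholder paths; print the command rather than running it.
    cmd = build_bakta_cmd("genome.fasta", "bakta_db", "results", skip_crispr=True)
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Note that with this flag no CRISPR arrays are annotated at all, so it is a stopgap rather than a fix.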
@ZarulHanifah & @marade, I've merged PR #267, which fixes this. I wrongly assumed that there is always an even number of spacers & repeats in each CRISPR array. I fixed this and improved the PILER-CR parser. You can already use this from https://github.com/oschwengers/bakta/tree/main or wait until I've released patch v1.9.2, probably sometime this week.
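To illustrate the kind of count mismatch described here, the sketch below pairs repeats with spacers without assuming both lists have the same length. It is not the code from the PR; the function name `pair_repeats_and_spacers` and the assumption that an array may end with a repeat that has no trailing spacer are mine.

```python
from itertools import zip_longest


def pair_repeats_and_spacers(repeats, spacers):
    """Pair each repeat with the spacer that follows it (illustrative only).

    Assumption: an array may end on a repeat, so there can be more
    repeats than spacers; a repeat without a spacer is paired with None.
    """
    if len(spacers) > len(repeats):
        raise ValueError("more spacers than repeats: unexpected array layout")
    return list(zip_longest(repeats, spacers, fillvalue=None))


if __name__ == "__main__":
    for repeat, spacer in pair_repeats_and_spacers(["R1", "R2", "R3"], ["S1", "S2"]):
        print(repeat, spacer)
```

Using `zip_longest` instead of `zip` keeps the final repeat instead of silently dropping it, and the explicit length check surfaces layouts the parser does not expect.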
Thank you @oschwengers ... unfortunately, another AssertionError from PILER-CR:
Bakta v1.9.2
Options and arguments:
input: /fs03/jm41/Zarul/C002_B2_results/derep/dereplicated_genomes/metabat.641.fasta
db: /fs03/ie79/db/bakta_db, version 5.0, full
output: /fs03/jm41/Zarul/C002_B2_results/bakta/metabat.641
force: True
tmp directory: /tmp/tmpbg0fpfp9
prefix: metabat.641
threads: 2
debug: True
translation table: 11
locus tag prefix: METABAT.641
Bakta runs in DEBUG mode! Temporary data will not be destroyed at: /tmp/tmpbg0fpfp9
parse genome sequences...
imported: 388
filtered & revised: 388
contigs: 388
start annotation...
predict tRNAs...
found: 112
predict tmRNAs...
found: 1
predict rRNAs...
found: 0
predict ncRNAs...
found: 2
predict ncRNA regions...
found: 13
predict CRISPR arrays...
Traceback (most recent call last):
File "/fs03/ie79/Zarul/status_nanopore/C002_B2/.snakemake/conda/22185ec851ca2597fabecb499d58e23d_/bin/bakta", line 10, in <module>
sys.exit(main())
File "/fs03/ie79/Zarul/status_nanopore/C002_B2/.snakemake/conda/22185ec851ca2597fabecb499d58e23d_/lib/python3.10/site-packages/bakta/main.py", line 210, in main
genome['features'][bc.FEATURE_CRISPR] = crispr.predict_crispr(genome, contigs_path)
File "/fs03/ie79/Zarul/status_nanopore/C002_B2/.snakemake/conda/22185ec851ca2597fabecb499d58e23d_/lib/python3.10/site-packages/bakta/features/crispr.py", line 105, in predict_crispr
assert spacer_seq == spacer_genome_seq # assure PILER-CR provided sequence equals sequence extracted from genome
AssertionError
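The failing line compares the spacer sequence reported by PILER-CR with the sequence extracted from the genome at the reported coordinates. As a rough sketch of that consistency check outside of Bakta, the snippet below does the same comparison but logs a warning instead of raising. The function name, the 1-based inclusive coordinates and the case-insensitive comparison are assumptions for illustration, not Bakta's implementation.

```python
import logging

log = logging.getLogger("crispr_check")


def spacer_matches_genome(contig_seq, start, stop, spacer_seq):
    """Return True if the reported spacer equals the genome slice.

    Assumption: start/stop are 1-based and inclusive.
    """
    genome_seq = contig_seq[start - 1:stop]
    if genome_seq.upper() == spacer_seq.upper():
        return True
    # Report the mismatch instead of aborting the whole run.
    log.warning("spacer mismatch at %d..%d: reported=%s genome=%s",
                start, stop, spacer_seq, genome_seq)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(spacer_matches_genome("AAACGTTT", 4, 6, "cgt"))
    print(spacer_matches_genome("AAACGTTT", 4, 6, "GGG"))
```

Running a check like this over the arrays of a failing genome shows which spacer and which coordinates disagree, which may help when attaching a reproducible example.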
Hmm... OK, could you provide the metabat.641.fasta input file to debug this?
Right, here you go! metabat.641.fasta.txt
Thank you!
I am running bakta on a bunch of genomes. Many worked wonderfully, but a few failed due to something CRISPR-related. One of the genomes is GCA_025196405.1. Here is the error message.
The commands (part of a snakemake workflow):
The log:
I installed bakta through conda.
Thank you.