Open hmusta opened 6 years ago
Hi @hmusta, the current version of Squeakr auto-resizes when running with a single thread. So even if you underestimate the size, there won't be a segfault. Please try it and let me know if you still have any issues.
Thanks, Prashant
Hello,
I observed segfaults when using the value from lognumslots.sh as well, with the Squeakr version from Oct 2019 (should be 5ad2ad6674c06a0fe7495d38bc467c2f854be72f).
This seems to happen frequently for me on very small test datasets.
Reproducing this should be quite simple:
Create a file (called 1.fastq) containing:
@1_1/1 TATGCACCAGAGTATGGAAGCATAAGCTCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCAGTCAACAAAGCCGAGTGGGCGCAACGA + IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
Then run ntcard followed by lognumslots.sh:
ntcard -k32 1.fastq -p ntcard.out
lognumslots.sh ntcard.out_k32.hist
lognumslots.sh returns 7, but the smallest value for which squeakr count doesn't crash is 10.
squeakr count -n -e -k 32 -s 7 -o 1.squeakr 1.fastq
results in a segfault, while
squeakr count -n -e -k 32 -s 10 -o 1.squeakr 1.fastq
works fine.
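Until the estimate is fixed, one possible workaround is to retry with a larger -s value whenever the count step crashes. The sketch below assumes a crash shows up as a non-zero exit status (a segfault does, in a shell). Both count_with_retry and run_squeakr are hypothetical helper names, not part of Squeakr; the squeakr flags are the ones from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Squeakr): try CMD with increasing
# log-slot values until it exits with status 0.
# Usage: count_with_retry START MAX CMD [ARGS...]
# The candidate value is passed to CMD as its last argument.
count_with_retry() {
  local s=$1 max=$2
  shift 2
  while [ "$s" -le "$max" ]; do
    # A non-zero exit status (e.g. from a segfault) counts as a failure.
    if "$@" "$s" >/dev/null 2>&1; then
      echo "$s"   # print the first value that worked
      return 0
    fi
    s=$((s + 1))
  done
  return 1        # nothing in START..MAX worked
}

# Hypothetical wrapper around the squeakr command from the report above;
# $1 is the log-slot value to try.
run_squeakr() {
  squeakr count -n -e -k 32 -s "$1" -o 1.squeakr 1.fastq
}
```

Calling `count_with_retry 7 12 run_squeakr` would start at the value returned by lognumslots.sh and stop at the first -s that succeeds, which should be 10 for this input going by the report above.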
I've noticed that on a small number of read sets (e.g. SRR522088), lognumslots.sh underestimates the number of slots needed in the CQF for squeakr-exact.
Here's my current workflow for gzipped fastq files
In the case of SRR522088, the script computed 26 as the required number of slots, resulting in a segfault. When I set it to 27, it runs smoothly.
Since this script is only in the master branch, I was wondering if there's perhaps a version tuned for the exact branch that I may not be finding in the repo.