Closed ASLeonard closed 1 year ago
Hi Alex,
As for the first issue, I will simply wait for @marcelm's answer. He is doing most of the development at the moment, particularly with the separate indexing.
As for the second issue, do you get the same error if you use the last release 0.7.1?
To track down the second issue, it would be great if we could get access to the reads and genome. Is it possible?
Best, Kristoffer
Hi, thanks for trying out strobealign! We should soon be close to a somewhat stable release; thanks for reporting issues, that really helps us.
With --use-index
, you should not provide the name of the .sti
file, but the name of the FASTA file. Strobealign will add the .sti
extension to get the name of the index file. I’ll make this clearer in the documentation.
I’ll have a look at your data, thanks for posting it.
The program has been running for me for the last 30 minutes without a crash, so I wonder whether there’s something different. I noticed that your log output shows "Estimated read length: 149 bp" while I have "Estimated read length: 150 bp". Have you modified the SRA reads somehow (adapter trimming or so)?
Ah yeah the reads probably went through some trimming step. I'll see if I can subset the fastq and see where it crashes to avoid sharing such a large file.
So I found a subset of 1 million reads that crash it, and I included the exact reference which I believe also has an extra contig compared to the public one. Hosted here.
From a quick glance, it looks like there are some very trimmed reads (17 bp in total!), and after I filter them out, the crash disappears. I guess this is related to the strobe size or something similar being larger than the read, hence the
what(): basic_string::substr: __pos (which is 18) > this->size() (which is 17).
This doesn't appear to be an issue in the paf mode (-x
), that appears to run fine on the full data even with tiny reads.
Thanks for the great test case! I was able to reproduce the problem now. It should be fixed by PR #187 (branch has-shared-substring
).
Yep, just pulled that PR and it works now on 2 datasets that failed before with that error, so this looks resolved. Thanks!
P.S. I was piping the results through multiple samtools commands to mark duplicates, but the whole process was 2.55x faster and used 20% less memory when swapping out bwa-mem2, so this aligner is seriously fast!
Thanks for reporting! Any feedback is welcomed :)
Hi, I've been trying to use strobealign from source using the current head, and have ran into a few issues. Maybe the solution is just wait for the stable release 😉
The first is trying to use the pre-generated index approach with
strobealign ARS.fa.sti R1.fq.gz R2.fq.gz --use-index
. I get this error, so presumably it just isn't picking up the fact its an index file and not a fasta file.The second one is if I run with just a normal fasta reference and paired reads, and then get a range error a couple minutes into alignment. This doesn't appear to happen when using the single end mode, so might be related #16?
Best, Alex