Dear SACRA developers,
When working with many reads (e.g., 1 million PacBio reads), I noticed that the extraction of split reads (step 5) becomes quite slow.
I attached a modified version that uses seqtk subseq instead of samtools faidx. It produces the same result, just a lot faster. You may also consider replacing seqkit with seqtk for the extraction of non-chimeric reads, which would remove one dependency (only seqtk instead of seqtk + seqkit).
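For readers unfamiliar with the approach: `seqtk subseq <in.fa> <in.bed>` reads a list of regions and streams through the FASTA file once, emitting each requested sub-range as it passes the matching record. The snippet below is a minimal plain-Python sketch of that single-pass idea, assuming a BED file with 0-based, half-open coordinates. It is not seqtk's code and not the attached SACRA.dev.sh. The function names and the `name:start-end` header format are hypothetical choices for illustration only.

```python
import io
from collections import defaultdict
from typing import Dict, Iterator, List, TextIO, Tuple


def read_bed(handle: TextIO) -> Dict[str, List[Tuple[int, int]]]:
    """Map read name -> list of 0-based, half-open (start, end) regions."""
    regions: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in handle:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        name, start, end = line.split()[:3]
        regions[name].append((int(start), int(end)))
    return regions


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_regions(
    fasta: TextIO, regions: Dict[str, List[Tuple[int, int]]]
) -> Iterator[Tuple[str, str]]:
    """Single pass over the FASTA: hash lookup per record, no per-region file access."""
    for name, seq in iter_fasta(fasta):
        for start, end in regions.get(name, ()):
            # Hypothetical header format: 1-based inclusive coordinates.
            yield f"{name}:{start + 1}-{end}", seq[start:end]


if __name__ == "__main__":
    bed = io.StringIO("read1\t0\t4\nread1\t10\t16\nread2\t2\t6\n")
    fasta = io.StringIO(">read1 desc\nACGTACGTAC\nGTACGT\n>read2\nTTTTGGGG\n")
    for header, sub in extract_regions(fasta, read_bed(bed)):
        print(f">{header}\n{sub}")
```

Only one record is held in memory at a time, and each record needs a single dictionary lookup, so the cost grows with the size of the input rather than with the number of separate extraction calls.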
Feel free to use the code as you see fit.
Best wishes, Shini
SACRA.dev.sh.zip