hattori-lab / SACRA

SACRA (Split Amplified Chimeric Read Algorithm) is an algorithm for correcting chimeric long reads generated by MDA.
MIT License

Split read extraction slow with samtools faidx #7

Closed by SuShiAtGit 3 years ago

SuShiAtGit commented 3 years ago

Dear SACRA developers,

When working with many reads (e.g., 1 million PacBio reads), I noticed that the extraction of split reads (step 5) becomes quite slow.

I attached a modified version that uses seqtk subseq instead of samtools faidx, which produces the same result, just a lot faster. You may also consider replacing seqkit with seqtk for the extraction of non-chimeric reads, which would remove one dependency (only seqtk instead of seqtk + samtools).
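For illustration, here is a rough sketch of how a list of regions could be handed to seqtk subseq in a single pass. It is not the attached script: it assumes the split positions are available as samtools-style region strings (`read:start-end`, 1-based, inclusive), and the helper name `regions_to_bed` and all file names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Convert samtools-style regions ("read:start-end", 1-based, inclusive)
# into BED lines ("read<TAB>start-1<TAB>end", 0-based, half-open),
# which is the region format that `seqtk subseq` reads.
# NOTE: helper name and file names are illustrative assumptions.
regions_to_bed() {
    awk '{
        # Locate the trailing ":start-end" so read names that
        # themselves contain ":" are kept intact.
        i = match($0, /:[0-9]+-[0-9]+$/)
        if (i == 0) next                      # skip malformed lines
        name = substr($0, 1, i - 1)
        split(substr($0, i + 1), se, "-")     # se[1]=start, se[2]=end
        printf "%s\t%d\t%d\n", name, se[1] - 1, se[2]
    }'
}

# Hypothetical usage (placeholder file names):
#   regions_to_bed < regions.txt > regions.bed
#   seqtk subseq reads.fasta regions.bed > split_reads.fasta
```

With a BED file, seqtk subseq streams through the FASTA once and emits every listed interval, so all regions are extracted by one process.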

Feel free to use the code as you see fit.

Best wishes, Shini

SACRA.dev.sh.zip

YuyaKiguchi commented 3 years ago

Hi Shini,

Thank you for your comment. I've updated SACRA.sh based on your comment!

Yuya