walaj / VariantBam

Filtering and profiling of next-generation sequencing data using region-specific rules

-b option truncates output at 10k #14

Closed mikheyev closed 6 years ago

mikheyev commented 6 years ago

I am using the latest release, and variant infile.bam -m 200 -b -o outfile.bam works up to about position 10k on the reference and then filters out all the reads.

walaj commented 6 years ago

Are you using the latest release or the latest commit? I am not able to recreate this issue with the latest commit, and perhaps it's time I issued a new release.

mikheyev commented 6 years ago

Yes, just checked it, and am using the latest commit. I am attaching a couple of test files.

variant test.bam -m 100 -o test.sample.bam
samtools view test.sample.bam| cut -f4 |tail
9987
9990
9994
9994
9998
9999
9999
10000
10000
10004

samtools view test.bam| cut -f4 |tail
16557
16557
16559
16561
16561
16564
16565
16565
16566
16569

walaj commented 6 years ago

The problem was that the last buffer holding reads before writing was never flushed to the output, so the final 0-10000 reads of subsampled BAMs were not getting written. The patch fixes this:

variant test.bam -m 100 | cut -f4 | tail
16557
16557
16559
16561
16561
16564
16565
16565
16566
16569
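
For readers unfamiliar with this class of bug, the following is a minimal Python sketch of a buffered writer that loses its tail when the final flush is skipped. It is an illustration only, not VariantBam's actual code: the class name, the `write_all` helper, the `flush_at_end` switch, and the use of a plain list in place of an output BAM are all hypothetical, and the 10000 capacity is assumed from the description above.

```python
class BufferedReadWriter:
    """Collects reads in memory and writes them out in batches."""

    def __init__(self, sink, capacity=10000):
        self.sink = sink          # a list standing in for the output BAM (assumption)
        self.capacity = capacity  # assumed buffer size
        self.buffer = []

    def add(self, read):
        self.buffer.append(read)
        # Write the batch out once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        self.sink.extend(self.buffer)
        self.buffer.clear()


def write_all(reads, capacity=10000, flush_at_end=True):
    sink = []
    writer = BufferedReadWriter(sink, capacity)
    for read in reads:
        writer.add(read)
    if flush_at_end:
        # Without this call, the final partial buffer is silently dropped.
        writer.flush()
    return sink


reads = list(range(25000))  # integers standing in for reads
buggy = write_all(reads, flush_at_end=False)
fixed = write_all(reads, flush_at_end=True)
print(len(buggy), len(fixed))  # prints "20000 25000"
```

With the final flush omitted, only whole buffers reach the output, so anywhere from 0 to 9999 trailing reads can go missing depending on the input size.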

Thanks for the bug report

mikheyev commented 6 years ago

I checked it and it works. Thanks a lot for the quick fix.