walaj / VariantBam

Filtering and profiling of next-generational sequencing data using region-specific rules
Other
74 stars 10 forks source link

Possible problem with Example use 7: cannot filter out reads #15

Closed naumenko-sa closed 5 years ago

naumenko-sa commented 5 years ago

Hi Jeremiah @walaj !

Thanks for the great tool!

After aligning reads to the reference with decoy I'm trying to remove the reads mapped to decoy with VariantBam (Example Use 7).

I do have some reads mapped to decoy:

samtools idxstats 159_CH0315-ready.bam
1   249250621   66524406    8743
2   243199373   70648459    11086
...
NC_007605   171823  45  0
hs37d5  35477943    40140873    12974

My VariantBam command:

variant 159_CH0315-ready.bam -L decoy.bed -o 159_CH0315.bam -b -v

My decoy.bed:

NC_007605   1   171823
hs37d5  1   35477943

VariantBam log

Rules script: 
...building rules from command line
----------ReadFilterCollection-------------
--Exclude Region:  Matelink: ON 2 regions
  Rule: 

------------------------------------------
:159_CH0315-ready.bam
 - BamReader - Walking whole genome -
 ------------------------------------
...starting filtering
Read   1,000,000 at  1:3,474,064  . Kept          0 ( 0%) -- 
Read   2,000,000 at  1:7,052,903  . Kept          0 ( 0%) -- 
Read   3,000,000 at  1:10,626,387 . Kept          0 ( 0%) -- 
Read   4,000,000 at  1:14,535,078 . Kept          0 ( 0%) -- 
Read   5,000,000 at  1:17,530,458 . Kept          0 ( 0%) -- 
Read   6,000,000 at  1:21,025,873 . Kept          0 ( 0%) -- 
Read   7,000,000 at  1:24,629,625 . Kept          0 ( 0%) -- 
Read   8,000,000 at  1:28,201,716 . Kept          0 ( 0%) -- 
Read   9,000,000 at  1:31,898,856 . Kept          0 ( 0%) -- 
...
Read 827,000,000 at GL000199.1:83,070     . Kept          0 ( 0%) -- 
Read 828,000,000 at GL000224.1:18,796     . Kept          0 ( 0%) -- 
Read 829,000,000 at hs37d5:197,522    . Kept          0 ( 0%) -- 
Read 830,000,000 at hs37d5:1,438,308  . Kept          0 ( 0%) -- 
Read 831,000,000 at hs37d5:1,459,267  . Kept          0 ( 0%) -- 
Read 832,000,000 at hs37d5:1,751,730  . Kept          0 ( 0%) -- 
Read 833,000,000 at hs37d5:3,392,637  . Kept          0 ( 0%) -- 
Read 834,000,000 at hs37d5:4,406,614  . Kept          0 ( 0%) -- 
Read 835,000,000 at hs37d5:4,779,479  . Kept          0 ( 0%) -- 
Read 836,000,000 at hs37d5:7,156,117  . Kept          0 ( 0%) -- 
Read 837,000,000 at hs37d5:9,768,510  . Kept          0 ( 0%) -- 
Read 838,000,000 at hs37d5:12,375,461 . Kept          0 ( 0%) -- 
Read 839,000,000 at hs37d5:13,153,460 . Kept          0 ( 0%) -- 
Read 840,000,000 at hs37d5:15,227,612 . Kept          0 ( 0%) -- 
Read 841,000,000 at hs37d5:16,047,082 . Kept          0 ( 0%) -- 
Read 842,000,000 at hs37d5:16,656,255 . Kept          0 ( 0%) -- 
Read 843,000,000 at hs37d5:18,490,428 . Kept          0 ( 0%) -- 
Read 844,000,000 at hs37d5:19,036,270 . Kept          0 ( 0%) -- 
Read 845,000,000 at hs37d5:19,528,152 . Kept          0 ( 0%) -- 
Read 846,000,000 at hs37d5:19,855,497 . Kept          0 ( 0%) -- 
Read 847,000,000 at hs37d5:20,879,312 . Kept          0 ( 0%) -- 
Read 848,000,000 at hs37d5:21,394,761 . Kept          0 ( 0%) -- 
Read 849,000,000 at hs37d5:21,913,184 . Kept          0 ( 0%) -- 
Read 850,000,000 at hs37d5:22,570,559 . Kept          0 ( 0%) -- 
Read 851,000,000 at hs37d5:23,005,677 . Kept          0 ( 0%) -- 
Read 852,000,000 at hs37d5:24,071,798 . Kept          0 ( 0%) -- 
Read 853,000,000 at hs37d5:24,793,479 . Kept          0 ( 0%) -- 
Read 854,000,000 at hs37d5:25,863,618 . Kept          0 ( 0%) -- 
Read 855,000,000 at hs37d5:26,115,293 . Kept          0 ( 0%) -- 
Read 856,000,000 at hs37d5:26,532,712 . Kept          0 ( 0%) -- 
Read 857,000,000 at hs37d5:26,619,944 . Kept          0 ( 0%) -- 
Read 858,000,000 at hs37d5:27,040,616 . Kept          0 ( 0%) -- 
Read 859,000,000 at hs37d5:27,043,584 . Kept          0 ( 0%) -- 
Read 860,000,000 at hs37d5:27,395,331 . Kept          0 ( 0%) -- 
Read 861,000,000 at hs37d5:27,793,648 . Kept          0 ( 0%) -- 
Read 862,000,000 at hs37d5:28,310,687 . Kept          0 ( 0%) -- 
Read 863,000,000 at hs37d5:28,634,492 . Kept          0 ( 0%) -- 
Read 864,000,000 at hs37d5:29,154,535 . Kept          0 ( 0%) -- 
Read 865,000,000 at hs37d5:29,371,190 . Kept          0 ( 0%) -- 
Read 866,000,000 at hs37d5:29,914,751 . Kept          0 ( 0%) -- 
Read 867,000,000 at hs37d5:30,080,710 . Kept          0 ( 0%) -- 
Read 868,000,000 at hs37d5:31,358,220 . Kept          0 ( 0%) -- 
Read 869,000,000 at hs37d5:35,254,815 . Kept          0 ( 0%) -- 
Read 870,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 871,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 872,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 873,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 874,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 875,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 876,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 877,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 878,000,000 at Unmapped:-1         . Kept          0 ( 0%) -- 
Read 878,462,747 at Unmapped:-1         . Kept          0 ( 0%) -- 

In the result I have an empty bam file with a header but without any reads.

It looks like the reads on the decoy are not filtered and the results are not written in the result.bam file. Do you have any suggestion how to fix this issue? Am I doing something wrongly?

I installed the latest VariantBam from this github last week.

Thanks! Sergey

walaj commented 5 years ago

Thank you Sergey! Thanks for the bug report -- I indeed was able to recreate and found the issue. Basically just left off a step when constructing excluder regions from the command line.

I had to update the submodule SeqLib as well, so to update with the fix, you'd need to update the submodules as well.

naumenko-sa commented 5 years ago

Thanks so much, Jeremiah @walaj ! With the fix VariantBam works perfectly:

GL000192.1  547496  151293  12
NC_007605   171823  0   0
hs37d5  35477943    0   0

Sergey