rjpbonnal / bioruby-samtools

Porting of samtools-ruby to BioRuby. Binder of samtools for ruby, on the top of FFI -from original project-
Other
33 stars 23 forks source link

View and fetch commands don't retrieve region correctly. #51

Closed MatthewRalston closed 8 years ago

MatthewRalston commented 9 years ago

When running the fetch method or view method, the samtools command is malformed, returning all the reads in a file.

~/sandbox >gem list | grep bio-samtools
bio-samtools (2.3.3)
~/sandbox >samtools view spec/test_files/test.bam NC_001988.2:75-75
HWI-ST741:439:H9G3UADXX:2:2103:18990:5489_2:N:0:ATCACG  153 NC_001988.2 7   0   8M2D68M =   7   0   TTGATGGATATAGCATATTTAGGAGGTATTTAAAATGAAAGAATATAAATATACTGTTATTACAGGAGCAAGTACG    JIIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJHJJJJJJJJJJHHHHHFFFFFBCC    AS:i:-39    XN:i:0  XM:i:5  XO:i:1  XG:i:2  NM:i:7  MD:Z:0A1A3C1^AT65T1A    YT:Z:UP
HWI-ST741:441:H9DFHADXX:1:2214:9303:82237_2:N:0:ATCACG  145 NC_001988.2 10  42  76M =   191913  191979  ATGCAATTATAGCATATTTAGGAGGTATTTAAAATGAAAGAATATAAATATACTGTTATTACAGGAGCAAGGTCAG    GHGEGIJJIJHIIIGJJJJJJJIJIIGGIIJJJIHHHEGDGICIIGIEIIIJIJIHIJGGGIJHHHBFFFFDDB@@    AS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:71T4   YS:i:0  YT:Z:DP
HWI-ST741:439:H9G3UADXX:2:1202:12969:64751_2:N:0:ATCACG 145 NC_001988.2 72  8   76M =   79  83  CCGATCTAGTTCAGGGATAGGATATGAGGCTGCAAAGGCATTTGCAAAAAGAGGAAAAAATTTAATTATTATTGCC    FFIIIIJJIIIJIHIIJJHIGIHGIIJIIJIDIIHEHGHEIIIIJIJJJJJIHFIJJJJJJJJHHHHHEDDFFC@@    AS:i:-27    XN:i:0  XM:i:5  XO:i:0  XG:i:0  NM:i:5  MD:Z:0A0G2G1A68T0   YS:i:-25    YT:Z:DP
HWI-ST741:439:H9G3UADXX:2:1215:16148:36912_2:N:0:ATCACG 145 NC_001988.2 72  8   76M =   79  83  CCGATCTAGTTCAGGGATAGGATATGAGGCTGCAAAGGCATTTGCAAAAAGAGGAAAAAATTTAATTATTATTGCC    EGHHGGHGHHGGHFFGGFEGGFIHGDDCEGJIGHDGIJHFDGIHEIHHABDEJIEGGIJHFGGBBDFFFEDDD?@@    AS:i:-25    XN:i:0  XM:i:5  XO:i:0  XG:i:0  NM:i:5  MD:Z:0A0G2G1A68T0   YS:i:-25    YT:Z:DP
HWI-ST741:439:H9G3UADXX:2:2215:17172:90892_2:N:0:ATCACG 145 NC_001988.2 72  8   76M =   79  83  CCGATCTAGTTCAGGGATAGGATATGAGGCTGCAAAGGCATTTGCAAAAAGAGGAAAAAATTTAATTATTATTGCC    EHAEGIIJIJJJJJJJJJJIIGIIIGGHHIJJJJJIGFIGJIIIIIJJJIF@JJIIHGIIIHGHHHFHEBDFD@@@    AS:i:-26    XN:i:0  XM:i:5  XO:i:0  XG:i:0  NM:i:5  MD:Z:0A0G2G1A68T0   YS:i:-24    YT:Z:DP
HWI-ST741:441:H9DFHADXX:1:2208:19865:81444_2:N:0:ATCACG 145 NC_001988.2 72  42  76M =   191874  191846  AGGAGCAAGTTCAGGGATAGGATATGAGGCTGCAAAGGCATTTGCAAAAAGAGGAAAAAATTTAATTATTATTGCC    CEC;>EAD>GDFFDAFF=8*?D<DBF?293D?FGFB@9GFFCBFE@IIFHBF4FIFHHE9AFIFFDFFDDD=+=1=    AS:i:-4 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:75T0   YS:i:0  YT:Z:DP
HWI-ST741:441:H9DFHADXX:1:2108:13040:98530_1:N:0:ATCACG 99  NC_001988.2 75  42  76M =   83  84  AGCAAGTTCAGGGATAGGATATGAGGCTGCAAAGGCATTTGCAAAAAGAGGAAAAAATTTAATTATTATTGCTAGA    @CCFFFEFFHHGFGIIIJIDHHHGGEGIEHIJIGIIGIHGIIIFDGIJIGFHGIJJJJJIJHHIJHFHEFBEDFFF    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YS:i:0  YT:Z:CP
 git:master| ~/sandbox >samtools view spec/test_files/test.bam NC_001988.2:75-75 | wc -l
       7
~/sandbox >irb
2.2.1 :001 > require 'bio-samtools'
 => true
2.2.1 :002 > bam=Bio::DB::Sam.new(:bam => "spec/test_files/test.bam", :fasta => "spec/test_files/test.fa")
 => #<Bio::DB::Sam:0x007f802319a668 @fasta="spec/test_files/test.fa", @bam="spec/test_files/test.bam", @samtools="/Users/Matthew/.rvm/gems/ruby-2.2.1@SCI/gems/bio-samtools-2.3.3/lib/bio/db/sam/external/samtools", @bcftools="/Users/Matthew/.rvm/gems/ruby-2.2.1@SCI/gems/bio-samtools-2.3.3/lib/bio/db/sam/external/bcftools", @last_command=nil>
2.2.1 :003 > reads = []
 => []
2.2.1 :004 > bam.fetch("NC_001988.2",75,75) {|x| reads << x}
 => #<Process::Status: pid 66134 exit 0>
2.2.1 :005 > reads.size
 => 36
2.2.1 :006 > reads = []
2.2.1 :007 > bam.view(chr: "NC_001988.2", start: 75, stop: 75) {|x| reads << x}
 => #<Process::Status: pid 66155 exit 0>
2.2.1 :008 > reads.size
 => 36
~/sandbox >samtools view spec/test_files/test.bam | wc -l
      36
danmaclean commented 9 years ago

Hi Matt, Im not sure I can replicate the problem. I am getting back the Bio::DB::Alignment objects I expect. Can you check the contents of reads and let me know what they are. Are you saying that fetch just spits out a file? I don't understand what you're expecting to see in reads

$ gem list | grep bio-samtools bio-samtools (2.3.3, 2.2.0) $ irb 2.1.5 :001 > require 'bio-samtools' => true 2.1.5 :002 > bam=Bio::DB::Sam.new(:bam => '/Users/macleand/.rvm/gems/ruby-2.1.5/gems/bio-samtools-2.3.3/test/samples/small/sorted.bam', :fasta => '/Users/macleand/.rvm/gems/ruby-2.1.5/gems/bio-samtools-2.3.3/test/samples/small/test_chr.fasta') => #<Bio::DB::Sam:0x007fad3aa729b8 @fasta="/Users/macleand/.rvm/gems/ruby-2.1.5/gems/bio-samtools-2.3.3/test/samples/small/test_chr.fasta", @bam="/Users/macleand/.rvm/gems/ruby-2.1.5/gems/bio-samtools-2.3.3/test/samples/small/sorted.bam", @samtools="/Users/macleand/.rvm/gems/ruby-2.1.5/gems/bio-samtools-2.3.3/lib/bio/db/sam/external/samtools", @bcftools="/Users/macleand/.rvm/gems/ruby-2.1.5/gems/bio-samtools-2.3.3/lib/bio/db/sam/external/bcftools", @last_command=nil> 2.1.5 :003 > reads = [] => [] 2.1.5 :004 > bam.fetch("chr_1", 100, 150) {|x| reads << x} => #<Process::Status: pid 5687 exit 0> 2.1.5 :005 > reads.length => 2 2.1.5 :006 > puts reads

Bio::DB::Alignment:0x007fad3b07a8b0

Bio::DB::Alignment:0x007fad3b079ed8

=> nil 2.1.5 :007 >

MatthewRalston commented 9 years ago

I expect to see 7 reads/Alignment objects, like the samtools view command with the same arguments produces above. Instead, all the reads in the file are passed to the block. I don't observe the same problem in 2.3.2:

~/sandbox >gem list | grep bio-samtools
bio-samtools (2.3.2)
~/sandbox >irb
2.1.2 :001 > require 'bio-samtools'
 => true
2.1.2 :002 > reads=[]
 => []
2.1.2 :003 > bam=Bio::DB::Sam.new(:bam => "spec/test_files/test.bam", :fasta => "spec/test_files/test.fa")
 => #<Bio::DB::Sam:0x007f80910e6e88 @fasta="spec/test_files/test.fa", @bam="spec/test_files/test.bam", @samtools="/Users/Matthew/.rvm/gems/ruby-2.1.2@SCI/gems/bio-samtools-2.3.2/lib/bio/db/sam/external/samtools", @bcftools="/Users/Matthew/.rvm/gems/ruby-2.1.2@SCI/gems/bio-samtools-2.3.2/lib/bio/db/sam/external/bcftools", @last_command=nil>
2.1.2 :004 > bam.fetch("NC_001988.2",75,75) {|x| reads << x}
 => #<Process::Status: pid 95802 exit 0>
2.1.2 :005 > reads.size
 => 7

But when I uninstall 2.3.2 and switch to 2.3.3 i see the following:

~/sandbox >gem uninstall bio-samtools
Remove executables:
    bam_consensus.rb

in addition to the gem? [Yn]  y
Removing bam_consensus.rb
Successfully uninstalled bio-samtools-2.3.2
~/sandbox >gem install bio-samtools
Fetching: bio-samtools-2.3.3.gem (100%)
Building native extensions.  This could take a while...
Successfully installed bio-samtools-2.3.3
1 gem installed
~/sandbox >irb
2.1.2 :001 > reads=[]
 => []
2.1.2 :002 > require 'bio-samtools'
 => true
2.1.2 :003 > bam=Bio::DB::Sam.new(:bam => "spec/test_files/test.bam", :fasta => "spec/test_files/test.fa")
 => #<Bio::DB::Sam:0x007fad42a11750 @fasta="spec/test_files/test.fa", @bam="spec/test_files/test.bam", @samtools="/Users/Matthew/.rvm/gems/ruby-2.1.2@test/gems/bio-samtools-2.3.3/lib/bio/db/sam/external/samtools", @bcftools="/Users/Matthew/.rvm/gems/ruby-2.1.2@test/gems/bio-samtools-2.3.3/lib/bio/db/sam/external/bcftools", @last_command=nil>
2.1.2 :004 > bam.fetch("NC_001988.2",75,75) {|x| reads << x}
 => #<Process::Status: pid 2067 exit 0>
2.1.2 :005 > reads.size
 => 36
2.1.2 :006 > puts reads
#<Bio::DB::Alignment:0x007fad4403f9a0>
#<Bio::DB::Alignment:0x007fad4403eed8>
#<Bio::DB::Alignment:0x007fad4403eaa0>
#<Bio::DB::Alignment:0x007fad4403dee8>
#<Bio::DB::Alignment:0x007fad4403d330>
#<Bio::DB::Alignment:0x007fad4403c778>
#<Bio::DB::Alignment:0x007fad4404fb70>
#<Bio::DB::Alignment:0x007fad4404efb8>
#<Bio::DB::Alignment:0x007fad4404e400>
#<Bio::DB::Alignment:0x007fad4404d848>
#<Bio::DB::Alignment:0x007fad4404cc90>
#<Bio::DB::Alignment:0x007fad4404c0d8>
#<Bio::DB::Alignment:0x007fad4210f4e0>
#<Bio::DB::Alignment:0x007fad4210e928>
#<Bio::DB::Alignment:0x007fad4210dd70>
#<Bio::DB::Alignment:0x007fad4210d1b8>
#<Bio::DB::Alignment:0x007fad4210c6f0>
#<Bio::DB::Alignment:0x007fad42117af0>
#<Bio::DB::Alignment:0x007fad42116f38>
#<Bio::DB::Alignment:0x007fad42116380>
#<Bio::DB::Alignment:0x007fad421157c8>
#<Bio::DB::Alignment:0x007fad42114c10>
#<Bio::DB::Alignment:0x007fad42114148>
#<Bio::DB::Alignment:0x007fad44057550>
#<Bio::DB::Alignment:0x007fad44056a88>
#<Bio::DB::Alignment:0x007fad44055ed0>
#<Bio::DB::Alignment:0x007fad44055318>
#<Bio::DB::Alignment:0x007fad44054760>
#<Bio::DB::Alignment:0x007fad4211fc50>
#<Bio::DB::Alignment:0x007fad4211f188>
#<Bio::DB::Alignment:0x007fad4211e6c0>
#<Bio::DB::Alignment:0x007fad4211db08>
#<Bio::DB::Alignment:0x007fad4211cf50>
#<Bio::DB::Alignment:0x007fad4211c398>
#<Bio::DB::Alignment:0x007fad4405f778>
#<Bio::DB::Alignment:0x007fad4405ebc0>

It's worth noting that none of the 36 items in the list (the total number of reads in the file) are of type NilClass.

homonecloco commented 9 years ago

Can you tell us which version of samtools you have in your system? We compile the version 0.1.19, and the results may vary according to the default options in samtools. The library should be consistent, as it downloads 0.1.19 (we haven't updated to 1.x because some functionality is still missing in the new library). I had a look at the code in the unit test, and it expect 36 reads, I'm now wondering why it was displaying 7 before. So, it could be some of the default quality filters (I think it is the option -A which displays all the reads). But I need to investigate more.

MatthewRalston commented 9 years ago

I'll have to get back to you when I'm home from work to provide additional specifics. @homonecloco you're not using the samtools version I have in my PATH variable, correct? You're likely using the one installed with the library which would be 0.1.19. 7 reads is the correct number as the samtools output at the top shows... The other 29 reads do not overlap with base 75 in the reference sequence I'm working with and thus 36 alignment objects is not correct. Is the correct output supposed to be 29 NilClass objects and 7 Alignment class objects? Or should it be the 7 reads that the 2.3.2 call produces? The test bam file from my gem's specs is linked at the top of this thread; the same folder has the fasta file and the index.

homonecloco commented 9 years ago

Hi @MatthewRalston , just added a unit test (not released as in the gem yet) and I don't seem to be able to reproduce the error. I agree with you that you should have 7 reads overlapping in the position (I'm sorry for my previous message, I wasn't in the computer and was trying to get the error just from reading the code). The correct output, in this case would be 7 alignment objects, with starting positions [7,10,72,72,72,72,75]. Best, Ricardo.

MatthewRalston commented 9 years ago

Thanks @homonecloco I'll see if I can get your tests to pass in a fresh gemset when I get home :)