pezmaster31 / bamtools

C++ API & command-line toolkit for working with BAM data
MIT License

error when combining merge and sort with pipe #25

Open fyfy opened 13 years ago

fyfy commented 13 years ago

Hi, dear Derek: I am using the bamtools utility programs to merge multiple files and then sort them by coordinate. Running the two steps separately works without any problem. However, if I combine them into one step with a pipe, as in: bamtools merge -in in_file1 -in in_file2 | bamtools sort -out out_file,

I got many error messages, such as:

    BgzfStream ERROR: unable to open file .sort.temp.0
    BamReader ERROR: Could not open BGZF stream for .sort.temp.0
    BamMultiReader WARNING: Could not open .sort.temp.0, ignoring file
    BgzfStream ERROR: unable to open file .sort.temp.1
    BamReader ERROR: Could not open BGZF stream for .sort.temp.1
    BamMultiReader WARNING: Could not open .sort.temp.1, ignoring file
    ....
    BgzfStream ERROR: read block failed - invalid block header
    BgzfStream ERROR: read block failed - invalid block header
    BgzfStream ERROR: read block failed - invalid block header
    BgzfStream ERROR: read block failed - invalid block header
    BgzfStream ERROR: could not decompress block - zlib::inflate() failed
    BgzfStream ERROR: read block failed - could not decompress block data
    BgzfStream ERROR: could not decompress block - zlib::inflate() failed
    BgzfStream ERROR: read block failed - could not decompress block data

Could you please check it?

Thanks.

Fan

pezmaster31 commented 13 years ago

With those error messages, it looks like an older version of BT. If so, can you update and try again? Then let me know how it looks.

fyfy commented 13 years ago

Dear Derek: Thank you so much for the quick reply. The problem I reported earlier was with version 1.0.6. I just compiled v2.0.0, and it produced a single error message: bamtools sort ERROR: could not open BamMultiReader for merging temp files... Aborting.

I guess the errors from both 1.0.6 and 2.0.0 are related to the temp files that the sorting algorithm uses when its input comes from a pipe. If the merged BAM file is small, or if the input to sort is a BAM file instead of a pipe, the sorting works fine.

Thanks.

Fan

pezmaster31 commented 13 years ago

Try using a large value for the -n option. If the sort tool generates a lot of temp files (>1K), you can hit an OS-defined file handle limit (often 1024). Setting a large -n value increases the number of alignments per temp file, which reduces the number of temp files, and you may be OK.
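To make the trade-off above concrete, here is a rough Python sketch of the arithmetic, not anything taken from bamtools itself. It assumes one open handle per temp file during the final merge; the helper names, the `reserved` headroom, and the 2-billion-alignment figure are hypothetical, and the default of 500,000 alignments per temp file is my recollection of the `-n` default, so check `bamtools sort --help` for your build.

```python
import resource  # Unix-only standard library module


def temp_file_count(total_alignments: int, per_file: int) -> int:
    """Temp files needed if each one holds at most per_file alignments."""
    return -(-total_alignments // per_file)  # integer ceiling division


def min_alignments_per_file(total_alignments: int, max_open_files: int,
                            reserved: int = 16) -> int:
    """Smallest per-file count that keeps the temp file count within the
    handle limit. `reserved` is a hypothetical allowance for stdin, stdout,
    the output file and so on."""
    usable = max_open_files - reserved
    if usable < 1:
        raise ValueError("no file handles left for temp files")
    return -(-total_alignments // usable)


def soft_open_file_limit() -> int:
    """Soft per-process limit on open files, as reported by the OS."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    total = 2_000_000_000  # hypothetical number of alignments in the merged stream
    print("temp files at n=500000:", temp_file_count(total, 500_000))
    limit = soft_open_file_limit()
    if limit == resource.RLIM_INFINITY:
        print("no soft limit on open files")
    else:
        print("smallest safe n:", min_alignments_per_file(total, limit))
```

Keep in mind that a larger -n also means more alignments held in memory per chunk, so the smallest value that stays under the handle limit is probably the one to aim for.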

Let me know how this goes.

pezmaster31 commented 13 years ago

P.S. I plan to add a multi-pass system for merging in these large-file cases, which should spare users from guessing at -n values until one works. Unfortunately, I just haven't gotten around to implementing it.
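A multi-pass merge of this kind might look roughly like the Python sketch below. This is only an illustration of the general idea and not the planned bamtools implementation: in-memory sorted lists stand in for the sorted temp files, and `merge_in_passes` and `max_open` are hypothetical names.

```python
import heapq
from typing import List


def merge_in_passes(runs: List[List[int]], max_open: int) -> List[int]:
    """Merge sorted runs without ever combining more than max_open at once.

    Each pass merges the runs in batches of max_open and produces fewer,
    longer runs; passes repeat until a single run is left. With real temp
    files, max_open would be kept below the OS file handle limit.
    """
    if max_open < 2:
        raise ValueError("max_open must be at least 2")
    if not runs:
        return []
    current = [list(run) for run in runs]
    while len(current) > 1:
        merged_runs = []
        for start in range(0, len(current), max_open):
            batch = current[start:start + max_open]
            # heapq.merge performs a k-way merge of already-sorted inputs
            merged_runs.append(list(heapq.merge(*batch)))
        current = merged_runs
    return current[0]


if __name__ == "__main__":
    sorted_runs = [[1, 4], [2, 5], [0, 3], [6, 7], [-1, 8]]
    print(merge_in_passes(sorted_runs, max_open=2))
```

The cost is extra I/O, since each additional pass rewrites the data once, but the number of simultaneously open files stays bounded no matter how many temp files the first pass produces.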

fyfy commented 13 years ago

Dear Derek: I tried the -n option, and it worked when n was large enough, though with high memory use (~30 GB). So I have to try different values to avoid both the error and hard-drive swapping, since some nodes do not have enough physical memory. The samtools sort function does not have this problem.

Thanks.

Fan

earonesty commented 12 years ago

For me, the problem is avoided by modifying bamtools to use a tmpnam-style library call (possibly after setting the TMP environment variable to '.') to a) create the files in the current directory (not tmp), and b) guarantee unique names. Bamtools is the only BAM sorter that works within a pipeline, instead of using double the disk space as samtools sort does. So it's worth getting it to work.
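As an illustration of the two properties mentioned above (files in the current directory, guaranteed-unique names), here is a minimal Python sketch using `tempfile.mkstemp`. It is an analogue of the idea and not the actual C++ change to bamtools; the function name and the file prefix are hypothetical.

```python
import os
import tempfile


def make_sort_temp(directory: str = ".",
                   prefix: str = "bamtools.sort.temp.") -> str:
    """Create a uniquely named, empty temp file in `directory` and return its path.

    mkstemp creates the file atomically, so two sorts running in the same
    directory cannot end up with the same temp file name.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".bam", dir=directory)
    os.close(fd)  # the caller reopens the path when it is ready to write
    return path


if __name__ == "__main__":
    first = make_sort_temp()
    second = make_sort_temp()
    print(first, second)
    os.remove(first)
    os.remove(second)
```

Fixed names such as .sort.temp.0 can collide when several sorts share a working directory, which is what unique names are meant to avoid.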