Open map2085 opened 7 years ago
Try using bamcat instead; it does not open all input files at the same time. If you want the output to be sorted, then use:
bamcat level=0 in1.bam in2.bam ... | bamsort
I understand.
This workaround would be very inefficient though, since it would have to re-sort all of the files after the cat, even though the files were pre-sorted, right?
You can try whether a multi-stage merge is faster, i.e. use bammerge to merge subsets, then merge the pre-merged files. bammerge currently has no support for doing multi-stage merges directly.
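A minimal sketch of the multi-stage merge suggested above, written in Python. It only plans the batches and builds the command lines; it does not run anything. The `I=` input key for bammerge, the batch size of 500, and the `stage<level>_<n>.bam` intermediate names are assumptions for illustration, so check them against the bammerge documentation for your biobambam2 version.

```python
from typing import List, Tuple

Step = Tuple[List[str], str]  # (command line, output file name)


def chunk(paths: List[str], size: int) -> List[List[str]]:
    """Split paths into consecutive batches of at most `size` entries."""
    if size < 2:
        raise ValueError("batch size must be at least 2")
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def plan_merge(inputs: List[str], batch_size: int = 500,
               prefix: str = "stage") -> List[List[Step]]:
    """Plan merge rounds so that no command opens more than batch_size inputs.

    Rounds are repeated until a single file remains. A batch holding only
    one file is carried forward unchanged instead of being merged.
    """
    rounds: List[List[Step]] = []
    current = list(inputs)
    level = 0
    while len(current) > 1:
        steps: List[Step] = []
        next_files: List[str] = []
        for n, batch in enumerate(chunk(current, batch_size)):
            if len(batch) == 1:
                next_files.append(batch[0])  # nothing to merge yet
                continue
            out = f"{prefix}{level}_{n}.bam"  # hypothetical intermediate name
            # Assumed syntax: bammerge takes each input as I=<file>.
            cmd = ["bammerge"] + [f"I={p}" for p in batch]
            steps.append((cmd, out))
            next_files.append(out)
        rounds.append(steps)
        current = next_files
        level += 1
    return rounds
```

For 1,200 inputs and a batch size of 500, this plans three merges in the first round and one final merge in the second. Each command could then be run with something like `subprocess.run(cmd, stdout=fh, check=True)`, assuming bammerge writes the merged BAM to standard output.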
Yeah, I have implemented the multi-stage merge workaround with intermediate files. It's not difficult, but it is cumbersome and a nuisance. I just thought I'd post here to alert everyone.
biobambam2 works great though!
I am working with very large data: the gzipped FASTQ file is 250 GB. I split it into ~1,200 smaller FASTQ files and aligned each of them with BWA, using standard parameters.
Now I am trying to merge the 1,200 small BAM files (~350 MB each) with biobambam2.
Immediately upon calling biobambam2 bammerge, it fails with the error message: "Too many open files"
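This error generally means the process has reached its limit on open file descriptors (the value `ulimit -n` reports in a shell). Below is a rough Python sketch for checking beforehand whether a given number of inputs could fit in a single merge. It assumes bammerge opens every input at once, as the reply in this thread implies. The `reserve` of 16 descriptors for standard streams, output, and libraries is an arbitrary guess.

```python
import resource  # Unix only


def open_file_limit() -> int:
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def fits_in_one_merge(n_inputs: int, limit: int, reserve: int = 16) -> bool:
    """True if n_inputs files plus `reserve` descriptors fit under `limit`."""
    if limit == resource.RLIM_INFINITY:
        return True  # no limit configured
    return n_inputs + reserve <= limit


if __name__ == "__main__":
    limit = open_file_limit()
    print("soft limit:", limit)
    print("1200 inputs fit in one merge:", fits_in_one_merge(1200, limit))
```

If the check fails, the options are to raise the limit for the shell that launches the merge (where the system allows it) or to merge in smaller batches.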