ne1s0n closed this issue 3 years ago.
I wanted to add that I've tested the command and the resulting mapped.bed file is identical to the one obtained via bedtools.
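To repeat that check on your own data, here is a small POSIX sh sketch. The function name `same_regions` and the file names are hypothetical. It compares only the chrom/start/end columns after sorting both files, so it tolerates differences in row order or extra columns. For a strict byte-for-byte check, plain `cmp` on the two files is enough.

```sh
# Sketch: report whether two BED files list the same regions.
# Assumption: only chrom/start/end (columns 1-3) matter and row order may
# differ, so both files are trimmed and sorted before a byte comparison.
same_regions() {
  tmp_a=$(mktemp) || return 2
  tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
  cut -f1-3 "$1" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > "$tmp_a"
  cut -f1-3 "$2" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > "$tmp_b"
  cmp -s "$tmp_a" "$tmp_b"   # exit status 0 means identical
  status=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$status"
}
```

For example: `same_regions mapped.bed bedtools.mapped.bed && echo "same regions"`.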
Thanks for this. Could you confirm whether this still holds with the newest version of bedtools? A couple of benchmarks would also be useful.
Thanks!
@jpuritz @ne1s0n I can try to benchmark this over the weekend
@jpuritz I'll still run the benchmarks, but here are the benchmarks from the bedops docs comparing it with bedtools:
@jpuritz Here is a real-world benchmark:
The original command. I had to stop this run early because it used 100% of RAM and 100% of swap:
> bedtools merge -i cat-RRG.bam -bed > tmp.mapped.bed
3671.80s user
1484.30s system
2:06:45.18 total time elapsed
67% cpu
254330 kb memory
141880280 file input operations
31296 file output operations
The conversion to BED:
> time bedtools bamtobed -i cat-RRG.bam > cat-RRG.bed
4443.79s user
241.44s system
1:35:28.61 total time elapsed
81% cpu
47 kb memory
161421928 file input operations
313462296 file output operations
The merge:
> time bedops --merge cat-RRG.bed > mapped.bed
742.32s user
76.32s system
15:33.65 total time elapsed
87% cpu
5 kb memory
214871480 file input operations
35216 file output operations
So by comparison:

| tool | time | peak RAM |
|---|---|---|
| bedtools merge | 2 h 6 min+ (stopped early) | 254 GB+ |
| bedtools bamtobed + bedops --merge | ≈1 h 51 min | 47 MB (bamtobed), 5 MB (merge) |
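The bedops time in the table is the sum of the two timed steps (`bamtobed`, then `--merge`). Below is a minimal awk-based sketch that adds up elapsed times in the `[h:]mm:ss.ss` format printed by the logs above. The function name `sum_elapsed` is hypothetical.

```sh
# Sketch: convert [h:]mm:ss.ss elapsed times to seconds, add them up,
# and print the total back as h:mm:ss.ss.
sum_elapsed() {
  printf '%s\n' "$@" | awk -F: '
    {
      s = 0
      for (i = 1; i <= NF; i++) s = s * 60 + $i   # fold h, m, s fields into seconds
      total += s
    }
    END {
      h = int(total / 3600)
      m = int((total - h * 3600) / 60)
      sec = total - h * 3600 - m * 60
      printf "%d:%02d:%05.2f\n", h, m, sec
    }'
}
```

`sum_elapsed 1:35:28.61 15:33.65` prints `1:51:02.26`, just over 1 h 51 min for the bedops route. The interrupted `bedtools merge` run had already taken 2:06:45.18.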
I found that the command
bedtools merge -i cat-RRG.bam -bed > mapped.bed
uses a lot of memory and often gets the pipeline killed. A much more memory-efficient alternative is to use the bedops suite, replacing that command with:
bedtools bamtobed -i cat-RRG.bam > cat-RRG.bed
bedops --merge cat-RRG.bed > mapped.bed
This approach uses a trivial amount of memory. It requires the extra BAM-to-BED conversion, but it allowed me to run the pipeline on a system where it previously could not run. Since bedops is an extra tool, the installation instructions would need an update. Fortunately, bedops is available through conda (bioconda channel), so
conda install -c bioconda bedops
is enough. The problematic command appears in two places in the dDocent code:
https://github.com/jpuritz/dDocent/blob/9718247b7f533a71057787d77c5232b6b97065c5/dDocent#L407
https://github.com/jpuritz/dDocent/blob/9718247b7f533a71057787d77c5232b6b97065c5/dDocent#L1197
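The low memory use of the proposed bedops commands is consistent with how a merge over sorted input can work: a single pass that only remembers the interval currently being extended. The awk sketch below illustrates that idea. It is not BEDOPS's actual implementation, and the function name `merge_sorted_bed` is hypothetical. It assumes rows are grouped by chromosome and sorted by start within each chromosome, which is what `bamtobed` yields from a coordinate-sorted BAM. BEDOPS documents that its tools expect input in `sort-bed` order, so running `sort-bed` first is the conservative choice when the BAM's contig order differs. Like `bedops --merge` and the default `bedtools merge`, the sketch joins both overlapping and book-ended intervals.

```sh
# Sketch of a streaming interval merge (illustrative, not BEDOPS's code).
# Assumes tab-separated BED on stdin, grouped by chrom and sorted by start.
# Overlapping and touching intervals (next start <= current end) are joined;
# only the current chrom/start/end is kept in memory.
merge_sorted_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      if (have && $1 == chrom && $2 + 0 <= end) {
        if ($3 + 0 > end) end = $3 + 0       # extend the open interval
      } else {
        if (have) print chrom, start, end    # flush the previous interval
        chrom = $1; start = $2 + 0; end = $3 + 0; have = 1
      }
    }
    END { if (have) print chrom, start, end }'
}
```

For example, `merge_sorted_bed < cat-RRG.bed > mapped.sketch.bed` (hypothetical output name) reads the BED once from top to bottom; `bedops --merge` remains the tool proposed in this issue.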