Open spikeliu opened 6 years ago
It works quite nicely. See https://github.com/ablab/spades/issues/67#issuecomment-359267438 for 3 shell lines and an example of the performance on LustreFS. It is best to have a ramdisk on the machine, write the sorted BAM file to it, and then move the final BAM with its index to the real storage filesystem.
The sortthreads option should work. I will add it to the documentation in the next version.
So how does sortthreads relate to inputthreads and outputthreads if I use all three on the command line? In what ratio should I distribute the available cores among these three?
@mmokrejs: the issue with bamsort is that it does not use pooled threading throughout the program. The input, output and sort threads may all run at the same time. You can use a tool like cpuset to limit the number of real cores used by the program and set all three options to that number of threads. If you want a program that uses exactly a given number of threads for processing at any time, then please check bamsormadup; it was designed for this.
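For illustration, the suggestion above (cap the real cores, then give all three options the same thread count) could be sketched as follows. This is only a sketch under assumptions: it uses taskset from util-linux instead of cpuset, the helper name build_pinned_bamsort is hypothetical, and it only prints the command line rather than running bamsort.

```shell
#!/bin/sh
# Hypothetical helper: print a bamsort command line pinned to the first N cores.
# Assumption: taskset (util-linux) stands in for the cpuset tool mentioned above.
build_pinned_bamsort() (
    ncores="$1"
    last=$(( ncores - 1 ))   # taskset core lists are zero-based
    printf 'taskset -c 0-%d bamsort sortthreads=%d inputthreads=%d outputthreads=%d\n' \
        "$last" "$ncores" "$ncores" "$ncores"
)

# Example: restrict the process to 8 cores and give each stage 8 threads.
build_pinned_bamsort 8
```

Because the process is pinned to N cores, the three thread groups can overlap without occupying more than N cores in total. To actually run the printed command, the input and output files still have to be supplied, for example through shell redirection if bamsort is reading from standard input and writing to standard output.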
@gt1 Are you saying that if I run bamsort sortthreads=$phys_cores inputthreads=$phys_cores outputthreads=$phys_cores, I will end up with a load of 300?
Shall I divide the number of available cores by 3 to ensure the load will be at most 100?
But isn't one decompression thread and one compression thread enough? So: bamsort sortthreads=$phys_cores-2 inputthreads=1 outputthreads=1?
@mmokrejs This could happen, although it is rather unlikely. In my experience, assuming you do not set level=0 for uncompressed output, the output compression is rather compute heavy, so you might want to spend more threads there.
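If the thread counts should add up to the number of physical cores instead, one possible split is sketched below. The ratio is purely an illustrative assumption (one input thread, roughly half of the rest for output compression, the remainder for sorting) and not a recommendation from the bamsort author; the helper name split_threads is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a core budget across bamsort's three thread options.
# Assumed ratio: 1 input thread, about half of the rest for output, remainder for sorting.
split_threads() (
    cores="$1"
    n_in=1
    n_out=$(( (cores - n_in) / 2 ))
    if [ "$n_out" -lt 1 ]; then n_out=1; fi
    n_sort=$(( cores - n_in - n_out ))
    # With fewer than 3 cores every stage still gets one thread,
    # so the sum can exceed the core count.
    if [ "$n_sort" -lt 1 ]; then n_sort=1; fi
    printf 'sortthreads=%d inputthreads=%d outputthreads=%d\n' "$n_sort" "$n_in" "$n_out"
)

split_threads 8
```

The printed options can be appended to a bamsort call. Whether this particular ratio helps depends on the compression level and the storage, so it is worth measuring on real data.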
For the "bamsort" command, I cannot find an option for using multiple threads in the sorting process (I don't think inputthreads and outputthreads are meant for this purpose). When I look at the source code, I can see an option "KEY" named "sortthreads" which doesn't show up in the --help output. I wonder whether I can use it, or whether there is a reason you have hidden it. Maybe it is not ready to be used because it can cause errors?