I am running MEGAHIT on a soil metagenome dataset, but only on 1 of the 60 samples to test the speed. The sample consists of paired-end read files of 4.7 Gbp each (32M reads each) and one single-read file (pairs that could be merged) of 13.7 Gbp (60M reads). I am running this on a server with 1 TB of RAM and 96 CPUs. According to "top", around 4% of the memory and most of the CPUs are in use. It is running with --presets meta-large, as recommended for soil metagenomes.
It has now been running for 48 hours and is far from done. This seems too long to me given the benchmarks mentioned in the paper. What run times should I expect given the available computing power and the size and nature of the sequence files? Can I boost performance somehow? Any help here is greatly appreciated.
Kind regards, Martin