Closed szilvajuhos closed 3 years ago
I added splitting FASTQ files to the list; it would be a good improvement, as discussed on the Slack channel with RationalTangle.
Hi, this is the latest Grafana plot of CPU usage that I managed to get from the most recent full-blown run of a 90x/90x test set. The run takes 4 days and 8 hours:
nextflow run nf-core/sarek -r dev -profile munin --custom_config_base 'https://raw.githubusercontent.com/MaxUlysse/nf-core_configs/MUNIN' --tools Manta,Strelka,HaplotypeCaller,Mutect2,ControlFREEC,ASCAT,snpEff,VEP,merge --monochrome_logs --genome GRCh38 --noGVCF --annotation_cache --snpEff_cache /data1/cache/snpEff --vep_cache /data1/cache/VEP --species homo_sapiens --max_cpus 48 --input ../fastq/swid.tsv
The dips are due to
Other short dips are due to some gather steps (merging BAMs, stats, and so on). These are short and cannot be parallelized. But if we resolve the ones mentioned above, a complete run should go down to about 3 days.
Using the "old" 2.3 release of Sarek on a 48-core node with more than 700 GB of memory, preprocessing a 45x/45x tumour/normal WGS pair took 2d 1h 11m 24s, which is actually pretty good. OTOH, there are pretty long stretches that use only a single CPU:
(Grafana CPU utilisation graph on Munin)
I know @MaxUlysse already managed to speed up recalibration, would be nice to
EDIT: add splitting fastq files
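To make the FASTQ-splitting idea a bit more concrete, here is a minimal Python sketch of the general technique: cut an input FASTQ into fixed-size chunks so that each chunk can be processed in its own job. This is not how Sarek does or would do it. The function names `chunk_fastq` and `split_fastq` and the `.partNNNN.fastq` naming are made up for illustration, and the sketch assumes uncompressed input with strict 4-line records (no wrapped sequence lines).

```python
import itertools
from pathlib import Path
from typing import Iterable, Iterator, List


def chunk_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk records.

    Assumes every record is exactly 4 lines (header, sequence, '+', quality).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    it = iter(lines)
    while True:
        # Pull the next block of whole records from the stream.
        chunk = list(itertools.islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


def split_fastq(src: Path, out_dir: Path, reads_per_chunk: int) -> List[Path]:
    """Write each chunk of src to its own file in out_dir and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with open(src, "rt") as handle:
        for index, chunk in enumerate(chunk_fastq(handle, reads_per_chunk)):
            # Hypothetical naming scheme: <stem>.part0000.fastq, <stem>.part0001.fastq, ...
            target = out_dir / f"{src.stem}.part{index:04d}.fastq"
            target.write_text("".join(chunk))
            written.append(target)
    return written
```

For paired-end data, calling `split_fastq` with the same `reads_per_chunk` on both the R1 and R2 files should keep the mates in matching chunks, provided the two files list the reads in the same order.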