theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

ksnp3 doesn't have CPUs passed in #487

Closed by andrewjpage 3 months ago

andrewjpage commented 4 months ago

:bug:

:pencil: Describe the Issue

The ksnp3 task doesn't take CPUs as an input. Check what happens when it runs: does it use all available CPUs?
This is currently the most expensive task to run ($30.80 per 1000 runs).

Run the task and measure how much RAM it normally uses (/usr/bin/time -v cmd). Does it really need 8GB? Does it require a 100GB local disk? How much processing time is used, and what CPU utilisation does that imply? If it's not making use of all 4 CPUs, adjust the task to request fewer.
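
One way to turn a `/usr/bin/time -v` report into the numbers asked for here is sketched below. It is a rough helper, not part of the repository: the function names `parse_elapsed` and `summarise` are made up for illustration, and the sample report uses placeholder values rather than a real ksnp3 measurement. Utilisation is taken as (user + system time) / (wall-clock time × allocated CPUs).

```python
def parse_elapsed(text):
    """Convert GNU time's 'h:mm:ss' or 'm:ss.xx' elapsed format to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarise(report, allocated_cpus):
    """Summarise a `/usr/bin/time -v` report (hypothetical helper)."""
    fields = {}
    for line in report.splitlines():
        # Split on the last ': ' so keys that contain colons stay intact.
        key, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[key] = value

    cpu_seconds = (float(fields["User time (seconds)"])
                   + float(fields["System time (seconds)"]))
    elapsed = parse_elapsed(
        fields["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
    peak_rss_kb = int(fields["Maximum resident set size (kbytes)"])

    return {
        "cpu_seconds": cpu_seconds,
        "elapsed_seconds": elapsed,
        # Fraction of the allocated CPU time that was actually used.
        "cpu_utilisation": cpu_seconds / (elapsed * allocated_cpus),
        # GNU time reports the peak resident set size in units of 1024 bytes.
        "peak_rss_gib": peak_rss_kb / 1024 ** 2,
    }


# Placeholder report for illustration only (not a measured ksnp3 run).
example_report = """
        User time (seconds): 10.00
        System time (seconds): 2.00
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.00
        Maximum resident set size (kbytes): 1048576
"""

summary = summarise(example_report, allocated_cpus=4)
print(f"CPU utilisation: {summary['cpu_utilisation']:.1%}")
print(f"Peak RSS: {summary['peak_rss_gib']:.2f} GiB")
```

A utilisation well below 100% across the 4 allocated CPUs would suggest the task can request fewer; a peak RSS far below 8GB would suggest the memory request can shrink too.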

cimendes commented 4 months ago

Test on HAV genome dataset (n=5)

        User time (seconds): 1.65
        System time (seconds): 0.14
        Percent of CPU this job got: 3%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:50.70
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 59228
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 25098
        Voluntary context switches: 420
        Involuntary context switches: 23
        Swaps: 0
        File system inputs: 8
        File system outputs: 104
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Test on K. pneumoniae dataset (n=5)

        User time (seconds): 2.63
        System time (seconds): 0.26
        Percent of CPU this job got: 0%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:19.23
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 58984
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 25137
        Voluntary context switches: 2287
        Involuntary context switches: 125
        Swaps: 0
        File system inputs: 0
        File system outputs: 112
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Mixed bacterial dataset (n=10)

Command being timed: "miniwdl run --task ksnp3 /home/ines_mendes/Git/public_health_bioinformatics/tasks/phylogenetic_inference/task_ksnp3.wdl assembly_fasta= /home/ines_mendes/Test/klebsiella/ERR3671149_contigs.fasta samplename=ERR3671149 assembly_fasta= /home/ines_mendes/Test/klebsiella/INF001_contigs.fasta samplename=INF001 assembly_fasta= /home/ines_mendes/Test/klebsiella/INF088_contigs.fasta samplename=INF088 assembly_fasta= /home/ines_mendes/Test/klebsiella/INF100_contigs.fasta samplename=INF100 assembly_fasta= /home/ines_mendes/Test/klebsiella/klebsiella_pneumoniae_st111_NZ_CP013711.fasta samplename=NZ_CP013711 assembly_fasta= /home/ines_mendes/Test/a_baumannii/NZ_CP043953.fasta samplename=NZ_CP043953 assembly_fasta= /home/ines_mendes/Test/a_baumannii/NZ_CP072398.fasta samplename=NZ_CP072398 assembly_fasta= /home/ines_mendes/Test/a_baumannii/SRR23620612_contigs.fasta samplename=SRR23620612 assembly_fasta= /home/ines_mendes/Test/enterococcus_faecium/ERR2407653_contigs.fasta samplename=ERR2407653 assembly_fasta= /home/ines_mendes/Test/vibrio/ERR10146580_contigs.fasta samplename=ERR10146580 cluster_name=mix_samples"
        User time (seconds): 4.26
        System time (seconds): 0.35
        Percent of CPU this job got: 0%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 9:52.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 60536
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 25
        Minor (reclaiming a frame) page faults: 25252
        Voluntary context switches: 4260
        Involuntary context switches: 183
        Swaps: 0
        File system inputs: 3248
        File system outputs: 176
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
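
For reference, the three reports above can be reduced to CPU share and peak memory with a few lines of arithmetic. This is only a sketch: the figures are copied from the output above, the 4 CPU and 8GB allocations are the ones mentioned in the issue description, and the `summarise_run` helper is made up for illustration. One caveat, stated as an assumption: the one command line shown is `miniwdl run`, so if the task itself executed inside a container, these figures may describe the miniwdl client process rather than ksnp3.

```python
ALLOCATED_CPUS = 4      # from the issue description
ALLOCATED_RAM_GB = 8    # from the issue description

# (user s, system s, elapsed s, max RSS in kbytes), copied from the reports.
runs = {
    "HAV (n=5)": (1.65, 0.14, 50.70, 59228),
    "K. pneumoniae (n=5)": (2.63, 0.26, 5 * 60 + 19.23, 58984),
    "Mixed bacterial (n=10)": (4.26, 0.35, 9 * 60 + 52.20, 60536),
}


def summarise_run(user, system, elapsed, rss_kb):
    """Return (percent of one CPU, percent of allocation, peak RSS in GiB)."""
    one_cpu_pct = 100 * (user + system) / elapsed
    allocation_pct = one_cpu_pct / ALLOCATED_CPUS
    peak_gib = rss_kb / 1024 ** 2
    return one_cpu_pct, allocation_pct, peak_gib


for name, values in runs.items():
    one_cpu_pct, allocation_pct, peak_gib = summarise_run(*values)
    print(f"{name}: {one_cpu_pct:.2f}% of one CPU, "
          f"{allocation_pct:.2f}% of {ALLOCATED_CPUS} CPUs, "
          f"peak RSS {peak_gib:.3f} GiB of {ALLOCATED_RAM_GB} GB requested")
```

The share of one CPU computed this way should agree with the truncated "Percent of CPU this job got" lines in each report.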