Closed andrewjpage closed 3 months ago
/usr/bin/time returned this information for a large sample (cleaned reads with file sizes of 600MB):
debconf: delaying package configuration, since apt-utils is not installed
Command being timed: "run_midas.py species samplename -1 R1.fastq.gz -2 R2.fastq.gz -d db/midas_db_v1.2/ -t 4"
User time (seconds): 2873.56
System time (seconds): 20.72
Percent of CPU this job got: 238%
Elapsed (wall clock) time (h:mm:ss or m:ss): 20:12.82
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 669376
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 147
Minor (reclaiming a frame) page faults: 253564
Voluntary context switches: 791492
Involuntary context switches: 408572
Swaps: 0
File system inputs: 4157936
File system outputs: 170784
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
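CPU utilization, if evenly split across the 4 threads, is about 60% (238% / 4), which is decent. Memory is underutilized (peak resident set size is about 650MB against the 32GB requested), so I'm reducing that to 4GB.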
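As a sanity check, the figures can be reproduced from the report with a few lines of Python. This is only a sketch: the variable names are illustrative, it assumes the 4 threads passed with -t 4 correspond to 4 allocated CPUs, and it assumes the "kbytes" in the report are units of 1024 bytes.

```python
# Values copied from the /usr/bin/time report.
user_s = 2873.56
system_s = 20.72
elapsed_s = 20 * 60 + 12.82   # 20:12.82 wall clock, in seconds
max_rss_kb = 669376
cpus = 4                      # assumption: -t 4 matches 4 allocated CPUs

# CPU time divided by wall-clock time gives the "Percent of CPU" figure.
cpu_percent = 100 * (user_s + system_s) / elapsed_s
per_cpu_percent = cpu_percent / cpus

# Assumption: kbytes means units of 1024 bytes, so divide twice by 1024 for GiB.
peak_gib = max_rss_kb / 1024 ** 2

print(f"CPU: {cpu_percent:.1f}% total, {per_cpu_percent:.1f}% per CPU")
print(f"Peak memory: {peak_gib:.2f} GiB")
```

The total agrees with the reported 238% to within rounding, and the peak memory comes out well under 1 GiB.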
:bug:
:pencil: Describe the Issue
The midas task requests a huge amount of RAM (32GB). This is an expensive VM configuration, particularly now that it's run automatically from TheiaProk.
Run the task and figure out how much RAM is normally used (/usr/bin/time -v cmd). Does it really need 32GB? Does it require a 100GB local disk? How much processing time is used, and what is the utilisation of the CPUs? If it's not making use of all 4, adjust the task to use fewer. If it is using all 4, consider bumping it to 8 to reduce the amount of time we are using 32GB RAM. Is this task IO bound?
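One way to answer the RAM and CPU questions repeatably is to parse the /usr/bin/time -v report rather than reading it by eye. The sketch below is a hypothetical helper, not part of the task or of MIDAS: the function names (to_seconds, field, summarise) are made up for illustration, and it assumes the GNU time verbose layout of "label: value" lines with memory reported in units of 1024 bytes.

```python
import re


def to_seconds(clock: str) -> float:
    """Convert "h:mm:ss" or "m:ss.ss" into seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


def field(report: str, label: str) -> str:
    """Return the value on the report line that starts with `label`."""
    # The value is whatever follows the last ": " on the line.
    pattern = rf"^[ \t]*{re.escape(label)}.*: (\S+)[ \t]*$"
    match = re.search(pattern, report, re.MULTILINE)
    if match is None:
        raise ValueError(f"no line starting with {label!r}")
    return match.group(1)


def summarise(report: str, cpus: int) -> dict:
    """Compute CPU utilisation and peak memory from a verbose time report."""
    cpu_s = float(field(report, "User time")) + float(field(report, "System time"))
    elapsed_s = to_seconds(field(report, "Elapsed (wall clock) time"))
    busy_cpus = cpu_s / elapsed_s  # average number of CPUs kept busy
    return {
        "busy_cpus": busy_cpus,
        "utilisation": busy_cpus / cpus,
        # Assumption: kbytes are units of 1024 bytes.
        "peak_gib": int(field(report, "Maximum resident set size")) / 1024 ** 2,
    }


# Example input in the assumed report layout.
report = """\
User time (seconds): 2873.56
System time (seconds): 20.72
Elapsed (wall clock) time (h:mm:ss or m:ss): 20:12.82
Maximum resident set size (kbytes): 669376
"""

summary = summarise(report, cpus=4)
print(summary)
```

A utilisation well below 1.0 suggests the task is not using all of its CPUs, and peak_gib gives a floor for the memory request. It says nothing on its own about whether the task is IO bound.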
This task is short. Make it preemptible (spot) so that we can access lower pricing. Google gives 30 seconds' notice before killing a preemptible VM, and a run this short will rarely be interrupted, so we shouldn't notice any difference.
In the runtime section of the task set: