vdblab / vdblab-shotgun

Shotgun metagenomic sequencing processing pipeline
MIT License

WIP trimming resources #88

Closed nickp60 closed 5 months ago

nickp60 commented 5 months ago

Now that we have ~5k samples processed, I wanted to look at our resource utilization to see if there were opportunities for trimming the requested resources. I focused on the preprocessing and MetaPhlan/Humann workflows for the time being. Here is what I found:

[Three screenshots, dated 2024-05-20]

Metaphlan

Metaphlan tends to use ~25GB of memory and complete in less than 30 minutes. The cluster was requesting 1GB per core, so 64GB of memory, resulting in a median utilization of less than 40%. Here, I reduced the memory to 30GB, the runtime to 2hr, and the number of cores to 32. Both memory and runtime scale with subsequent attempts.
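The numbers above can be checked with a little arithmetic, and the new defaults can be written as attempt-scaled functions. This is a rough sketch, not the pipeline's actual rule definition: the function names are hypothetical, and linear scaling (base value times `attempt`) is an assumption, since the description only says that memory and runtime scale with subsequent attempts.

```python
GB_IN_MB = 1024  # conversion convention assumed for this sketch


def utilization(used_gb: float, requested_gb: float) -> float:
    """Fraction of the requested memory that was actually used."""
    return used_gb / requested_gb


def metaphlan_mem_mb(attempt: int) -> int:
    """30GB on the first attempt; assumed to grow linearly on retries."""
    return 30 * GB_IN_MB * attempt


def metaphlan_runtime_min(attempt: int) -> int:
    """2 hours on the first attempt; assumed to grow linearly on retries."""
    return 2 * 60 * attempt


# ~25GB typical use against the old 64GB request and the new 30GB request
old = utilization(25, 64)
new = utilization(25, 30)
print(f"old: {old:.0%}, new: {new:.0%}")  # old: 39%, new: 83%
```

In a Snakemake rule, functions like these would typically be wrapped as `resources` callables that receive `attempt`, so a failed job is resubmitted with a larger request.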

Humann

Humann was similarly underutilizing memory. Most runs finished in under three hours, but some took upwards of 12. The default runtime was changed to 8 hours, scaling with attempt. Memory was previously specified as 10GB per GB of combined input; as our utilization was rarely above 50%, I changed the default to 5GB per GB of input, scaling with attempt as well.
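To make the input-proportional request concrete, here is a small sketch comparing the old and new memory factors. The function names are hypothetical, and multiplying by `attempt` is an assumed reading of "scaling with attempt"; the real rule may compute the input size and the retry scaling differently.

```python
def humann_mem_gb(input_gb: float, attempt: int, gb_per_input_gb: float = 5.0) -> float:
    """Memory request proportional to combined input size, grown on retries."""
    return gb_per_input_gb * input_gb * attempt


def humann_runtime_hr(attempt: int) -> int:
    """8 hours on the first attempt; assumed to grow linearly on retries."""
    return 8 * attempt


# Compare the old factor (10GB per input GB) with the new one (5GB per input GB)
for input_gb in (2, 5, 10):
    old = humann_mem_gb(input_gb, attempt=1, gb_per_input_gb=10.0)
    new = humann_mem_gb(input_gb, attempt=1)
    print(f"{input_gb}GB input: {old:.0f}GB -> {new:.0f}GB")
```

Under these assumptions, a run that needs 12 hours would exceed the 8-hour first attempt and fit within the 16-hour second attempt, so the slow tail pays for one retry while the majority of jobs request less.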

Deduplication with BBMap Clumpify

Memory was previously specified as 16GB per input GB; utilization appears to be normally less than 60%, so I decreased this to 10GB, scaling with attempt. Runtime rarely exceeds an hour, so I changed the default to that, scaling with attempt squared.

SortMeRNA

Memory was specified as 16GB, and the median utilization was around 10%. I updated it to 6GB by default, scaling with attempt. Median runtime was under half an hour; I changed the default to 2hr, scaling with attempt cubed.
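The difference between linear, squared, and cubed attempt scaling is easiest to see as a retry schedule. The sketch below assumes the scaling means `base * attempt ** power`; that formula and the helper name `scaled` are illustrative rather than taken from the pipeline.

```python
def scaled(base: int, attempt: int, power: int = 1) -> int:
    """Resource request for a given attempt: base * attempt ** power."""
    return base * attempt ** power


CLUMPIFY_RUNTIME_MIN = 60    # one hour, scaled with attempt squared
SORTMERNA_RUNTIME_MIN = 120  # two hours, scaled with attempt cubed

for attempt in (1, 2, 3):
    clumpify = scaled(CLUMPIFY_RUNTIME_MIN, attempt, power=2)
    sortmerna = scaled(SORTMERNA_RUNTIME_MIN, attempt, power=3)
    print(f"attempt {attempt}: clumpify {clumpify} min, sortmerna {sortmerna} min")
```

With these assumptions, the Clumpify runtime grows 60, 240, 540 minutes over three attempts, while the SortMeRNA runtime grows 120, 960, 3240 minutes. A higher power keeps the first request small but escalates quickly for the rare long-running sample.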

Snap

Similar to Humann, Snap was given 10GB per input GB. This rule runs per shard. The result was less than 20% utilization. Memory is related to the reference, so it looks like we can set this to 10GB, scaled with attempt. Runtime never exceeded an hour; the default is 90 minutes, scaled with attempt. I also dropped from 24 to 16 threads for (hopefully) faster queueing.

Bowtie2

Similar to Snap, but slightly more memory is needed, so I set it to 12GB, scaling with attempt, and start the runtime at an hour.
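For the two aligners, where memory depends on the reference rather than on the input size, the defaults reduce to a fixed base value per tool. A minimal sketch follows, with hypothetical names (`Request`, `DEFAULTS`, `request`); linear scaling of the Bowtie2 runtime with attempt is an assumption, since the description only gives its starting value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    mem_gb: int
    runtime_min: int


# First-attempt defaults described above
DEFAULTS = {
    "snap": Request(mem_gb=10, runtime_min=90),
    "bowtie2": Request(mem_gb=12, runtime_min=60),
}


def request(tool: str, attempt: int) -> Request:
    """Scale a tool's base request linearly with the attempt number."""
    base = DEFAULTS[tool]
    return Request(base.mem_gb * attempt, base.runtime_min * attempt)


for tool in DEFAULTS:
    for attempt in (1, 2):
        print(tool, attempt, request(tool, attempt))
```

Keeping the base values in one table makes it straightforward to revisit them once more utilization data is available.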

nickp60 commented 5 months ago

I'm merging this hot for now, and merging into main so we can patch the version isabl is using.