@tonybendis have you created a branch for this as of yet that I can poke around in?
@BMurri this looks like a long-pending issue, open since 2021. Are we actively working on this, or can it be closed?
@ngambani This is another one that would be great for someone in the community to pitch in on. It can help control costs for some types of workflows that don't need more cores but still need more storage space. microsoft/ga4gh-tes#454 would accomplish the same thing in a crude way, but with neither the cost savings nor the precise control.
The Batch VM is currently selected based on the resource requirements of the Cromwell task. The code selects the cheapest VM whose CPU count, memory, and disk size are all equal to or larger than requested. For tasks requiring a large disk, this inflates the cost of the VM, because a large disk requirement selects a VM whose CPU count and memory are also much higher than wanted.
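For illustration, a minimal sketch of that selection (the names are hypothetical, not the actual TES types):

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative record; the real SKU metadata lives elsewhere in TES.
record VmSku(string Name, int VCpus, double MemoryGiB, double DiskGiB, decimal PricePerHour);

// Cheapest SKU whose CPU, memory, AND disk all meet the request. A large
// disk request therefore drags CPU/memory (and price) up with it.
static VmSku? SelectVm(IEnumerable<VmSku> skus, int cpu, double memGiB, double diskGiB) =>
    skus.Where(s => s.VCpus >= cpu && s.MemoryGiB >= memGiB && s.DiskGiB >= diskGiB)
        .OrderBy(s => s.PricePerHour)
        .FirstOrDefault();
```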
Proposed solution: TES now supports local NVMe drives out of the box, and they are found on more than just the L-series SKUs, but the selection code described above does NOT take NVMe drive presence into account. The new proposal is:
- In `GenerateBatchVmSkus`, compare the disk size requested by the tool against the sum of `ResourceDiskSizeInGiB` and the NVMe drives' total size: use the SKU's price as-is for all SKUs where that sum is >= the requested disk size, and add the cost of a disk of the requested size to the cost of the SKU for all other SKUs (sketched below).
- When the requested size exceeds the selected SKU's sum of `ResourceDiskSizeInGiB` and NVMe drives' total size, add a Standard LRS disk of the requested size to the pool spec, then format and mount it at the same mount point used by the current NVMe start-task.

Estimated effort: 3 days
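A rough sketch of that cost comparison (names like `NvmeDisksTotalSizeInGiB` and `managedDiskPricePerHour` are assumptions, not the actual TES members):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record VmSku(string Name, int VCpus, double MemoryGiB,
             double ResourceDiskSizeInGiB, double NvmeDisksTotalSizeInGiB,
             decimal PricePerHour);

record Candidate(VmSku Sku, bool NeedsDataDisk, decimal EffectivePrice);

static Candidate? SelectVm(IEnumerable<VmSku> skus, int cpu, double memGiB,
                           double requestedDiskGiB,
                           Func<double, decimal> managedDiskPricePerHour)
{
    return skus
        .Where(s => s.VCpus >= cpu && s.MemoryGiB >= memGiB)
        .Select(s =>
        {
            // SKUs whose resource disk + NVMe total already covers the request
            // compete at their bare price; all others carry the added cost of
            // a managed disk of the requested size.
            var localDiskGiB = s.ResourceDiskSizeInGiB + s.NvmeDisksTotalSizeInGiB;
            var needsDataDisk = localDiskGiB < requestedDiskGiB;
            var price = s.PricePerHour
                      + (needsDataDisk ? managedDiskPricePerHour(requestedDiskGiB) : 0m);
            return new Candidate(s, needsDataDisk, price);
        })
        .OrderBy(c => c.EffectivePrice)
        .FirstOrDefault();
}
```

When `NeedsDataDisk` is true, the pool spec would additionally carry the data disk, and the start task would format and mount it in place of the NVMe setup.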
Note: Premium SSD is not available for all SKUs, and Premium SSD v2 is not available through Batch (the same is true of Ultra Disk). To simplify the implementation and keep costs down, Standard SSD was chosen.
Consider dividing the requested capacity across the number of additional drives that can be attached and pooling them to improve disk I/O.
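A sketch of that split (the stripe count and device names are illustrative; the disk count would come from the SKU's max data disk capability):

```csharp
using System;

// Spread the requested capacity over several smaller data disks so the
// start task can assemble them into a single striped (RAID-0) volume.
static (int DiskCount, int DiskSizeGiB) SplitCapacity(int requestedGiB, int maxDataDiskCount, int maxStripes = 4)
{
    var count = Math.Max(1, Math.Min(maxDataDiskCount, maxStripes));
    var sizePerDisk = (int)Math.Ceiling(requestedGiB / (double)count);
    return (count, sizePerDisk);
}

// The start task could then pool the disks with something along these lines
// (untested; device names and the mount point vary by VM):
//   mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
//   mkfs.ext4 /dev/md0 && mount /dev/md0 <task mount point>
```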
Previous proposed solution: