Closed twbattaglia closed 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I would definitely support this. The key logic of Nextflow is a little challenged on the cloud: unless there is a shared disk that can be mounted by all task VMs, each task copies files back and forth to/from the bucket instead of using symlinks as it would on-prem. This behaviour hugely multiplies costs by increasing both I/O and runtime. Being able to specify the disk type could change the IOPS available to the worker VMs and improve their performance. This feature would help optimize Nextflow pipelines on the cloud. Quite important :)
Bump
We should be able to support this feature for both Google Life Sciences and Google Batch. I think the best way to support it in Nextflow would be to add a `DiskResource` class so that the disk type can be specified in the `disk` directive, like with `accelerator`. I have laid the groundwork for this in #3027, so once we merge that PR I can implement the disk type.
+1, would be very useful for tasks like fasterq-dump
Google Batch does support SSD disks when using the Fusion file system. See https://www.nextflow.io/docs/edge/google.html#fusion-file-system
Support for disk type was added to Google Batch in #3861. We aren't really adding new features to the google-lifesciences executor because we encourage users to migrate to Google Batch, so I'm going to close this issue.
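For anyone landing here later, a minimal sketch of what requesting a disk type might look like with the Google Batch executor. It assumes the `type` option on the `disk` directive described in the Nextflow docs; the process name, disk size, and command are placeholders, so check the docs for your Nextflow version before relying on it.

```groovy
// Sketch only: assumes the Google Batch executor and a Nextflow version
// whose `disk` directive accepts a `type` option.
process FASTERQ_DUMP {
    // Request a local SSD scratch disk instead of the default persistent disk.
    // Size and type values here are illustrative.
    disk 375.GB, type: 'local-ssd'

    script:
    """
    echo "I/O-heavy work goes here"
    """
}
```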
New feature
Ability to specify the Compute Engine disk type (pd-standard or local-SSD) found in the new Cloud Life Sciences API (https://cloud.google.com/life-sciences/docs/reference/rpc/google.cloud.lifesciences.v2beta#disk).
Usage scenario
Jobs that require high input/output operations per second (IOPS) and lower latency (https://cloud.google.com/compute/docs/disks/local-ssd).
Suggest implementation
The API documentation states it can be set using `setType()` (https://developers.google.com/resources/api-libraries/documentation/genomics/v1alpha2/java/latest/com/google/api/services/genomics/model/Disk.html#setType-java.lang.String-).
Add the disk type when the VM is created in GoogleLifeSciencesHelper.groovy, where `req.diskType` is specified in GoogleLifeSciencesTaskHandler.groovy.
`getDiskType()` can be set within TaskConfig.groovy, where it is set to `pd-standard` by default.
Preliminary tests successfully generated a Compute Engine instance with an SSD attached.
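The flow described above could be sketched roughly as follows. This is not the actual Nextflow code: `Request`, `Disk`, and `createDisk` are hypothetical stand-ins so the snippet runs without the Google client library, and only `diskType` and the `pd-standard` default come from the proposal.

```groovy
// Hypothetical request object; only `diskType` mirrors a name from the proposal.
class Request {
    String diskName = 'nf-disk'   // placeholder name
    int diskSizeGb = 100          // placeholder size
    String diskType               // null means the user did not set a type
}

// Simplified stand-in for the client library's Disk model (assumption).
class Disk {
    String name
    Integer sizeGb
    String type
}

// Build the disk spec, falling back to pd-standard when no type is given.
Disk createDisk(Request req) {
    new Disk(
        name: req.diskName,
        sizeGb: req.diskSizeGb,
        type: req.diskType ?: 'pd-standard'
    )
}

assert createDisk(new Request()).type == 'pd-standard'
assert createDisk(new Request(diskType: 'local-ssd')).type == 'local-ssd'
```

In the real helper, the same fallback would presumably feed the value passed to `setType()` on the API's disk object.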