unity-sds / unity-sps-prototype

Apache License 2.0

[New Feature]: Investigate using a multi-node EBS disk #237

Closed: LucaCinquini closed this issue 11 months ago

LucaCinquini commented 1 year ago

To possibly speed up the stage-in and stage-out steps of the CHIRP workflow, Mike asked that we consider using a shared (Multi-Attach) EBS volume instead of the current EFS file system. The AWS feature is described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html

For now, the EBS volume should replace the EFS partition (use a Terraform flag to opt for one or the other) and should be mounted at the same "/stage/" location as the EFS, in both containers of the Verdi pod, so that nothing else in the cluster needs to change.
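A minimal sketch of what such a flag could look like, not the configuration that was eventually merged. The variable names (`stage_storage_type`, `stage_availability_zone`), the resource labels, and the size and IOPS values are all hypothetical placeholders. It assumes the AWS Terraform provider and relies on the fact that Multi-Attach is only available on Provisioned IOPS (`io1`/`io2`) volumes.

```hcl
# Hypothetical flag: selects the backing store for the shared stage area.
variable "stage_storage_type" {
  description = "Backing store for the /stage/ mount: \"efs\" or \"ebs\"."
  type        = string
  default     = "efs"

  validation {
    condition     = contains(["efs", "ebs"], var.stage_storage_type)
    error_message = "The stage_storage_type value must be \"efs\" or \"ebs\"."
  }
}

# Hypothetical variable: an EBS volume lives in exactly one Availability Zone.
variable "stage_availability_zone" {
  description = "Availability Zone for the EBS volume and the Verdi nodes."
  type        = string
}

locals {
  # Same mount point for either backend, so the Verdi pod spec does not change.
  stage_mount_path = "/stage/"
}

# Created only when the flag selects EFS.
resource "aws_efs_file_system" "stage" {
  count = var.stage_storage_type == "efs" ? 1 : 0

  tags = {
    Name = "verdi-stage-efs" # placeholder name
  }
}

# Created only when the flag selects EBS.
resource "aws_ebs_volume" "stage" {
  count = var.stage_storage_type == "ebs" ? 1 : 0

  availability_zone    = var.stage_availability_zone
  type                 = "io2" # Multi-Attach requires io1 or io2
  size                 = 100   # GiB, placeholder value
  iops                 = 3000  # placeholder value
  multi_attach_enabled = true

  tags = {
    Name = "verdi-stage-ebs" # placeholder name
  }
}
```

Using `count` keeps both definitions in the module and lets a single variable decide which one exists. Wiring the chosen volume into both containers of the Verdi pod at `local.stage_mount_path` is left out of this sketch.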

LucaCinquini commented 1 year ago

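Note: this will not work across multiple AZs, because an EBS volume can only be attached to instances in the same Availability Zone as the volume.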

LucaCinquini commented 1 year ago

We could also try using a StatefulSet instead of a DaemonSet and mounting a dedicated EBS volume on each pod.

LucaCinquini commented 1 year ago

This problem could actually be solved by increasing the disk size of the Verdi worker nodes in the node group, for example to 500GB.

LucaCinquini commented 12 months ago

Drew to try this next. We should also check that the disk space is reclaimed once a job finishes; otherwise the node will eventually (or soon) run out of space. If the space is not already reclaimed, we should modify the workflow so that it is.

LucaCinquini commented 11 months ago

We should make the size of the disk a tfvar.
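A minimal sketch of how the disk size could be exposed as a variable, assuming the Verdi node group is built from an `aws_launch_template`. The variable name `verdi_node_disk_size_gb`, the resource label, the root device name, and the volume type are hypothetical; the default simply mirrors the 500GB example mentioned earlier in this thread and is not necessarily the value that was merged.

```hcl
# Hypothetical variable name; default mirrors the 500GB example from this thread.
variable "verdi_node_disk_size_gb" {
  description = "Root EBS volume size, in GiB, for each Verdi worker node."
  type        = number
  default     = 500

  validation {
    condition     = var.verdi_node_disk_size_gb > 0
    error_message = "The verdi_node_disk_size_gb value must be a positive number."
  }
}

# Hypothetical launch template for the Verdi node group.
resource "aws_launch_template" "verdi" {
  name_prefix = "verdi-worker-"

  block_device_mappings {
    device_name = "/dev/xvda" # assumed root device name for the node AMI

    ebs {
      volume_size           = var.verdi_node_disk_size_gb
      volume_type           = "gp3" # assumed volume type
      delete_on_termination = true
    }
  }
}
```

The size can then be overridden per deployment with a line such as `verdi_node_disk_size_gb = 500` in a `.tfvars` file, without touching the module itself.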

LucaCinquini commented 11 months ago

Merged into main.