LucaCinquini closed 11 months ago
Note: this will not work across multiple AZs.
We could also try using a StatefulSet instead of a DaemonSet and mounting a specific EBS volume on each
This problem could actually be solved by increasing the disk size of the Verdi workers in the node group, for example to 500 GB.
Drew to try next. Also, we should check that the disk space is reclaimed once the job is finished; otherwise the node will eventually run out of space. If this does not already happen, we should modify the workflow so that it does.
We should make the size of the disk a tfvar.
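A minimal sketch of what such a tfvar could look like. The variable name `verdi_node_disk_size_gb` is hypothetical, and the default of 500 is only the example figure mentioned above; how the value is wired into the node group depends on how the cluster is actually defined.

```hcl
# Hypothetical variable name; not taken from the existing Terraform code.
variable "verdi_node_disk_size_gb" {
  description = "Root disk size, in GB, for the Verdi worker nodes in the node group."
  type        = number
  default     = 500 # example value from the discussion, not a tested recommendation

  validation {
    condition     = var.verdi_node_disk_size_gb > 0
    error_message = "The Verdi node disk size must be a positive number of GB."
  }
}

# Assumption: the node group definition would then reference
# var.verdi_node_disk_size_gb wherever the disk size is currently hard-coded.
```

The value could then be overridden per deployment in a `.tfvars` file, for example `verdi_node_disk_size_gb = 200`.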
Merged into main.
To potentially speed up the stage-in and stage-out portions of the CHIRP workflow, Mike asked that we consider using a shared EBS volume instead of the current EFS disk. The AWS feature is described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html
For now, the EBS volume should replace the EFS partition (use a Terraform flag to opt for one or the other) and should be mounted at the same "/stage/" location as the EFS, in both containers of the Verdi pod, so that nothing else in the cluster needs to change.
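One possible shape for the Terraform flag, as a sketch only. The names `stage_volume_type`, `use_ebs_stage`, and `stage_mount_path` are hypothetical; only the "/stage/" mount location and the EFS/EBS choice come from the description above.

```hcl
# Hypothetical flag: selects which storage backend is mounted at /stage/.
variable "stage_volume_type" {
  description = "Storage backend for the staging area in the Verdi pod: \"efs\" or \"ebs\"."
  type        = string
  default     = "efs" # assumption: keep the current behavior unless explicitly changed

  validation {
    condition     = contains(["efs", "ebs"], var.stage_volume_type)
    error_message = "The stage_volume_type value must be either \"efs\" or \"ebs\"."
  }
}

locals {
  # True when the EBS volume should replace the EFS partition.
  use_ebs_stage = var.stage_volume_type == "ebs"

  # Same mount location for either backend, so nothing else in the cluster changes.
  stage_mount_path = "/stage/"
}

# Assumption: the EFS and EBS resources would each be created conditionally,
# e.g. with count = local.use_ebs_stage ? 1 : 0 on the EBS side and the
# inverse on the EFS side, and both Verdi pod containers would mount the
# selected volume at local.stage_mount_path.
```

A string flag rather than a boolean leaves room for a third backend later; a boolean such as `use_ebs` would work just as well if only two options are ever needed.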