Open nstogner opened 1 year ago
I wanted to get a start on understanding GCSFuse equivalents. It turns out AWS makes it difficult or charges for the privilege of using S3 as a file system! S3 notably doesn't make an appearance on the k8s-csi supported drivers list. Here's what I came away with:
Checkpointing models, logging notebook outputs, saving datasets: I believe these all constitute sequential write operations, but it's hard to know for sure. This blog post outlines how to use it within an EKS context: https://dev.to/otomato_io/mount-s3-objects-to-kubernetes-pods-12f5
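One way to test the sequential-write assumption instead of guessing is to wrap the file object handed to a checkpoint or dataset writer and record whether every write starts where the previous one ended. A minimal sketch follows; `SequentialWriteProbe` and `looks_sequential` are hypothetical helpers written for this illustration, not part of any library mentioned here, and they only see calls made through the wrapper.

```python
import io


class SequentialWriteProbe:
    """Wraps a seekable binary file and records whether writes are append-only."""

    def __init__(self, raw):
        self._raw = raw
        # Offset at which the next write must start to count as sequential.
        self._expected = raw.tell()
        self.sequential = True

    def write(self, data):
        # A write is sequential only if it starts exactly where the last one ended.
        if self._raw.tell() != self._expected:
            self.sequential = False
        written = self._raw.write(data)
        self._expected = self._raw.tell()
        return written

    def seek(self, offset, whence=io.SEEK_SET):
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()


def looks_sequential(writer):
    """Runs writer(f) against an in-memory buffer and reports whether it only appended."""
    probe = SequentialWriteProbe(io.BytesIO())
    writer(probe)
    return probe.sequential
```

A real serializer may call methods this wrapper does not forward (for example `flush`), so it would likely need a few more pass-through methods before being pointed at an actual checkpoint writer.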
Users of s3fs-fuse seem to have in part migrated toward goofys due to lack of support.
On the AWS supported side, the EKS module makes adding the fsx-lustre-csi-driver easy. This seems like the closest thing to gcsfuse, but a closer look at the pricing page makes me think it runs counter to our project goals:
File system storage: You pay for the average amount of storage provisioned for your file systems per month, measured in gigabyte-months "GB-months," as shown in the pricing examples.
That seems like a non-starter for substratus.
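To make the concern concrete, here is a rough sketch of the arithmetic: under provisioned billing you pay for the capacity you reserve, not the bytes you store. All numbers below are placeholders chosen for illustration, not real AWS prices or minimums, and using the same per-GB price for both models is a simplifying assumption.

```python
def monthly_storage_cost(provisioned_gb, used_gb, price_per_gb_month, billed_on):
    """Returns the monthly storage bill in dollars for the given billing model."""
    if billed_on not in ("provisioned", "used"):
        raise ValueError("billed_on must be 'provisioned' or 'used'")
    # Provisioned billing ignores how much data is actually stored.
    billable_gb = provisioned_gb if billed_on == "provisioned" else used_gb
    return billable_gb * price_per_gb_month


# Placeholder inputs for illustration only (hypothetical, NOT real pricing).
PROVISIONED_GB = 1200
USED_GB = 50
PRICE_PER_GB_MONTH = 0.10

provisioned_bill = monthly_storage_cost(PROVISIONED_GB, USED_GB, PRICE_PER_GB_MONTH, "provisioned")
usage_bill = monthly_storage_cost(PROVISIONED_GB, USED_GB, PRICE_PER_GB_MONTH, "used")
```

With these placeholder inputs the provisioned bill is 24 times the usage-based one, simply because 1200 GB is reserved while only 50 GB is stored. The gap grows the emptier the file system is, which is the case for a small or idle install.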
I come away thinking yandex-cloud/k8s-csi-s3 and kahing/goofys are the best contenders and warrant a closer look. Rclone has no officially supported CSI driver (unofficial here). That may be sufficient to give it a test in the bake-off.
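For the bake-off, a first-pass comparison could be as simple as timing a sequential write into each candidate's mount point. The sketch below is one possible starting point, not an agreed benchmark; the function name, scratch file name, and default sizes are assumptions made for illustration. The timer deliberately spans the file close, since some FUSE-backed mounts may defer the upload until then.

```python
import os
import time


def sequential_write_throughput(directory, total_bytes=8 * 1024 * 1024, chunk_bytes=1024 * 1024):
    """Writes total_bytes sequentially to a scratch file in directory and returns MiB/s."""
    path = os.path.join(directory, "bakeoff-seq-write.bin")  # hypothetical scratch file name
    chunk = os.urandom(chunk_bytes)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push buffered data to the mount
    # Stop the clock only after close, so deferred uploads are counted.
    elapsed = time.perf_counter() - start
    os.remove(path)
    return (written / (1024 * 1024)) / elapsed
```

Running it once per mounted driver (and once against local disk as a baseline) would give comparable numbers, although a single small write says little about large checkpoints or concurrent readers.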
Worth mentioning: as a totally different option that I all but discarded, CSI drivers for EBS and EFS exist, but I think we should stick to blob storage.
I wonder if the first non-POSIX behavior outlined is a dealbreaker or not
GCS Fuse is non-POSIX: https://cloud.google.com/storage/docs/gcs-fuse#expandable-1
One more (abandoned) CSI driver project for reference: https://github.com/ctrox/csi-s3/tree/master
And yet another: https://github.com/gaul/s3proxy (I think I closed it instantly when I saw it was Java)
Something else to consider that would unblock any environment is including a K8s-native storage provider such as OpenEBS with Maya. If OpenEBS + Maya performs better, it might even let us get rid of GCSFuse as well. It seems Maya doesn't depend on iSCSI and relies on pure TCP, so it would work in any environment.
The downside of this approach is that there is no way to get a signed URL or anything equivalent, so we would have to find another way to upload tars to remote containers.
https://openebs.io/docs/introduction/usecases#building-scalable-websites-and-ml-pipelines
To support AWS we would need to:
infra/terraform/aws
infra/commands/{aws-commands}
AWSCloudContext
toCloudContext