Closed: consideRatio closed this issue 4 years ago
I think this might not be a problem, and if it is, switching to SSD Filestore might be an easier solution. If you mount NFS once per node, I think you can then think of 100MB/s as per-node, rather than per-user, since it's gonna get cached locally.
I want to ensure that the data we provide in ~/data can handle the load of hundreds of simultaneous users accessing it. It may or may not turn out to be a problem in practice, but I want to make sure it doesn't become one.
Due to this, I plan to provide the participants with a cached version of the NFS server's content under ~/data. They will then read from a node-local disk instead of from the NFS server, which helps us avoid the risk of overloading the NFS server if, for example, everyone needs to read a lot of data at the same time.

The idea is to mount the NFS server's /data in read-only mode on every node, and rsync copy all of its content to a hostPath volume every minute. The copy uses rsync -a, which implies that we retain the permissions of what we copy, which means we should ensure the permission fix is done before we copy the content.
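As a minimal sketch of that sync step, here is what the node-level sync process could run. The paths /mnt/nfs/data and /mnt/nh-data are assumptions for illustration, not paths decided in this issue:

```python
import subprocess
import time

NFS_SRC = "/mnt/nfs/data/"      # assumed read-only NFS mount on the node
HOSTPATH_DST = "/mnt/nh-data/"  # assumed hostPath directory exposed to user pods

while True:
    # -a preserves permissions/ownership/timestamps; --delete keeps the local
    # copy from accumulating files that were removed on the NFS server
    subprocess.run(
        ["rsync", "-a", "--delete", NFS_SRC, HOSTPATH_DST],
        check=False,  # don't crash the loop on a transient rsync error
    )
    time.sleep(60)  # the plan is to sync every minute
```

In practice something like this would run as a DaemonSet, so every node keeps its own copy warm.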
/nh/data is then mounted from the hostPath volume in read-only mode if the user is a participant, and mounted from NFS if the user is an instructor.
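As a rough sketch of how that per-user mount choice could be expressed, assuming a KubeSpawner-based setup; the instructor usernames, NFS server address, and paths below are placeholders, not values from this deployment:

```python
# jupyterhub_config.py (sketch only, not the actual deployment config)
INSTRUCTORS = {"alice", "bob"}  # hypothetical instructor usernames

def choose_data_mount(spawner):
    if spawner.user.name in INSTRUCTORS:
        # instructors get the real NFS share
        volume = {"name": "nh-data", "nfs": {"server": "10.0.0.2", "path": "/data"}}
        mount = {"name": "nh-data", "mountPath": "/home/jovyan/data"}
    else:
        # participants get the node-local rsync'ed copy, read-only
        volume = {"name": "nh-data", "hostPath": {"path": "/mnt/nh-data", "type": "Directory"}}
        mount = {"name": "nh-data", "mountPath": "/home/jovyan/data", "readOnly": True}
    # overrides any volumes set elsewhere; a real config would merge instead
    spawner.volumes = [volume]
    spawner.volume_mounts = [mount]

c.Spawner.pre_spawn_hook = choose_data_mount
```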
Motivation

The NFS server we use is a Google Cloud managed NFS service called Filestore. We have a BASIC_HDD Filestore instance with a size of 1 TB. According to this documentation we can expect a throughput of 100 MB/s. That throughput is shared, so if we have 100 participants that each want to read 1 GB, it takes 10 seconds per participant, which adds up to 1000 seconds of waiting in total, and that is too much.
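To spell out that arithmetic with the numbers quoted above (100 MB/s, 1 GB per participant, 100 participants):

```python
throughput_mb_s = 100     # BASIC_HDD Filestore throughput per the linked docs
data_per_user_mb = 1000   # each participant reads ~1 GB
participants = 100

per_user_s = data_per_user_mb / throughput_mb_s  # 10 s if one user had the server alone
total_s = participants * per_user_s              # 1000 s of combined waiting
print(per_user_s, total_s)  # -> 10.0 1000.0
```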
Also, according to the documentation about the throughput of persistent disks, their throughput scales with size and is roughly four times higher for SSD than for HDD at a given size. These are the numbers.