Closed RayRoestenburg closed 3 years ago
I'm proposing as an input:
kubernetes.pods.pod {
  volumes {
    foo {
      pvc {
        name = myclaim
      }
    }
  }
  containers.container {
    volume-mounts {
      name = foo
      mountPath = "/var/www/html"
      readOnly = true
    }
  }
}
and output:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: doesntmatter
      image: someimage
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
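For a config like the one above to deploy, the referenced claim has to exist already in the application namespace. A minimal sketch of such a PVC manifest (the size and access mode here are illustrative assumptions, not from this thread):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
```

It could be created beforehand with kubectl apply -f pvc.yaml in the namespace the app deploys into.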
Looks good @franciscolopezsancho
Apart from what has been suggested above about mounting preexisting PVCs into a Pod, there are two more thoughts I'd like to add regarding setting the storage class and PVCs for Spark and Flink.
In the case where the PVC doesn't exist but the user needs it, I would rather not create a facade or interface in Cloudflow to deal with this. I would rather point the user in the right direction: a standard they may already know, and one that will otherwise be valuable for them in the future.
Currently, as @RayRoestenburg points out above, limit, request, and storage class name come from the initial settings at installation:
--set cloudflow_operator.persistentStorageClass=nfs
and the default application.conf:
persistent-storage {
  resources {
    request = "15G"
    limit = "30G"
  }
  storageClassName = "nfs-client"
  storageClassName = ${?PERSISTENT_STORAGE_CLASS}
}
This creates PVCs in Runner.scala for Spark and Flink by default, which get mounted by their respective operator definitions (not sure where yet):
val pvcSpec = PersistentVolumeClaim.Spec(
  accessModes = List(AccessMode.ReadWriteMany),
  volumeMode = Some(VolumeMode.Filesystem),
  resources = Some(
    Resource.Requirements(
      limits = Map(Resource.storage -> ctx.persistentStorageSettings.resources.limit),
      requests = Map(Resource.storage -> ctx.persistentStorageSettings.resources.request)
    )
  ),
  storageClassName = Some(ctx.persistentStorageSettings.storageClassName),
  selector = None
)
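With the default settings shown above, the spec this code builds corresponds roughly to the following manifest. The claim name is assigned elsewhere in Runner, so the name here is a placeholder, not the real one:

```yaml
# Rough YAML equivalent of pvcSpec with the default install settings.
# metadata.name is a placeholder; Runner assigns the actual claim name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: nfs-client
  resources:
    requests:
      storage: 15G
    limits:
      storage: 30G
```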
It seems like a good idea to unify both approaches and allow a user to set the PVCs that Spark and Flink use as storage for their operations. We could set this as a default in application.conf or some other default conf, and allow a user to override it through '--conf ...':
cloudflow.runtimes.spark.kubernetes.pods.pod {
  volumes {
    base {
      pvc {
        name = myclaim
        requests.storage = 10GB
        requests.limit = 100GB
        storage-class = "nfs"
      }
    }
  }
  containers.container {
    volume-mounts {
      name = base
      mountPath = "/mnt/spark/data"
      readOnly = false
    }
  }
}
The Spark and Flink streamlets only expect a directory, /mnt/spark/storage and /mnt/flink/storage respectively, to exist / to be mounted (access mode must be ReadWriteMany and volume mode must be Filesystem). The path is somewhat hardcoded in the Flink streamlet's default Flink config, while the Spark streamlet takes a storage.mountPath config which specifies this path, and which is hardcoded to /mnt/spark/storage in Runner.
So the only thing we need to do here is check whether a volume with that path is mounted on the Spark/Flink streamlet (and we could possibly check whether it was mounted with a PVC). If there is no mount for that path, the CLI can guide the user to add one; otherwise it is good to go and everything will work as expected.
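That check could be sketched roughly like this (Python for illustration only; the function name and the mount representation are hypothetical, not the actual Cloudflow CLI code):

```python
# Hypothetical sketch of the deploy-time check described above.
# validate_storage_mount and the dict shape are illustrative assumptions.

REQUIRED_PATH = "/mnt/{runtime}/storage"

def validate_storage_mount(runtime, volume_mounts):
    """Return an error message for the CLI to print, or None if the
    required mount is present.

    volume_mounts is a list of dicts like {"name": ..., "mountPath": ...}.
    """
    required = REQUIRED_PATH.format(runtime=runtime)
    if any(m.get("mountPath") == required for m in volume_mounts):
        return None
    return (f"No volume mounted at {required} for the {runtime} streamlet; "
            "add one via --conf (a pvc volume plus a matching volume-mount).")
```

The CLI would run this per Spark/Flink streamlet before deploying and surface the message when the mount is missing.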
We can put an example conf file in examples, as well as a YAML file for the PVC, and then explain how to create the PVC and execute deploy with --conf default-persistent-volume.conf.
If for some reason we do want to support auto-creation / deletion of PVCs, it would need to be done under a separate section, so that streamlets can reuse it. The volumes and volume-mounts sections would remain the same, and could refer to PVCs defined like this:
cloudflow.pvcs {
  myclaim {
    requests.storage = 10GB
    requests.limit = 100GB
    storage-class = "nfs"
  }
}
These PVCs would be created in the namespace where the application is created. It's questionable, though, how much easier this is than just creating the PVCs in a YAML file once.
@RayRoestenburg, as previously discussed this is what I will implement.
As a user I will be able to map an existing PVC to a Flink or Spark container through two means:
1. kubectl cloudflow deploy --streamlet-storage-default (or --ssd)
2. kubectl cloudflow deploy --conf xyz.conf
where the conf will have to mount at /mnt/[runtime]/storage:
cloudflow.runtimes.[runtime].kubernetes.pods.pod {
  volumes {
    foo {
      pvc {
        name = myclaim
        read-only = false
      }
    }
  }
  containers.container {
    volume-mounts {
      foo {
        mount-path = "/mnt/[runtime]/storage"
        read-only = false
      }
    }
  }
}
With option 1, the PVC(s) are expected to have the default names cloudflow-flink-pvc and/or cloudflow-spark-pvc.
The basic difference between 1 and 2, apart from the automation, is the ability in 2 to use claims with names other than cloudflow-[runtime]-pvc. In both cases, the PVC(s) are required to exist in the namespace with the name of the app before the deployment can go through.
In either case, if the PVC(s) don't exist, the command will throw an error stating that the PVC(s) must exist.
Sounds good @franciscolopezsancho let's see how this will work. We might be able to make it easier still, but this is a good way to improve the situation, adding a little bit on top of what we have.
In what version are these changes going to be incorporated?
The next one, and we want to start releasing things a lot faster, so once this is tested and covered by integration tests, it can be released, as soon as next week.
I've changed my mind: I will add the PVC conf by default, and will not add it if the user supplies their own. For instance, if they define a PVC for Spark in the conf, it will only add the Flink one (if the app has Flink runtimes, of course). This configuration is available at runtime scope, not streamlet scope.
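To illustrate the behavior just described (the claim name my-spark-claim is made up; the default follows the cloudflow-[runtime]-pvc naming convention mentioned above): if the user's conf only covers Spark, only the Flink default would be added.

```
# User-provided conf covers Spark only:
cloudflow.runtimes.spark.kubernetes.pods.pod {
  volumes.foo.pvc { name = my-spark-claim, read-only = false }
  containers.container.volume-mounts.foo {
    mount-path = "/mnt/spark/storage"
    read-only = false
  }
}

# So only a Flink default along these lines would be added automatically:
cloudflow.runtimes.flink.kubernetes.pods.pod {
  volumes.default.pvc { name = cloudflow-flink-pvc, read-only = false }
  containers.container.volume-mounts.default {
    mount-path = "/mnt/flink/storage"
    read-only = false
  }
}
```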
WIP.
Right now the storage class used for persistent volumes is set during install of Cloudflow. The operator creates one or more persistent volume claims of 15G each for apps using Spark and/or Flink.
It would be better if users could specify their PVC in the config file (and we removed the default PVC, since it is impossible to choose a default that works for everyone). Once this is changed, the install does not need to know about a storage class, which makes the installation more modular.
The UX change will be that an app that needs PVCs will not get deployed if no PVC is specified. (Maybe we can support this in both the blueprint and the config file as an override.)