lightbend / cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
https://cloudflow.io
Apache License 2.0

Configure PVC to use for Spark and Flink applications #710

Closed RayRoestenburg closed 3 years ago

RayRoestenburg commented 4 years ago

WIP.

Right now the storage class used for persistent volumes is set during install of Cloudflow. The operator creates one or more persistent volume claims of 15G each for apps using Spark and/or Flink.

It would be better if users could specify their PVC in the config file (and remove the default PVC, since it is impossible to choose a default that works for everyone). Once this is changed, the install does not need to know about a storage class, which makes the installation more modular.

The UX change will be that an app that needs PVCs will not get deployed if no PVC is specified. (Maybe we can support this in both the blueprint and the config file, as an override.)

franciscolopezsancho commented 4 years ago

I'm proposing as an input:

kubernetes.pods.pod {
  volumes {
    foo {
      pvc {
        name = myclaim
      }
    }
  }

  containers.container {
    volume-mounts {
      name = foo
      mountPath = "/var/www/html"
      readOnly = true
    }
  }
}

and output:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: doesntmatter
      image: someimage
      volumeMounts:
      - mountPath: "/var/www/html"
        name: foo
  volumes:
    - name: foo
      persistentVolumeClaim:
        claimName: myclaim

RayRoestenburg commented 4 years ago

Looks good @franciscolopezsancho

franciscolopezsancho commented 4 years ago

Apart from what has been suggested above about how to mount preexisting PVCs into a Pod, there are two more thoughts I'd like to add regarding setting the storage class and PVCs for Spark and Flink.

  1. In case the PVC doesn't exist but the user needs it, I'd rather not create a facade or interface in Cloudflow to deal with this. I would rather point the user in the right direction: the standard Kubernetes way of creating PVCs, a standard they may already know, and one that will be valuable for them in the future if they don't.

  2. Currently, as @RayRoestenburg points out above, the limit, request, and storage class name come from the initial settings at installation:

--set cloudflow_operator.persistentStorageClass=nfs

and default application.conf

persistent-storage {
  resources {
    request = "15G"
    limit = "30G"
  }
  storageClassName = "nfs-client"
  storageClassName = ${?PERSISTENT_STORAGE_CLASS}
}

This creates PVCs in Runner.scala for Spark and Flink by default, which get mounted by their respective operator definitions (not sure where yet):

val pvcSpec = PersistentVolumeClaim.Spec(
  // The streamlets require ReadWriteMany access and the Filesystem volume mode.
  accessModes = List(AccessMode.ReadWriteMany),
  volumeMode = Some(VolumeMode.Filesystem),
  // Request and limit sizes come from the installation settings (15G / 30G by default).
  resources = Some(
    Resource.Requirements(
      limits = Map(Resource.storage   -> ctx.persistentStorageSettings.resources.limit),
      requests = Map(Resource.storage -> ctx.persistentStorageSettings.resources.request)
    )
  ),
  // Storage class name set at installation (PERSISTENT_STORAGE_CLASS / --set).
  storageClassName = Some(ctx.persistentStorageSettings.storageClassName),
  selector = None
)
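In Kubernetes terms, the resulting claim would look roughly like this (the metadata name here is hypothetical, as I'm not sure yet what Runner names these claims; the sizes and storage class are the defaults above):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloudflow-spark-pvc    # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: nfs-client
  resources:
    requests:
      storage: 15G
    limits:
      storage: 30G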

It seems like a good idea to unify both approaches, allowing a user to set the PVCs that Flink and Spark use as storage for their operations. We can set this as a default in application.conf or some other default conf, and allow a user to override it through '--conf ...':

cloudflow.runtimes.spark.kubernetes.pods.pod {
  volumes {
    base {
      pvc {
        name = myclaim
        requests.storage = 10GB
        requests.limit = 100GB
        storage-class = "nfs"
      }
    }
  }
  containers.container {
    volume-mounts {
      name = base
      mountPath = "/mnt/spark/data"
      readOnly = false
    }
  }
}

RayRoestenburg commented 3 years ago

The Spark and Flink streamlets only expect a directory, respectively /mnt/spark/storage and /mnt/flink/storage, to exist / to be mounted (the access mode must be ReadWriteMany and the volume mode must be Filesystem). The path is somewhat hardcoded: the Flink streamlet has it in its default Flink config, and the Spark streamlet takes a storage.mountPath config which specifies this path, hardcoded to /mnt/spark/storage in Runner. So the only thing we need to do here is check whether a volume is mounted at that path on the Spark/Flink streamlet (and we could possibly check whether it was mounted with a PVC). If there is no mount for that path, the cli can guide the user to add one; otherwise it is good to go and everything will work as expected.
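For illustration, a streamlet pod that passes this check would contain a fragment like the following (the volume and claim names are just illustrative; only the mount path and the access/volume modes are fixed):

spec:
  containers:
    - name: flink-streamlet            # illustrative container name
      volumeMounts:
        - name: storage
          mountPath: "/mnt/flink/storage"
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: myclaim             # any pre-existing ReadWriteMany PVC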

RayRoestenburg commented 3 years ago

We can put an example conf file in examples, as well as a yaml file for the PVC, and then explain how to create the PVC and execute deploy with --conf default-persistent-volume.conf.
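Such a yaml file for the PVC could be a minimal sketch like this (the claim name, storage class, and size are placeholders to adapt; the access mode and volume mode are the ones the streamlets require), created once with kubectl apply -f before deploying:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim                  # placeholder; referenced from the conf file
spec:
  accessModes:
    - ReadWriteMany              # required by the Spark/Flink streamlets
  volumeMode: Filesystem         # required by the Spark/Flink streamlets
  storageClassName: nfs          # placeholder; any ReadWriteMany-capable class
  resources:
    requests:
      storage: 10Gi              # placeholder size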

RayRoestenburg commented 3 years ago

If for some reason we do want to support auto-creation/deletion of PVCs, it would need to be done under a separate section, so that streamlets can reuse it. The volumes and volume-mounts sections would remain the same, and could refer to PVCs defined like this:

cloudflow.pvcs {
  myclaim {
    requests.storage = 10GB
    requests.limit = 100GB
    storage-class = "nfs"
  }
}

These PVCs would be created in the namespace where the application is created. It's questionable, though, how much easier this is than just creating the PVCs from a yaml file once.

franciscolopezsancho commented 3 years ago

@RayRoestenburg, as previously discussed, this is what I will implement.

As a user, I will be able to map an existing PVC to a Flink or Spark container through two means:

  1. flag: kubectl cloudflow deploy --streamlet-storage-default (or --ssd)
  2. configuration: kubectl cloudflow deploy --conf xyz.conf, where the conf will have to mount a volume at /mnt/[runtime]/storage:
    cloudflow.runtimes.[runtime].kubernetes.pods.pod {
      volumes {
        foo {
          pvc {
            name = myclaim
            read-only = false
          }
        }
      }
      containers.container {
        volume-mounts {
          foo {
            mount-path = "/mnt/[runtime]/storage"
            read-only = false
          }
        }
      }
    }
  3. The flag option will find the runtimes in the deployed CR and automatically add a configuration for each of them, shaped like the one in option 2. This configuration will map to claims with the names cloudflow-flink-pvc and/or cloudflow-spark-pvc.

The basic difference between 1 and 2, apart from the automation, is the ability in 2 to use claims with names other than cloudflow-[runtime]-pvc. In both cases, the PVC(s) must exist in the namespace with the name of the app before the deployment can go through.

In either case, if the PVC(s) don't exist, the command will throw an error stating that the PVC(s) must exist.
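For the flag option, for example, a Spark app would need a claim like this to exist beforehand (a sketch; my-app stands for the namespace with the name of the app, and the size is a placeholder):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloudflow-spark-pvc    # fixed name expected by the flag option
  namespace: my-app
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi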

RayRoestenburg commented 3 years ago

Sounds good @franciscolopezsancho, let's see how this will work. We might still be able to make it easier, but this is a good way to improve the situation, adding a little bit on top of what we have.

michael-read commented 3 years ago

Which version are these changes going to be incorporated in?

RayRoestenburg commented 3 years ago

The next one. We want to start releasing things a lot faster, so once this is tested and covered by integration tests it can be released, as soon as next week.

franciscolopezsancho commented 3 years ago

I've changed my mind: I will add the PVC conf by default, and will not add it if the user adds their own. For instance, if they define a PVC for Spark in the conf, it will just add the Flink one (in case the app has Flink runtimes, of course). This configuration is available at runtime scope, not streamlet scope.

franciscolopezsancho commented 3 years ago

merged in https://github.com/lightbend/cloudflow/pull/841