numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0

Introduce easy volume mounts for UDF/UDSource/UDSink #1081

Status: Open. KeranYang opened this issue 1 year ago.

KeranYang commented 1 year ago

Problem Statement

Currently, when a user defines a UDSource, a couple of volumes are required to hold data such as ConfigMaps and Secrets. An example is shown below.

apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: nats-source-e2e
spec:
  vertices:
    - name: in
      scale:
        min: 2
      volumes:
        - name: my-config-mount
          configMap:
            name: nats-config-map
        - name: my-secret-mount
          secret:
            secretName: nats-auth-fake-token
      source:
        udsource:
          container:
            image: quay.io/numaio/numaflow-source/nats-source-go:v0.5.0
            volumeMounts:
              - name: my-config-mount
                mountPath: /etc/config
              - name: my-secret-mount
                mountPath: /etc/secrets/nats-auth-fake-token
    - name: out
      sink:
        log: {}
  edges:
    - from: in
      to: out

The problem is that the user needs to ensure that the name of each volume and the name of its corresponding volumeMount are exactly the same. For example, my-config-mount appears twice in the template above: once under volumes and once under volumeMounts.

A better user experience would be to specify each volume name only once, together with its mount path, as below:

apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: nats-source-e2e
spec:
  vertices:
    - name: in
      scale:
        min: 2
      udSourceVolumes:
        - name: my-config-mount
          mountPath: /etc/config
          configMap:
            name: nats-config-map
        - name: my-secret-mount
          mountPath: /etc/secrets/nats-auth-fake-token
          secret:
            secretName: nats-auth-fake-token
      source:
        udsource:
          container:
            image: quay.io/numaio/numaflow-source/nats-source-go:v0.5.0
    - name: out
      sink:
        log: {}
  edges:
    - from: in
      to: out

The same applies to all user-defined (UD) containers: UDF, UDSource, and UDSink.



vigith commented 1 year ago

Let's not call it udSourceVolumes. Let's call it udVolumes instead, so we can generalize it and attach it anywhere.

whynowy commented 1 year ago

Can we go with the Kubernetes-native Volume definition and default the mount path?

vigith commented 1 year ago

> Can we go with Kubernetes native Volume definition, and we default the mount path?

Yes, but we don't want it to conflict with the top-level name (volumes vs. udVolumes), right?

whynowy commented 1 year ago

> yes, but we do not want to conflict with the top-level name (volumes vs. udvolumes), right?

Correct. What I meant was: use the native Volume struct for the slice instead of building our own. The example YAML obviously uses a custom struct, since each entry carries a mountPath field that the native Volume does not have.
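As a rough sketch of that direction, assuming a hypothetical per-vertex field named udVolumes (the name suggested earlier in this thread) whose entries are native Kubernetes Volume objects, the source vertex from the issue might look like the fragment below. Because a native Volume carries only a name and a volume source, there is nowhere to put a mountPath, so the platform would have to default it. The field name and its placement are assumptions, not a settled API.

```yaml
# Fragment of spec.vertices. "udVolumes" is a hypothetical field name.
- name: in
  scale:
    min: 2
  udVolumes:
    # Each entry is a plain Kubernetes Volume: a name plus a volume source.
    # No mountPath here; the platform would choose it.
    - name: my-config-mount
      configMap:
        name: nats-config-map
    - name: my-secret-mount
      secret:
        secretName: nats-auth-fake-token
  source:
    udsource:
      container:
        image: quay.io/numaio/numaflow-source/nats-source-go:v0.5.0
```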

vigith commented 1 year ago

> Correct. What I meant was, use the native Volume struct for the slice, instead of building our own. The example yaml obviously is doing something with our own struct.

Absolutely.

KeranYang commented 1 year ago

> Can we go with Kubernetes native Volume definition, and we default the mount path?

Sorry, correct me if I am wrong, but how can we default the mount path? This is not for a built-in source, so the platform is not supposed to know where the user's container expects the secrets to be mounted.

whynowy commented 1 year ago

> Sorry, correct me if I am wrong, but how can we default the mount path? This is not for a built-in source, so the platform is not supposed to know where the secrets are mounted.

For example, mount them to /var/numaflow/vols/{volume-name}?

KeranYang commented 1 year ago

> For example, mount them to /var/numaflow/vols/{volume-name}?

That requires a mutual agreement between the UDF/UDSource/UDSink code and the platform: the user code has to know to read its files from that path.

whynowy commented 1 year ago

> This requires a mutual agreement between UDF/UDSource/UDSink and the platform.

What do you mean?

KeranYang commented 1 year ago

Synced offline with Derek. It's OK to maintain a mutual agreement that all user-defined volumes are mounted under a dedicated path such as /var/numaflow/vols/{volume-name}. We will settle the exact naming once we start implementing this feature.
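Under that agreement, the platform would expand each user-defined volume into a pod-level volume plus a matching volumeMount on the UD container, deriving the mount path from the volume name. A minimal sketch of the generated pod spec fragment for the in vertex, assuming the tentative /var/numaflow/vols/{volume-name} convention (the exact path is still to be decided) and a hypothetical container name:

```yaml
# Sketch of a generated pod spec fragment, not actual Numaflow output.
volumes:
  - name: my-config-mount
    configMap:
      name: nats-config-map
  - name: my-secret-mount
    secret:
      secretName: nats-auth-fake-token
containers:
  - name: udsource  # hypothetical container name
    image: quay.io/numaio/numaflow-source/nats-source-go:v0.5.0
    volumeMounts:
      # The platform writes the matching name, so the user never repeats it.
      # mountPath follows /var/numaflow/vols/{volume-name}.
      - name: my-config-mount
        mountPath: /var/numaflow/vols/my-config-mount
      - name: my-secret-mount
        mountPath: /var/numaflow/vols/my-secret-mount
```

With this layout, the UDSource code would read its configuration from /var/numaflow/vols/my-config-mount rather than a path of its own choosing such as /etc/config. That is the mutual agreement between user code and platform discussed above.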