knative / eventing

Event-driven application platform for Kubernetes
https://knative.dev/docs/eventing
Apache License 2.0
1.41k stars 590 forks source link

Discovery of importers and how to create them #1550

Closed mikehelmick closed 4 years ago

mikehelmick commented 5 years ago

Problem In working through #1381 one of the key things that we have found when targeting the user experience is that discoverability is highly important. For motivations, we look to the Scenarios for Knative Eventing.

To recap the 3 scenarios describe there

  1. FaaS - directly connect an importer (specific instance) to a consumer

  2. Event-Driven - current implementation of Broker + Trigger

  3. Black-box integration - scenario 3 is a specialization of both scenarios 1 and 2. From a developer’s viewpoint, consuming events is just like scenario 2 (Event-Driven) in that if an event is available on my cluster, I am able to trigger off of it.

The current registry implementation is event type centric (the CRD is literally called EventType). This works really well for scenario 2 (on-cluster, event-driven scenario) and scenario 3 (black-box integration).

This works less well for scenario 1 (FaaS) because it makes discovery of events via source more difficult to build. This doesn’t mean that the existing registry is wrong our less useful. There is currently one entry in the registry per event type, this is actually a tuple of Source and Type.

We propose introducing a new importer registry.

This gives the overall registry 2 components working together to facilitate discovery and configuration of events. The importer registry will be responsible for informing a user (and tooling) of which producers they may solicit what events from and how to configure those importers.

When an importer is configured, the importer is responsible for populating the EventType appropriately.

As a concrete example, the importer registry tells me that I can get Google Cloud Storage “finalize” events by instantiating the GCSSource kind. If two GCSSource objects are created for finalize events, the finalize event type should only appear in the EventType registry once. The EventType should reflect which importers are providing that type (can be an annotations, status, etc. to be designed).

Persona: Event Consumer

Exit Criteria A CLI can be written that can programmatically discovery and configure importers.

Time Estimate (optional): 5d

vaikas commented 5 years ago

One additional benefit of the importer registry is that for some eventtypes you may need to specify different kinds of arguments. With the current CRD this is not easily doable because the schema does not (at least seem) to support selective arguments on the fields. For example, say an importer that has 2 events foo,bar and for foo you need to provide parameter X and for bar you need Y, I'm not aware of any way to specify that through the current mechanism.

nachocano commented 5 years ago

What if we could use a modified version of the current sources CRDs, thus there is no need to create any new resources?

Some of the current importers provide a list of event types as strings (e.g., the GithubImporter). I think if instead of being a list of strings, we can make each event an object, with certain common properties (ceType, ceSchema, etc.), which should be properly defined in the Importer Spec, as well as specific ones, in case the different events of a same importer need different properties (which I still have my doubts about). With this, we can avoid creating a new resource, and can leverage standard openAPI tooling.

Here is an example of what a GitHubImporter eventTypes might look like:

eventTypes:
  type: object
  properties:
    pullRequest:
      type: object
      properties:
        type:
          type: string
          pattern: "dev.knative.source.github.pull_request"
        schema:
          type: string
          pattern: "pull-request-schema"
    push:
      type: object
      properties:
        type:
          type: string
          pattern: "dev.knative.source.github.push"
        schema:
          type: string
          pattern: "push-schema"
      # additionalProperties:
        onlyForPush:
          type: string
          description: "Some push specific property"
      required:
        - onlyForPush
    fork:
      type: object
      properties:
        type:
          type: string
          pattern: "dev.knative.source.github.fork"
        schema:
          type: string
          pattern: "fork-schema"
      # additionalProperties:
        onlyForFork:
          type: string
          description: "Some fork specific property"
        anotherOneForFork:
          type: string
          description: "Some other one fork specific property"
      required:
        - onlyForFork
        - anotherOneForFork
    ...

Or maybe the GCSSource one:

eventTypes:
  type: object
  properties:
    finalize:
      type: object
      description: "Event triggered when an object is created or finalized"
      properties:
        type:
          type: string
          pattern: "com.google.storage.finalize"
        schema:
          type: string
          pattern: "https://cloud.google.com/cloudevents/schemas/gcs/finalize.json"
        specVersion:
          type: string
          pattern: "0.3"
        finalizeField:
          type: string
          description: "You will have to configure this extra thing for finalize"
      required:
        - finalizeField
    delete:
      type: object
      description: "Event triggered when an object is deleted"
      properties:
        type:
          type: string
          pattern: "com.google.storage.delete"
        schema:
          type: string
          pattern: "https://cloud.google.com/cloudevents/schemas/gcs/delete.json"
        specVersion:
          type: string
          pattern: "0.3"
    archive:
      type: object
      description: "Event triggered when an object is archived"
      properties:
        type:
          type: string
          pattern: "com.google.storage.archive"
        schema:
          type: string
          pattern: "https://cloud.google.com/cloudevents/schemas/gcs/archive.json"
        specVersion:
          type: string
          pattern: "0.3"
        justForArchive:
          type: string
          description: "You will have to configure this extra thing for archive"
      required:
        - justForArchive
    metadataUpdate:
      type: object
      description: "Event triggered when an object's metadata is updated"
      properties:
        type:
          type: string
          pattern: "com.google.storage.metadataUpdate"
        schema:
          type: string
          pattern: "https://cloud.google.com/cloudevents/schemas/gcs/metadataUpdate.json"
        specVersion:
          type: string
          pattern: "0.3"
        updateSpecific:
          type: string
          description: "You will have to configure this extra thing"
        someUpdateOptionalField:
          type: string
          description: "no need to configure this stuff"
      required:
        - updateSpecific

Here is an example of a potential GCSSource CR...

apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
  name: gcs-source
spec:
  googleCloudProject: my-project
  gcsCredsSecret:  
    name: my-key
    key: key.json
  bucket: my-bucket
  eventTypes:
    metadataUpdate:
      updateSpecific: my-update-specific-property
  sink:
    apiVersion: serving.knative.dev/v1alpha1
    kind: Service
    name: event-display

We do not expect users to configure type nor schema nor specVersion in any of the cases, they will be defaulted with a mutating admission webhook. In case the user tries to configure them with some weird value, the validation will complaint due to the exact match pattern specified in the CRD.

By listing the sources with kubectl get crds -l eventing.knative.dev/source And then describing them, or with some fancier jsonpath or go-templating thingy, we could easily understand the types that the importer provides, together with their schemas.

For example, only the types: kubectl get crds githubsources.sources.eventing.knative.dev -o=jsonpath='{..type.pattern}' would output: dev.knative.source.github.fork dev.knative.source.github.pull_request dev.knative.source.github.push

grantr commented 5 years ago

One additional benefit of the importer registry is that for some eventtypes you may need to specify different kinds of arguments.

The only example I can think of right now is GitHub. Depending on whether you're subscribing to an org-level event or a repo-level event, the repo argument may be invalid (membership) or required (push).[1] This seems like an edge case rather than a normal case, and supporting multiple parameter signatures per registry entry is a lot of complexity for an edge case.

Thought experiment: What is the CloudEvents source attribute value for each GitHub event type? I imagine for org events it's github.com/org and for repo events it's github.com/org/repo. The CloudEvents definition of source is ambiguous but I think in practice users will understand it as "the context I subscribe[d] to". IMO the user may reasonably assume that org and repo events are emitted by two different sources because the source syntaxes[2] are different.

What if "GitHub the service" actually had two entries in the registry, one for org events and one for repo events? Then each entry would have one parameter schema. The schema could be expressed in the CRD spec as @nachocano proposes (there would be two CRDs for subscribing to the two sources) or in a separate registry object (two registry entry objects, 1 or 2 CRDs), but in both cases there is one schema per entry.

I suspect this rule of "one parameter schema per source syntax" holds for all sources, but I don't have any other real examples. A contrived example is a GCPImporter CRD that can import events from all of GCP.

I prefer to think of a source registry versus an importer registry, and source discovery versus importer discovery. We created the name "importer" to help contributors distinguish the event ingress from the event source, but I don't think users care about that distinction. The user goal is "I want to subscribe to a GitHub org event" rather than "I want to subscribe to an event from a GitHubImporter object". I think we'll be in a better place if users (trigger creators) consider the subscribable context to be the abstract source rather than the concrete importer object, with the registry serving as an enumeration of the possible sources rather than the possible importers.

1: GitHub makes it possible to subscribe to repo-level events for all repos in an org, but this is a convenience feature that doesn't change the conceptual model for the Knative Eventing user. 2: I think this is the correct usage of the word syntax, but if there's a better word for this let me know. I also considered "template".

nachocano commented 5 years ago

I prefer to think of a source registry versus an importer registry, and source discovery versus importer discovery.

+1 to source registry, but I have no strong opinions otherwise.

mikehelmick commented 5 years ago

source registry is a fine way to think of it. The idea behind this is that users have a way to discover what event sources are available and how to configure them. Generally this will involve creating an importer.

I do wonder if we want to place a restriction that all importers must be backed by a CRD in order to be discoverable. Is that too heavyweight and are the other alternatives we should consider?

grantr commented 5 years ago

We may also want to keep in mind that the validation spec for CRDs will soon change to a subset of OpenAPI: https://kubernetes.io/blog/2019/06/20/crd-structural-schema/. Will that subset support what we want to do here?

In general I'm a bit concerned about locking our registry API to the CRD API. We already know CRD validation is switching to be per-version shortly. Do we want that? Is CRD versioning the same versioning we want with a source/importer registry?

nachocano commented 5 years ago

Here is a working prototype of a GCSImporter using the CRD-based approach introduced above. Note that the documentation is outdated.

Example usage:

Discover the sources/importers

kubectl get crds -l eventing.knative.dev/source=true

Discover the event types that GCS can emit and their schema

kubectl get crds gcssources.sources.aikas.org -oyaml

Create a GCS Importer

apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
  name: gcs-source
spec:
  googleCloudProject: my-project
  gcsCredsSecret:  
    name: google-cloud-key
    key: key.json
  bucket: my-bucket
  eventTypes:
    metadataUpdate: {}
    archive: {}
  sink:
    apiVersion: eventing.knative.dev/v1alpha1
    kind: Broker
    name: default

// TODO populate the EventType registry with the event types the broker will be able to provide as part of this instantiation.

Harwayne commented 5 years ago

This is the rough outline of a design for the Importer Registry. It is similar in some ways to @nachocano's design in https://github.com/knative/eventing/issues/1550#issuecomment-515604913. I believe that proposal is superior, but wanted to write this down both to have a record of it and in case others see more potential in it.

Create a new CRD, ImporterRegistration. The installation of any Importer CRD would include its corresponding ImporterRegistration CO.

apiVersion: eventing.knative.dev/v1alpha1
kind: ImporterRegistration
metadata:
  name: github-importer
spec:
  apiVersion: sources.eventing.knative.dev/v1alpha1
  kind: GitHubImporter
  types:
    push:
      openAPIV3SchemaMergePatch:
        properties:
          spec:
            required:
              - onlyForPush
            # This field from the CRD is not used for this type, do not allow it.
            onlyForFork: null 
    fork:
      openAPIV3SchemaMergePatch:
        properties:
          spec:
            required:
              - onlyForFork
              - anotherOneForFork
            anotherOneForFork:
              # Altered verification of the field.
              minLength: 4
    ...

Conceptually, the spec includes the concrete importer to create (GitHubImporter in sources.eventing.knative.dev/v1alpha1) and the CloudEvent types that the importer can be configured to import (push and fork). Each event type is allowed to provide an RFC 7386 JSON Merge Patch that is applied to the CRD's generic OpenAPIV3Schema. A large benefit of this is that tooling can easily generate the OpenAPI schema for a specific event type, allowing construction of UIs that request only the necessary information for the desired event type.

One advantage this has over the proposal in https://github.com/knative/eventing/issues/1550#issuecomment-515604913, is that the same field name can be used in distinct ways by distinct event types, because the patch for each can change its description and validation. However, this is likely a feature we do not want as it makes things more complicated for the user.

Another advantage is that it does not require the Importer to include a spec.events field, which is a map from CloudEvent type to object. The Importer author is free to setup their CRD however they want.


Overall, I think that the proposal in https://github.com/knative/eventing/issues/1550#issuecomment-515604913 is superior to this proposal. It removes the need for a distinct registration CO. While still allowing us to easily make UIs that interactively choose the parameters needed to instantiate an Importer. It is also easier to read through a single OpenAPI schema than it is to mentally apply patches from one OpenAPI schema to another.

mikehelmick commented 5 years ago

@nachocano - Can you also describe how this fits in with the candidate CLI

And coordinate with @n3wscott on #1554 - the solution to this issue and that issue must work together.

nachocano commented 5 years ago

I think a potential CLI might look as follows, considering the proposal in #1550 (comment)

Discover Sources: The ability to discover the sources of events is a prerequisite for being able to connect events from producers to consumers.

kn events sources list
gcs  --  This source allows you to bring Google Cloud Storage events into Knative
github  --  This source allows you to bring GitHub events into Knative
...

Discover Source's Event Types and how to Configure the Source: In order to configure delivery of events from one of the sources, a user needs to know what events the source can emit as well as how to configure such source.

kn events source gcs describe
description: This source allows you to bring Google Cloud Storage events into Knative

configuration:

google-cloud-project  -- ID of the google cloud project that contains the bucket
service-account-secret -- name of the secret containing a GCP service account key with permission to create a GCP notification.
bucket-name         -- GCS bucket to subscribe to.
target-service                -- Service where to send the notifications to.
 -- [optional] 
 object-name-prefix   -- prefix to only notify when objects match this prefix

event-types:

finalize  -- Event triggered when an object is created or finalized
   schema: 
      specversion: CloudEvents v0.3
      type: com.google.storage.finalize
      schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/finalize.json
   configuration:
      finalize-field -- example of a particular field for finalize event type  

archive  -- Event triggered when an object is archived
    schema: 
       specversion: CloudEvents v0.3
       type: com.google.storage.archive
       schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/archive.json
   configuration:
       just-for-archive  -- example of a particular field required for archive event type

delete    -- Event triggered when an object is deleted
    schema: 
        specversion: CloudEvents v0.3
        type: com.google.storage.delete
        schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/delete.json

metadataUpdate  -- Event triggered when an object's metadata is updated
     schema:
        specversion: CloudEvents v0.3
        type: com.google.storage.metadataUpdate
        schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/metadataUpdate.json
    configuration:
        update-specific  -- Update specific field 
         -- [optional]
             some-update-optional-field   -- Update optional field

Note: I think we should also use have an example section as part of the output of describe.

Create a GCS Source:

kn events source gcs create my-gcs-source \
--namespace default \
--config google-cloud-project=my-project,service-account-secret=my-secret,bucket-name=my-bucket,target-service=my-service \
--event-types finalize=finalize-field=my-finalize-field-value,delete,metadataUpdate=update-specific=my-update-specific-field-value,some-update-optional-field=my-update-optional-field \

The instantiation of this GCS source, only if pointed to a Broker (this is subject to change), should populate the EventType Registry, which is a more useful mechanism to discover what are the event types that have been already hooked up and are able to consume from a particular Broker.

vaikas commented 5 years ago

I'd prefer to extend the existing CRD model and not create additional registry for now. If we feel we need something like that in the future we can add it. If there are cases that @nachocano proposal doesn't cover, then I'd suggest we move forward with it. Main focus is to handle discovery from my point of view and I believe this case is handled by this proposal. As far as validation goes, I don't think the swagger doc is going to be sufficient alone for all the cases, and I believe what's proposed at the CRD spec level will be sufficient for the discovery use cases as well as the simple validation, but that the webhook will have to be involved in the more complex validation already.

nachocano commented 5 years ago

I do wonder if we want to place a restriction that all importers must be backed by a CRD in order to be discoverable. Is that too heavyweight and are the other alternatives we should consider?

By talking with @Harwayne, it seems that his AutoContainerSource magic would allow us to avoid writing a controller, but there is still need for CRDs, that would specify the supported event types with their schemas. For example, in his example K8sEventSource CRD we would have to add that.

nachocano commented 5 years ago

@n3wscott I'm wondering if we can use a similar mechanism to AutoContainerSource to make this Importer Registry work with the PullSubscription?

nachocano commented 5 years ago

@n3wscott made some suggestions to the CRD approach, which IMO make sense. Basically, that we should decouple importer registry from "filtering" on the source (i.e., from specifying what event types to subscribe to). In other words, it would be nice to have, as the GitHub importer currently has, an eventTypes array with the possible types (for filtering), and an internal eventTypes object with description of those types (for registry purposes). The latter is something that users will not need to specify when creating a source CO...

Going back to the GCS example, it might now look something like this (interesting parts are _eventTypes and eventTypes)

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: gcssources.sources.aikas.org
  labels:
    eventing.knative.dev/source: "true"
spec:
  group: sources.aikas.org
  version: v1alpha1
  scope: Namespaced
  names:
    categories:
    - all
    - knative
    - eventing
    - sources
    plural: gcssources
    singular: gcssource
    kind: GCSSource
    shortNames:
    - gcs
  validation:
    openAPIV3Schema:
      properties:
        apiVersion:
          type: string
        kind:
          type: string
        metadata:
          type: object
        spec:
          properties:
            _eventTypes:
              type: object
              description: "Internal description of event types, users should not set this property"
              properties:
                finalize:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "com.google.storage.finalize"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/gcs/finalize.json"
                delete:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "com.google.storage.delete"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/gcs/delete.json"
                archive:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "com.google.storage.archive"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/gcs/archive.json"
                metadataUpdate:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "com.google.storage.metadataUpdate"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/gcs/metadataUpdate.json"
            eventTypes:
              type: array
              items:
                enum:
                  - finalize
                  - delete
                  - archive
                  - metadataUpdate
                type: string
              description: "The event types to subscribe to. If none is specified, we default to all of them"
            gcsCredsSecret:
              type: object
              description: "Credential to use for creating a GCP notification. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
            gcpCredsSecret:
              type: object
              description: "Optional credential to use for subscribing to the GCP PubSub topic. If omitted, uses gcsCredsSecret. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
            serviceAccountName:
              type: string
              description: "Service Account to run Receive Adapter as. If omitted, uses 'default'."
            googleCloudProject:
              type: string
              description: "Google Cloud Project ID to create the scheduler job in."
            bucket:
              type: string
              description: "GCS bucket to subscribe to. For example my-test-bucket"
            objectNamePrefix:
              type: string
              description: "Optional prefix to only notify when objects match this prefix."
            payloadFormat:
              type: string
              description: "Optional payload format. Either NONE or JSON_API_V1. If omitted, uses JSON_API_V1."
            sink:
              type: object
              description: "Where to sink the notifications to."
          required:
          - gcsCredsSecret
          - googleCloudProject
          - bucket
          - sink

A GCS CO would look something like this now:

apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
  name: gcs-source-example
spec:
  googleCloudProject: my-project
  gcsCredsSecret:  
    name: my-key
    key: key.json
  bucket: my-bucket
  eventTypes:
    - metadataUpdate
    - archive
  sink:
    apiVersion: serving.knative.dev/v1alpha1
    kind: Service
    name: my-service

Working prototype here

@n3wscott does this address your comments? fyi @vaikas-google @Harwayne @grantr

BTW, as just pointed at by @Harwayne, this changes does not give us the flexibility to set different properties for different event types of the same importer, but I think that is a border case, and most of the time, they will need the same config. Thoughts?

n3wscott commented 5 years ago

I like this plan.

vaikas commented 5 years ago

so the EventTypes stays as is: https://github.com/vaikas-google/gcs/blob/master/pkg/apis/gcs/v1alpha1/types.go#L73

(Looks like I forgot to add the EventTypes to the openapi spec, oops)

And would the __eventTypes be added to the above file or only to the openapi spec? If only to the spec, that seems fine. I assume that will not cause any gried?

grantr commented 5 years ago

Can we call it _eventTypesSupported instead of _eventTypes? Might be less confusing to users (and contributors when they accidentally forget the _)

n3wscott commented 5 years ago

I like _eventTypesSupported or _eventTypesProduced.

-Scott

On Wed, Jul 31, 2019 at 2:20 PM Grant Rodgers notifications@github.com wrote:

Can we call it _eventTypesSupported instead of eventTypes? Might be less confusing to users (and contributors when they accidentally forget the )

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/knative/eventing/issues/1550?email_source=notifications&email_token=AHWPD4H4WTRWP6Z2LUDUPULQCH62VA5CNFSM4IF6BDV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IS66Y#issuecomment-517025659, or mute the thread https://github.com/notifications/unsubscribe-auth/AHWPD4EAYBJQGNZGUM63IATQCH62VANCNFSM4IF6BDVQ .

nachocano commented 5 years ago

so the EventTypes stays as is: https://github.com/vaikas-google/gcs/blob/master/pkg/apis/gcs/v1alpha1/types.go#L73

(Looks like I forgot to add the EventTypes to the openapi spec, oops)

Yes, but I made them to be an enum. See here:

https://github.com/nachocano/gcs/blob/600e3f38cab7c9efe8b01fdb77aa31e97cd11d87/config/300-gcs.yaml#L74

And would the __eventTypes be added to the above file or only to the openapi spec? If only to the spec, that seems fine. I assume that will not cause any gried?

I added them in both, see here. But in the webhook I'm validating the internal ones are not populated. Didn't try not adding them in the .go file. If it works without having them there, I guess it's better.

nachocano commented 5 years ago

Can we call it _eventTypesSupported instead of _eventTypes? Might be less confusing to users (and contributors when they accidentally forget the _)

+1 to _eventTypesSupported

grantr commented 5 years ago

But in the webhook I'm validating the internal ones are not populated.

Will every importer (that wants to be in the registry) need to run a webhook to deny edits to these fields?

Didn't try not adding them in the .go file. If it works without having them there, I guess it's better.

I expect we'll need to read these values at some point in go code, but maybe it doesn't have to be the concrete importer type. Seems like it could be a duck type only.

vaikas commented 5 years ago

I would expect those (__eventTypesSupported) to be only for discovery and not used by the types. I could see us reading the CRD openapi for tooling, but I can't think (off the top of my head) when we'd want them to be in the objects Spec field, that seems a bit wonky.

nachocano commented 5 years ago

Will every importer (that wants to be in the registry) need to run a webhook to deny edits to these fields?

Hopefully not... Do you know if the pattern in the CRD is enforced without having a webhook? If a user wants to put something different than the pattern, it should fail.

I expect we'll need to read these values at some point in go code, but maybe it doesn't have to be the concrete importer type. Seems like it could be a duck type only.

Yes, if we need to read them in go it can end up being a CloudEventable duck maybe...

nachocano commented 5 years ago

I don't see a github webhook, and IIRC the enum values in eventTypes were being validated

nachocano commented 5 years ago

I would expect those (__eventTypesSupported) to be only for discovery and not used by the types. I could see us reading the CRD openapi for tooling, but I can't think (off the top of my head) when we'd want them to be in the objects Spec field, that seems a bit wonky.

I guess we might need to read them for extracting the schema and the type if we don't want them hardcoded when creating the entries in the EventType registry... I might be wrong

vaikas commented 5 years ago

Would you mind doing the experiment of removing it from the types.Spec? It feels wrong to me to have a field there that's only used to convey information to the user, leave it to the openapi spec. After reading a bit more, perhaps instead of adding them to the properties section of the schema, we could utilize the additionalProperties and add the __eventTypesSupported there (or maybe in that case they could just be eventTypesSupported since they wouldn't collide)? CRDs support the additionalProperties field: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/

@n3wscott ring any bells?

As far as adding them into the registry, I think it would be nice for a CO to actually emit in its status what it's actually been configured to emit and that Status should drive the registry. Then there could be one controller that could look at all the COs and populate the registry without each CRD having to stash that. Anyways, haven't thought that one through and conflating bit of issues, but...

n3wscott commented 5 years ago

actually emit in its status what it's actually been configured

I think that is nice.

On Wed, Jul 31, 2019 at 2:57 PM Ville Aikas notifications@github.com wrote:

Would you mind doing the experiment of removing it from the types.Spec? It feels wrong to me to have a field there that's only used to convey information to the user, leave it to the openapi spec. After reading a bit more, perhaps instead of adding them to the properties section of the schema, we could utilize the additionalProperties and add the __eventTypesSupported there (or maybe in that case they could just be eventTypesSupported since they wouldn't collide)? CRDs support the additionalProperties field:

https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/

@n3wscott https://github.com/n3wscott ring any bells?

As far as adding them into the registry, I think it would be nice for a CO to actually emit in its status what it's actually been configured to emit and that Status should drive the registry. Then there could be one controller that could look at all the COs and populate the registry without each CRD having to stash that. Anyways, haven't thought that one through and conflating bit of issues, but...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/knative/eventing/issues/1550?email_source=notifications&email_token=AHWPD4CT6UZNNFA56X4Y4SDQCIDERA5CNFSM4IF6BDV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IV4SA#issuecomment-517037640, or mute the thread https://github.com/notifications/unsubscribe-auth/AHWPD4DCXRWDFUDT37YWBHLQCIDERANCNFSM4IF6BDVQ .

nachocano commented 5 years ago

Would you mind doing the experiment of removing it from the types.Spec? It feels wrong to me to have a field there that's only used to convey information to the user, leave it to the openapi spec. After reading a bit more, perhaps instead of adding them to the properties section of the schema, we could utilize the additionalProperties and add the __eventTypesSupported there (or maybe in that case they could just be eventTypesSupported since they wouldn't collide)? CRDs support the additionalProperties field: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/

I removed them from the spec and it works... And if you want to set them in yaml to some invalid value, it gets validated without a webhook...

spec._eventTypesSupported.archive.type in body should match 'com.google.storage.archive'

It might be better to have it as an additionalProperties. I think I tried that and you couldn't have both of them specified (properties and additionalProperties) under spec, but I can try again

nachocano commented 5 years ago

As far as adding them into the registry, I think it would be nice for a CO to actually emit in its status what it's actually been configured to emit and that Status should drive the registry. Then there could be one controller that could look at all the COs and populate the registry without each CRD having to stash that. Anyways, haven't thought that one through and conflating bit of issues, but...

Yes, @n3wscott mentioned this, and it makes sense. But even in that case, if we don't want the type and schema hardcoded in go, we will need to read those values.

vaikas commented 5 years ago

Ok, so leaving them out of the spec (types.go file) and adding them to the config/CRD.yaml definition is cool? I think I'm fine with that esp. given the documentation there, I just don't want them in the code if possible.

as far as hardcoding, I don't quite understand. Who was setting those values in the actual code? I can see you defining the types.go spec.__eventTypes, but where was the population of that map happening?

grantr commented 5 years ago

:+1: to putting it in additionalProperties. That will remain supported in CRD structural schemas (https://kubernetes.io/blog/2019/06/20/crd-structural-schema/)

nachocano commented 5 years ago

Ok, so leaving them out of the spec (types.go file) and adding them to the config/CRD.yaml definition is cool? I think I'm fine with that esp. given the documentation there, I just don't want them in the code if possible.

Yes, agree. Let's not put them in types.go..

as far as hardcoding, I don't quite understand. Who was setting those values in the actual code? I can see you defining the types.go spec.__eventTypes, but where was the population of that map happening?

In GCS we were never creating EventTypes in the registry... By hardcoding I mean for example here

nachocano commented 5 years ago

to putting it in additionalProperties. That will remain supported in CRD structural schemas (https://kubernetes.io/blog/2019/06/20/crd-structural-schema/)

Cannot put properties and additionalProperties under the same field, spec in this case. I recall I tried that.

The CustomResourceDefinition "gcssources.sources.aikas.org" is invalid: spec.validation.openAPIV3Schema.properties[spec].additionalProperties: Forbidden: additionalProperties and properties are mutual exclusive

Having something else than spec and status doesn't comply with k8s, right?

grantr commented 5 years ago

Having something else than spec and status doesn't comply with k8s, right?

Kubernetes doesn't mandate spec and status; it's just a convention. Several core objects have neither.

nachocano commented 5 years ago

Having something else than spec and status doesn't comply with k8s, right?

Kubernetes doesn't mandate spec and status; it's just a convention. Several core objects have neither.

OK, in that case, if we still want them separated from spec.eventTypes, we can add some other property... But I'm OK with leaving _eventTypesSupported under spec.

daisy-ycguo commented 5 years ago

My thought after reading this discussion is that we definitely need a CLI to easily manipulate these concepts (importers, eventTypes), other than introducing new CRDs. Eventing CLI can be a higher abstraction for users and a wrapper of low level implementations. If new CRDs is necessary for new features and new use cases, of course we should add them. If only to improve the user experiences, I believe CLI is a better choice. If we only depends on CRD and kubectl, things are complicated.

akashrv commented 5 years ago

/milestone v0.9.0

nachocano commented 5 years ago

/assign

nachocano commented 5 years ago

Talking more with @n3wscott, and following up on a comment made by @vaikas-google and @grantr here, it seems that we should have a separate attribute for registry-related things in the CRD (which should be specified in the importer spec), so as not to confuse the user by putting it in the spec section...

The GCS source would then look something as follows:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: gcssources.sources.aikas.org
  labels:
    eventing.knative.dev/source: "true"
spec:
  group: sources.aikas.org
  version: v1alpha1
  scope: Namespaced
  names:
    categories:
    - all
    - knative
    - eventing
    - sources
    plural: gcssources
    singular: gcssource
    kind: GCSSource
    shortNames:
    - gcs
  validation:
    openAPIV3Schema:
      properties:
        registry:
          type: object
          description: "Internal information for registry purposes. Users should not set this property"
          properties:
            eventTypes:
              type: object
              properties:
                finalize:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "google.storage.object.finalize"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/storage/finalize.json"
                delete:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "google.storage.object.delete"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/storage/delete.json"
                archive:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "google.storage.object.archive"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/storage/archive.json"
                metadataUpdate:
                  type: object
                  properties:
                    type:
                      type: string
                      pattern: "google.storage.object.metadataUpdate"
                    schema:
                      type: string
                      pattern: "https://cloud.google.com/cloudevents/schemas/storage/metadataUpdate.json"   
        spec:
          properties:
            eventTypes:
              type: array
              items:
                enum:
                  - finalize
                  - delete
                  - archive
                  - metadataUpdate
                type: string
              description: "The event types to subscribe to. If none is specified, we default to all of them"
            gcsCredsSecret:
              type: object
              description: "Credential to use for creating a GCP notification. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
            gcpCredsSecret:
              type: object
              description: "Optional credential to use for subscribing to the GCP PubSub topic. If omitted, uses gcsCredsSecret. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
            serviceAccountName:
              type: string
              description: "Service Account to run Receive Adapter as. If omitted, uses 'default'."
            googleCloudProject:
              type: string
              description: "Google Cloud Project ID to create the scheduler job in."
            bucket:
              type: string
              description: "GCS bucket to subscribe to. For example my-test-bucket"
            objectNamePrefix:
              type: string
              description: "Optional prefix to only notify when objects match this prefix."
            payloadFormat:
              type: string
              description: "Optional payload format. Either NONE or JSON_API_V1. If omitted, uses JSON_API_V1."
            sink:
              type: object
              description: "Where to sink the notifications to."
          required:
          - gcsCredsSecret
          - googleCloudProject
          - bucket
          - sink

Does that sound right for everyone?

vaikas commented 5 years ago

validation: openAPIV3Schema: properties: apiVersion: type: string kind: type: string metadata: type: object registry: type: object

Do we need the k8s typemeta stuff here? otherwise, LGTM. thanks for working on this!

nachocano commented 5 years ago

validation: openAPIV3Schema: properties: apiVersion: type: string kind: type: string metadata: type: object registry: type: object

Do we need the k8s typemeta stuff here? otherwise, LGTM. thanks for working on this!

I think is not needed. It was just a copy/paste from the current one. Safe to remove them. Thanks!

lionelvillard commented 5 years ago

LGTM. My only concern is this solution might not be forward-compatible if the CRD controller decides to validate validation and rejects additional top-level properties. Unlikely?

rhuss commented 5 years ago

I've read through the issue, but I couldn't find the answer to the following question (sorry for my ignorance if hidden somewhere ;-):

From a CLI perspective who would be responsible for mapping flat key-value options given on the command line to a nested and structure schema for the CR to create ? The client can discover the openapi schema from the registry, but is the mapping also done by the importer registry or is this the duty of the client itself (e.g. by establishing some kind of jsonpath semantics for the keys) ?

How would that work for complex importer CRs, like the one for the Camel importer ? (like the example in https://github.com/knative/eventing-contrib/blob/master/camel/source/samples/source_camel_k.yaml which holds a DSL description in its content: field).

nachocano commented 5 years ago

From a CLI perspective who would be responsible for mapping flat key-value options given on the command line to a nested and structure schema for the CR to create ? The client can discover the openapi schema from the registry, but is the mapping also done by the importer registry or is this the duty of the client itself (e.g. by establishing some kind of jsonpath semantics for the keys) ?

I think the CLI should be responsible for the mapping. There is a proposed CLI experience here. It's slightly outdated with the latest changes in this thread, but the overall idea is kind of the same.

How would that work for complex importer CRs, like the one for the Camel importer ? (like the example in https://github.com/knative/eventing-contrib/blob/master/camel/source/samples/source_camel_k.yaml which holds a DSL description in its content: field).

I'm not very aware of the Camel importer. I couldn't find a place in the code where it's actually emitting CloudEvents. Is it? If it is, then one thing to do would be to add in the Camel importer CRD the list of event types it can emit. And same as above, if we properly define the fields in the CRD, what is required, and what not, descriptions of the fields, etc., then I think the CLI should be able to read the openAPIV3Validation schema and do the mapping... Does that make sense?

rhuss commented 5 years ago

I think the CLI should be responsible for the mapping. There is a proposed CLI experience here. It's slightly outdated with the latest changes in this thread, but the overall idea is kind of the same.

From the example given it looks to me, this works only nicely for top-level elements of a simple CR. How would you dive into deeper structures, supporting complex CR parts like object and arrays? I'm not sure how this could be kept in a general fashion and still provide a good user experience. Using typeless, comma-separated properties which could grow endlessly is typically not such a good thing from a UX POV.

I believe it would be very helpful if the registry would also contain information about how CLI options (or property keys) should be named and how they map to deeper CR fields. That's the only way how you can provide a satisfactory UX as there will be no 'generic' mapping available which is easy to understand. Also, this would help to give the user a 'flat' key-value experience for the CLI, and helps in creating help messages.

rhuss commented 5 years ago

I'm not very aware of the Camel importer. I couldn't find a place in the code where it's actually emitting CloudEvents. Is it? If it is, then one thing to do would be to add in the Camel importer CRD the list of event types it can emit.

/cc @nicolaferraro

rhuss commented 5 years ago

And same as above, if we properly define the fields in the CRD, what is required, and what not, descriptions of the fields, etc., then I think the CLI should be able to read the openAPIV3Validation schema and do the mapping... Does that make sense?

I think it's not a big issue for scalar data values which are short (e.g not big DSL like files). A json path like mapping could be used in this case. However, for values with a lot of content or CR using complex data structures like arrays, or deep CRs schemas, this might be not so user-friendly anymore, were people typically expect short, typed options (e.g. --gcsCredential .... --topic ...) which are also shown in a help page.

nachocano commented 5 years ago

From the example given it looks to me, this works only nicely for top-level elements of a simple CR. How would you dive into deeper structures, supporting complex CR parts like object and arrays? I'm not sure how this could be kept in a general fashion and still provide a good user experience. Using typeless, comma-separated properties which could grow endlessly is typically not such a good thing from a UX POV.

I believe it would be very helpful if the registry would also contain information about how CLI options (or property keys) should be named and how they map to deeper CR fields. That's the only way how you can provide a satisfactory UX as there will be no 'generic' mapping available which is easy to understand. Also, this would help to give the user a 'flat' key-value experience for the CLI, and helps in creating help messages.

The main idea of the importer registry as it is right now is to enable the discovery of the event types that the importers can provide, thus that additional registry property in the CRD. There will be an Importer Spec, anytime soon (cc @n3wscott), which will specify what a "well-behaved" importer should look like, e.g., have this registry property with eventTypes that have CloudEvents type and schema defined, etc, etc... I do think that if there are CLI guidelines/requirements we should follow, it would be nice if we can capture them in this spec.

But I think you are right, it might be hard to make a CLI out of these CRDs to create the actual importer object. Can you add an example of how do you think the GCS CRD example can be improved to make the CLI experience better? Or maybe some other more complicated one?

Thanks for the comment BTW, it's very useful...

rhuss commented 5 years ago

Thanks for response, really very helpful. However, I have a bit of a hard time to wrap my head around this GCS CRD example.

My first assumption this is the CRD that a client can lookup from the importer registry. Is this the case ? Or does the client lookups an instance of this CRD (a CR) ?

If it is the CRD, than IIUR a CR created based on this schema can carry a top-level spec: and registry: section. So for instantiation of this CR (which I suppose is the responsibility of the client) both fields can be filled, or ? What would a client put into the registry: field of the CR to create ? Or is this registry: field only there that it can be evaluated from the schema but is never supposed to be used in a CR ? (e.g the value to used by the client is actually the sample pattern in the schema)

I guess, I just haven't grokked the whole way how a client should interact with the importer registry to obtain enough information to create a CR for the importer to create.

However, regardless what the following meta-data would be invaluable for a client:

nachocano commented 5 years ago

Thanks for response, really very helpful. However, I have a bit of a hard time to wrap my head around this GCS CRD example.

My first assumption this is the CRD that a client can lookup from the importer registry. Is this the case ? Or does the client lookups an instance of this CRD (a CR) ?

The client should lookup for the CRDs, not a particular CR...

If it is the CRD, than IIUR a CR created based on this schema can carry a top-level spec: and registry: section. So for instantiation of this CR (which I suppose is the responsibility of the client) both fields can be filled, or ? What would a client put into the registry: field of the CR to create ? Or is this registry: field only there that it can be evaluated from the schema but is never supposed to be used in a CR ? (e.g the value to used by the client is actually the sample pattern in the schema)

The idea is that CRs only populate spec, as is typically done. We are hacking the CRD to include additional info for registry purposes, but we do not expect users to configure in their CRs the registry section. I added a description there saying something like: "Internal information for registry purposes. Users should not set this property"

I guess, I just haven't grokked the whole way how a client should interact with the importer registry to obtain enough information to create a CR for the importer to create.

However, regardless what the following meta-data would be invaluable for a client:

  • For each field which could be filled by the CLI, regardless how deep it is in the schema, there should be a flat key added as meta-data which provides a mapping to a single key on the command line. I'm not really deep in openAPI schema, but if there would be a possibility to annotate property declaration where to put the key on, that would be super helpful.
  • I have no idea how to deal with complex, schemaless properties like gcpCredsSecret or gcsCredsSecret of the generic type object without further schema. The description just mention that it Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys). How it is supposed for a CLI client to manage this JSON object ? Should this just add as a single line as an argument to a gcpCredsSecret key ? I don't consider this to be a good UX to provide json objects as CLI arguments (especially as the client has no idea how to validate).

Maybe the descriptions there are confusing. This is what a GCS CR currently looks like. It is subject to change, maybe making the gcsCredsSecret to be just a string, instead of an object. We are not passing the json, but rather a ref to a json secret. @Harwayne and I will be working on an initial CLI prototype, and we'll make sure to keep you in the loop so that we can get your feedback.

apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
  name: gcs-source-example
spec:
  googleCloudProject: my-project
  gcsCredsSecret:  
    name: my-key
    key: key.json
  bucket: my-bucket
  eventTypes:
    - metadataUpdate
    - archive
  sink:
    apiVersion: serving.knative.dev/v1alpha1
    kind: Service
    name: my-service

Thanks for all your input!