Closed mikehelmick closed 4 years ago
One additional benefit of the importer registry is that for some eventtypes you may need to specify different kinds of arguments. With the current CRD this is not easily doable because the schema does not (at least seem) to support selective arguments on the fields. For example, say an importer that has 2 events foo,bar and for foo you need to provide parameter X and for bar you need Y, I'm not aware of any way to specify that through the current mechanism.
What if we could use a modified version of the current sources CRDs, thus there is no need to create any new resources?
Some of the current importers provide a list of event types as strings (e.g., the GithubImporter). I think if instead of being a list of strings, we can make each event an object, with certain common properties (ceType, ceSchema, etc.), which should be properly defined in the Importer Spec, as well as specific ones, in case the different events of a same importer need different properties (which I still have my doubts about). With this, we can avoid creating a new resource, and can leverage standard openAPI tooling.
Here is an example of what a GitHubImporter eventTypes might look like:
eventTypes:
type: object
properties:
pullRequest:
type: object
properties:
type:
type: string
pattern: "dev.knative.source.github.pull_request"
schema:
type: string
pattern: "pull-request-schema"
push:
type: object
properties:
type:
type: string
pattern: "dev.knative.source.github.push"
schema:
type: string
pattern: "push-schema"
# additionalProperties:
onlyForPush:
type: string
description: "Some push specific property"
required:
- onlyForPush
fork:
type: object
properties:
type:
type: string
pattern: "dev.knative.source.github.fork"
schema:
type: string
pattern: "fork-schema"
# additionalProperties:
onlyForFork:
type: string
description: "Some fork specific property"
anotherOneForFork:
type: string
description: "Some other one fork specific property"
required:
- onlyForFork
- anotherOneForFork
...
Or maybe the GCSSource one:
eventTypes:
type: object
properties:
finalize:
type: object
description: "Event triggered when an object is created or finalized"
properties:
type:
type: string
pattern: "com.google.storage.finalize"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/finalize.json"
specVersion:
type: string
pattern: "0.3"
finalizeField:
type: string
description: "You will have to configure this extra thing for finalize"
required:
- finalizeField
delete:
type: object
description: "Event triggered when an object is deleted"
properties:
type:
type: string
pattern: "com.google.storage.delete"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/delete.json"
specVersion:
type: string
pattern: "0.3"
archive:
type: object
description: "Event triggered when an object is archived"
properties:
type:
type: string
pattern: "com.google.storage.archive"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/archive.json"
specVersion:
type: string
pattern: "0.3"
justForArchive:
type: string
description: "You will have to configure this extra thing for archive"
required:
- justForArchive
metadataUpdate:
type: object
description: "Event triggered when an object's metadata is updated"
properties:
type:
type: string
pattern: "com.google.storage.metadataUpdate"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/metadataUpdate.json"
specVersion:
type: string
pattern: "0.3"
updateSpecific:
type: string
description: "You will have to configure this extra thing"
someUpdateOptionalField:
type: string
description: "no need to configure this stuff"
required:
- updateSpecific
Here is an example of a potential GCSSource CR...
apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
name: gcs-source
spec:
googleCloudProject: my-project
gcsCredsSecret:
name: my-key
key: key.json
bucket: my-bucket
eventTypes:
metadataUpdate:
updateSpecific: my-update-specific-property
sink:
apiVersion: serving.knative.dev/v1alpha1
kind: Service
name: event-display
We do not expect users to configure type
nor schema
nor specVersion
in any of the cases, they will be defaulted with a mutating admission webhook. In case the user tries to configure them with some weird value, the validation will complaint due to the exact match pattern specified in the CRD.
By listing the sources with kubectl get crds -l eventing.knative.dev/source
And then describing them, or with some fancier jsonpath or go-templating thingy, we could easily understand the types that the importer provides, together with their schemas.
For example, only the types: kubectl get crds githubsources.sources.eventing.knative.dev -o=jsonpath='{..type.pattern}'
would output:
dev.knative.source.github.fork dev.knative.source.github.pull_request dev.knative.source.github.push
One additional benefit of the importer registry is that for some eventtypes you may need to specify different kinds of arguments.
The only example I can think of right now is GitHub. Depending on whether you're subscribing to an org-level event or a repo-level event, the repo argument may be invalid (membership) or required (push).[1] This seems like an edge case rather than a normal case, and supporting multiple parameter signatures per registry entry is a lot of complexity for an edge case.
Thought experiment: What is the CloudEvents source attribute value for each GitHub event type? I imagine for org events it's github.com/org
and for repo events it's github.com/org/repo
. The CloudEvents definition of source is ambiguous but I think in practice users will understand it as "the context I subscribe[d] to". IMO the user may reasonably assume that org and repo events are emitted by two different sources because the source syntaxes[2] are different.
What if "GitHub the service" actually had two entries in the registry, one for org events and one for repo events? Then each entry would have one parameter schema. The schema could be expressed in the CRD spec as @nachocano proposes (there would be two CRDs for subscribing to the two sources) or in a separate registry object (two registry entry objects, 1 or 2 CRDs), but in both cases there is one schema per entry.
I suspect this rule of "one parameter schema per source syntax" holds for all sources, but I don't have any other real examples. A contrived example is a GCPImporter
CRD that can import events from all of GCP.
I prefer to think of a source registry versus an importer registry, and source discovery versus importer discovery. We created the name "importer" to help contributors distinguish the event ingress from the event source, but I don't think users care about that distinction. The user goal is "I want to subscribe to a GitHub org event" rather than "I want to subscribe to an event from a GitHubImporter
object". I think we'll be in a better place if users (trigger creators) consider the subscribable context to be the abstract source rather than the concrete importer object, with the registry serving as an enumeration of the possible sources rather than the possible importers.
1: GitHub makes it possible to subscribe to repo-level events for all repos in an org, but this is a convenience feature that doesn't change the conceptual model for the Knative Eventing user. 2: I think this is the correct usage of the word syntax, but if there's a better word for this let me know. I also considered "template".
I prefer to think of a source registry versus an importer registry, and source discovery versus importer discovery.
+1 to source registry, but I have no strong opinions otherwise.
source registry is a fine way to think of it. The idea behind this is that users have a way to discover what event sources are available and how to configure them. Generally this will involve creating an importer.
I do wonder if we want to place a restriction that all importers must be backed by a CRD in order to be discoverable. Is that too heavyweight and are the other alternatives we should consider?
We may also want to keep in mind that the validation spec for CRDs will soon change to a subset of OpenAPI: https://kubernetes.io/blog/2019/06/20/crd-structural-schema/. Will that subset support what we want to do here?
In general I'm a bit concerned about locking our registry API to the CRD API. We already know CRD validation is switching to be per-version shortly. Do we want that? Is CRD versioning the same versioning we want with a source/importer registry?
Here is a working prototype of a GCSImporter using the CRD-based approach introduced above. Note that the documentation is outdated.
Example usage:
Discover the sources/importers
kubectl get crds -l eventing.knative.dev/source=true
Discover the event types that GCS can emit and their schema
kubectl get crds gcssources.sources.aikas.org -oyaml
Create a GCS Importer
apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
name: gcs-source
spec:
googleCloudProject: my-project
gcsCredsSecret:
name: google-cloud-key
key: key.json
bucket: my-bucket
eventTypes:
metadataUpdate: {}
archive: {}
sink:
apiVersion: eventing.knative.dev/v1alpha1
kind: Broker
name: default
// TODO populate the EventType registry with the event types the broker will be able to provide as part of this instantiation.
This is the rough outline of a design for the Importer Registry. It is similar in some ways to @nachocano's design in https://github.com/knative/eventing/issues/1550#issuecomment-515604913. I believe that proposal is superior, but wanted to write this down both to have a record of it and in case others see more potential in it.
Create a new CRD, ImporterRegistration. The installation of any Importer CRD would include its corresponding ImporterRegistration CO.
apiVersion: eventing.knative.dev/v1alpha1
kind: ImporterRegistration
metadata:
name: github-importer
spec:
apiVersion: sources.eventing.knative.dev/v1alpha1
kind: GitHubImporter
types:
push:
openAPIV3SchemaMergePatch:
properties:
spec:
required:
- onlyForPush
# This field from the CRD is not used for this type, do not allow it.
onlyForFork: null
fork:
openAPIV3SchemaMergePatch:
properties:
spec:
required:
- onlyForFork
- anotherOneForFork
anotherOneForFork:
# Altered verification of the field.
minLength: 4
...
Conceptually, the spec includes the concrete importer to create (GitHubImporter in sources.eventing.knative.dev/v1alpha1) and the CloudEvent types that the importer can be configured to import (push and fork). Each event type is allowed to provide an RFC 7386 JSON Merge Patch that is applied to the CRD's generic OpenAPIV3Schema. A large benefit of this is that tooling can easily generate the OpenAPI schema for a specific event type, allowing construction of UIs that request only the necessary information for the desired event type.
One advantage this has over the proposal in https://github.com/knative/eventing/issues/1550#issuecomment-515604913, is that the same field name can be used in distinct ways by distinct event types, because the patch for each can change its description and validation. However, this is likely a feature we do not want as it makes things more complicated for the user.
Another advantage is that it does not require the Importer to include a spec.events
field, which is a map from CloudEvent type to object. The Importer author is free to setup their CRD however they want.
Overall, I think that the proposal in https://github.com/knative/eventing/issues/1550#issuecomment-515604913 is superior to this proposal. It removes the need for a distinct registration CO. While still allowing us to easily make UIs that interactively choose the parameters needed to instantiate an Importer. It is also easier to read through a single OpenAPI schema than it is to mentally apply patches from one OpenAPI schema to another.
@nachocano - Can you also describe how this fits in with the candidate CLI
And coordinate with @n3wscott on #1554 - the solution to this issue and that issue must work together.
I think a potential CLI might look as follows, considering the proposal in #1550 (comment)
Discover Sources: The ability to discover the sources of events is a prerequisite for being able to connect events from producers to consumers.
kn events sources list
gcs -- This source allows you to bring Google Cloud Storage events into Knative
github -- This source allows you to bring GitHub events into Knative
...
Discover Source's Event Types and how to Configure the Source: In order to configure delivery of events from one of the sources, a user needs to know what events the source can emit as well as how to configure such source.
kn events source gcs describe
description: This source allows you to bring Google Cloud Storage events into Knative
configuration:
google-cloud-project -- ID of the google cloud project that contains the bucket
service-account-secret -- name of the secret containing a GCP service account key with permission to create a GCP notification.
bucket-name -- GCS bucket to subscribe to.
target-service -- Service where to send the notifications to.
-- [optional]
object-name-prefix -- prefix to only notify when objects match this prefix
event-types:
finalize -- Event triggered when an object is created or finalized
schema:
specversion: CloudEvents v0.3
type: com.google.storage.finalize
schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/finalize.json
configuration:
finalize-field -- example of a particular field for finalize event type
archive -- Event triggered when an object is archived
schema:
specversion: CloudEvents v0.3
type: com.google.storage.archive
schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/archive.json
configuration:
just-for-archive -- example of a particular field required for archive event type
delete -- Event triggered when an object is deleted
schema:
specversion: CloudEvents v0.3
type: com.google.storage.delete
schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/delete.json
metadataUpdate -- Event triggered when an object's metadata is updated
schema:
specversion: CloudEvents v0.3
type: com.google.storage.metadataUpdate
schemaurl: https://cloud.google.com/cloudevents/schemas/gcs/metadataUpdate.json
configuration:
update-specific -- Update specific field
-- [optional]
some-update-optional-field -- Update optional field
Note: I think we should also use have an example
section as part of the output of describe.
Create a GCS Source:
kn events source gcs create my-gcs-source \
--namespace default \
--config google-cloud-project=my-project,service-account-secret=my-secret,bucket-name=my-bucket,target-service=my-service \
--event-types finalize=finalize-field=my-finalize-field-value,delete,metadataUpdate=update-specific=my-update-specific-field-value,some-update-optional-field=my-update-optional-field \
The instantiation of this GCS source, only if pointed to a Broker (this is subject to change), should populate the EventType Registry
, which is a more useful mechanism to discover what are the event types that have been already hooked up and are able to consume from a particular Broker.
I'd prefer to extend the existing CRD model and not create additional registry for now. If we feel we need something like that in the future we can add it. If there are cases that @nachocano proposal doesn't cover, then I'd suggest we move forward with it. Main focus is to handle discovery from my point of view and I believe this case is handled by this proposal. As far as validation goes, I don't think the swagger doc is going to be sufficient alone for all the cases, and I believe what's proposed at the CRD spec level will be sufficient for the discovery use cases as well as the simple validation, but that the webhook will have to be involved in the more complex validation already.
I do wonder if we want to place a restriction that all importers must be backed by a CRD in order to be discoverable. Is that too heavyweight and are the other alternatives we should consider?
By talking with @Harwayne, it seems that his AutoContainerSource magic would allow us to avoid writing a controller, but there is still need for CRDs, that would specify the supported event types with their schemas. For example, in his example K8sEventSource CRD we would have to add that.
@n3wscott I'm wondering if we can use a similar mechanism to AutoContainerSource to make this Importer Registry work with the PullSubscription?
@n3wscott made some suggestions to the CRD approach, which IMO make sense. Basically, that we should decouple importer registry from "filtering" on the source (i.e., from specifying what event types to subscribe to). In other words, it would be nice to have, as the GitHub importer currently has, an eventTypes
array with the possible types (for filtering), and an internal eventTypes object with description of those types (for registry purposes). The latter is something that users will not need to specify when creating a source CO...
Going back to the GCS example, it might now look something like this (interesting parts are _eventTypes
and eventTypes
)
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
# name must match the spec fields below, and be in the form: <plural>.<group>
name: gcssources.sources.aikas.org
labels:
eventing.knative.dev/source: "true"
spec:
group: sources.aikas.org
version: v1alpha1
scope: Namespaced
names:
categories:
- all
- knative
- eventing
- sources
plural: gcssources
singular: gcssource
kind: GCSSource
shortNames:
- gcs
validation:
openAPIV3Schema:
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
properties:
_eventTypes:
type: object
description: "Internal description of event types, users should not set this property"
properties:
finalize:
type: object
properties:
type:
type: string
pattern: "com.google.storage.finalize"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/finalize.json"
delete:
type: object
properties:
type:
type: string
pattern: "com.google.storage.delete"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/delete.json"
archive:
type: object
properties:
type:
type: string
pattern: "com.google.storage.archive"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/archive.json"
metadataUpdate:
type: object
properties:
type:
type: string
pattern: "com.google.storage.metadataUpdate"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/gcs/metadataUpdate.json"
eventTypes:
type: array
items:
enum:
- finalize
- delete
- archive
- metadataUpdate
type: string
description: "The event types to subscribe to. If none is specified, we default to all of them"
gcsCredsSecret:
type: object
description: "Credential to use for creating a GCP notification. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
gcpCredsSecret:
type: object
description: "Optional credential to use for subscribing to the GCP PubSub topic. If omitted, uses gcsCredsSecret. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
serviceAccountName:
type: string
description: "Service Account to run Receive Adapter as. If omitted, uses 'default'."
googleCloudProject:
type: string
description: "Google Cloud Project ID to create the scheduler job in."
bucket:
type: string
description: "GCS bucket to subscribe to. For example my-test-bucket"
objectNamePrefix:
type: string
description: "Optional prefix to only notify when objects match this prefix."
payloadFormat:
type: string
description: "Optional payload format. Either NONE or JSON_API_V1. If omitted, uses JSON_API_V1."
sink:
type: object
description: "Where to sink the notifications to."
required:
- gcsCredsSecret
- googleCloudProject
- bucket
- sink
A GCS CO would look something like this now:
apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
name: gcs-source-example
spec:
googleCloudProject: my-project
gcsCredsSecret:
name: my-key
key: key.json
bucket: my-bucket
eventTypes:
- metadataUpdate
- archive
sink:
apiVersion: serving.knative.dev/v1alpha1
kind: Service
name: my-service
Working prototype here
@n3wscott does this address your comments? fyi @vaikas-google @Harwayne @grantr
BTW, as just pointed at by @Harwayne, this changes does not give us the flexibility to set different properties for different event types of the same importer, but I think that is a border case, and most of the time, they will need the same config. Thoughts?
so the EventTypes stays as is: https://github.com/vaikas-google/gcs/blob/master/pkg/apis/gcs/v1alpha1/types.go#L73
(Looks like I forgot to add the EventTypes to the openapi spec, oops)
And would the __eventTypes be added to the above file or only to the openapi spec? If only to the spec, that seems fine. I assume that will not cause any gried?
Can we call it _eventTypesSupported
instead of _eventTypes
? Might be less confusing to users (and contributors when they accidentally forget the _
)
I like _eventTypesSupported
or _eventTypesProduced
.
-Scott
On Wed, Jul 31, 2019 at 2:20 PM Grant Rodgers notifications@github.com wrote:
Can we call it _eventTypesSupported instead of eventTypes? Might be less confusing to users (and contributors when they accidentally forget the )
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/knative/eventing/issues/1550?email_source=notifications&email_token=AHWPD4H4WTRWP6Z2LUDUPULQCH62VA5CNFSM4IF6BDV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IS66Y#issuecomment-517025659, or mute the thread https://github.com/notifications/unsubscribe-auth/AHWPD4EAYBJQGNZGUM63IATQCH62VANCNFSM4IF6BDVQ .
so the EventTypes stays as is: https://github.com/vaikas-google/gcs/blob/master/pkg/apis/gcs/v1alpha1/types.go#L73
(Looks like I forgot to add the EventTypes to the openapi spec, oops)
Yes, but I made them to be an enum. See here:
And would the __eventTypes be added to the above file or only to the openapi spec? If only to the spec, that seems fine. I assume that will not cause any gried?
I added them in both, see here. But in the webhook I'm validating the internal ones are not populated. Didn't try not adding them in the .go file. If it works without having them there, I guess it's better.
Can we call it
_eventTypesSupported
instead of_eventTypes
? Might be less confusing to users (and contributors when they accidentally forget the_
)
+1 to _eventTypesSupported
But in the webhook I'm validating the internal ones are not populated.
Will every importer (that wants to be in the registry) need to run a webhook to deny edits to these fields?
Didn't try not adding them in the .go file. If it works without having them there, I guess it's better.
I expect we'll need to read these values at some point in go code, but maybe it doesn't have to be the concrete importer type. Seems like it could be a duck type only.
I would expect those (__eventTypesSupported) to be only for discovery and not used by the types. I could see us reading the CRD openapi for tooling, but I can't think (off the top of my head) when we'd want them to be in the objects Spec field, that seems a bit wonky.
Will every importer (that wants to be in the registry) need to run a webhook to deny edits to these fields?
Hopefully not... Do you know if the pattern in the CRD is enforced without having a webhook? If a user wants to put something different than the pattern, it should fail.
I expect we'll need to read these values at some point in go code, but maybe it doesn't have to be the concrete importer type. Seems like it could be a duck type only.
Yes, if we need to read them in go it can end up being a CloudEventable
duck maybe...
I don't see a github webhook, and IIRC the enum values in eventTypes were being validated
I would expect those (__eventTypesSupported) to be only for discovery and not used by the types. I could see us reading the CRD openapi for tooling, but I can't think (off the top of my head) when we'd want them to be in the objects Spec field, that seems a bit wonky.
I guess we might need to read them for extracting the schema and the type if we don't want them hardcoded when creating the entries in the EventType registry... I might be wrong
Would you mind doing the experiment of removing it from the types.Spec? It feels wrong to me to have a field there that's only used to convey information to the user, leave it to the openapi spec. After reading a bit more, perhaps instead of adding them to the properties section of the schema, we could utilize the additionalProperties and add the __eventTypesSupported there (or maybe in that case they could just be eventTypesSupported since they wouldn't collide)? CRDs support the additionalProperties field: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/
@n3wscott ring any bells?
As far as adding them into the registry, I think it would be nice for a CO to actually emit in its status what it's actually been configured to emit and that Status should drive the registry. Then there could be one controller that could look at all the COs and populate the registry without each CRD having to stash that. Anyways, haven't thought that one through and conflating bit of issues, but...
actually emit in its status what it's actually been configured
I think that is nice.
On Wed, Jul 31, 2019 at 2:57 PM Ville Aikas notifications@github.com wrote:
Would you mind doing the experiment of removing it from the types.Spec? It feels wrong to me to have a field there that's only used to convey information to the user, leave it to the openapi spec. After reading a bit more, perhaps instead of adding them to the properties section of the schema, we could utilize the additionalProperties and add the __eventTypesSupported there (or maybe in that case they could just be eventTypesSupported since they wouldn't collide)? CRDs support the additionalProperties field:
https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/
@n3wscott https://github.com/n3wscott ring any bells?
As far as adding them into the registry, I think it would be nice for a CO to actually emit in its status what it's actually been configured to emit and that Status should drive the registry. Then there could be one controller that could look at all the COs and populate the registry without each CRD having to stash that. Anyways, haven't thought that one through and conflating bit of issues, but...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/knative/eventing/issues/1550?email_source=notifications&email_token=AHWPD4CT6UZNNFA56X4Y4SDQCIDERA5CNFSM4IF6BDV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IV4SA#issuecomment-517037640, or mute the thread https://github.com/notifications/unsubscribe-auth/AHWPD4DCXRWDFUDT37YWBHLQCIDERANCNFSM4IF6BDVQ .
Would you mind doing the experiment of removing it from the types.Spec? It feels wrong to me to have a field there that's only used to convey information to the user, leave it to the openapi spec. After reading a bit more, perhaps instead of adding them to the properties section of the schema, we could utilize the additionalProperties and add the __eventTypesSupported there (or maybe in that case they could just be eventTypesSupported since they wouldn't collide)? CRDs support the additionalProperties field: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/
I removed them from the spec and it works... And if you want to set them in yaml to some invalid value, it gets validated without a webhook...
spec._eventTypesSupported.archive.type in body should match 'com.google.storage.archive'
It might be better to have it as an additionalProperties. I think I tried that and you couldn't have both of them specified (properties and additionalProperties) under spec, but I can try again
As far as adding them into the registry, I think it would be nice for a CO to actually emit in its status what it's actually been configured to emit and that Status should drive the registry. Then there could be one controller that could look at all the COs and populate the registry without each CRD having to stash that. Anyways, haven't thought that one through and conflating bit of issues, but...
Yes, @n3wscott mentioned this, and it makes sense. But even in that case, if we don't want the type and schema hardcoded in go, we will need to read those values.
Ok, so leaving them out of the spec (types.go file) and adding them to the config/CRD.yaml definition is cool? I think I'm fine with that esp. given the documentation there, I just don't want them in the code if possible.
as far as hardcoding, I don't quite understand. Who was setting those values in the actual code? I can see you defining the types.go spec.__eventTypes, but where was the population of that map happening?
:+1: to putting it in additionalProperties
. That will remain supported in CRD structural schemas (https://kubernetes.io/blog/2019/06/20/crd-structural-schema/)
Ok, so leaving them out of the spec (types.go file) and adding them to the config/CRD.yaml definition is cool? I think I'm fine with that esp. given the documentation there, I just don't want them in the code if possible.
Yes, agree. Let's not put them in types.go..
as far as hardcoding, I don't quite understand. Who was setting those values in the actual code? I can see you defining the types.go spec.__eventTypes, but where was the population of that map happening?
In GCS we were never creating EventTypes in the registry... By hardcoding I mean for example here
to putting it in
additionalProperties
. That will remain supported in CRD structural schemas (https://kubernetes.io/blog/2019/06/20/crd-structural-schema/)
Cannot put properties and additionalProperties under the same field, spec
in this case. I recall I tried that.
The CustomResourceDefinition "gcssources.sources.aikas.org" is invalid: spec.validation.openAPIV3Schema.properties[spec].additionalProperties: Forbidden: additionalProperties and properties are mutual exclusive
Having something else than spec
and status
doesn't comply with k8s, right?
Having something else than
spec
andstatus
doesn't comply with k8s, right?
Kubernetes doesn't mandate spec
and status
; it's just a convention. Several core objects have neither.
Having something else than
spec
andstatus
doesn't comply with k8s, right?Kubernetes doesn't mandate
spec
andstatus
; it's just a convention. Several core objects have neither.
OK, in that case, if we still want them separated from spec.eventTypes
, we can add some other property... But I'm OK with leaving _eventTypesSupported
under spec
.
My thought after reading this discussion is that we definitely need a CLI to easily manipulate these concepts (importers, eventTypes), other than introducing new CRDs. Eventing CLI can be a higher abstraction for users and a wrapper of low level implementations. If new CRDs is necessary for new features and new use cases, of course we should add them. If only to improve the user experiences, I believe CLI is a better choice. If we only depends on CRD and kubectl
, things are complicated.
/milestone v0.9.0
/assign
Talking more with @n3wscott, and following up on a comment made by @vaikas-google and @grantr here, it seems that we should have a separate attribute for registry-related things in the CRD (which should be specified in the importer spec), so as not to confuse the user by putting it in the spec section...
The GCS source would then look something as follows:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
# name must match the spec fields below, and be in the form: <plural>.<group>
name: gcssources.sources.aikas.org
labels:
eventing.knative.dev/source: "true"
spec:
group: sources.aikas.org
version: v1alpha1
scope: Namespaced
names:
categories:
- all
- knative
- eventing
- sources
plural: gcssources
singular: gcssource
kind: GCSSource
shortNames:
- gcs
validation:
openAPIV3Schema:
properties:
registry:
type: object
description: "Internal information for registry purposes. Users should not set this property"
properties:
eventTypes:
type: object
properties:
finalize:
type: object
properties:
type:
type: string
pattern: "google.storage.object.finalize"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/storage/finalize.json"
delete:
type: object
properties:
type:
type: string
pattern: "google.storage.object.delete"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/storage/delete.json"
archive:
type: object
properties:
type:
type: string
pattern: "google.storage.object.archive"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/storage/archive.json"
metadataUpdate:
type: object
properties:
type:
type: string
pattern: "google.storage.object.metadataUpdate"
schema:
type: string
pattern: "https://cloud.google.com/cloudevents/schemas/storage/metadataUpdate.json"
spec:
properties:
eventTypes:
type: array
items:
enum:
- finalize
- delete
- archive
- metadataUpdate
type: string
description: "The event types to subscribe to. If none is specified, we default to all of them"
gcsCredsSecret:
type: object
description: "Credential to use for creating a GCP notification. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
gcpCredsSecret:
type: object
description: "Optional credential to use for subscribing to the GCP PubSub topic. If omitted, uses gcsCredsSecret. Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys)."
serviceAccountName:
type: string
description: "Service Account to run Receive Adapter as. If omitted, uses 'default'."
googleCloudProject:
type: string
description: "Google Cloud Project ID to create the scheduler job in."
bucket:
type: string
description: "GCS bucket to subscribe to. For example my-test-bucket"
objectNamePrefix:
type: string
description: "Optional prefix to only notify when objects match this prefix."
payloadFormat:
type: string
description: "Optional payload format. Either NONE or JSON_API_V1. If omitted, uses JSON_API_V1."
sink:
type: object
description: "Where to sink the notifications to."
required:
- gcsCredsSecret
- googleCloudProject
- bucket
- sink
Does that sound right for everyone?
validation: openAPIV3Schema: properties: apiVersion: type: string kind: type: string metadata: type: object registry: type: object
Do we need the k8s typemeta stuff here? otherwise, LGTM. thanks for working on this!
validation: openAPIV3Schema: properties: apiVersion: type: string kind: type: string metadata: type: object registry: type: object
Do we need the k8s typemeta stuff here? otherwise, LGTM. thanks for working on this!
I think is not needed. It was just a copy/paste from the current one. Safe to remove them. Thanks!
LGTM. My only concern is this solution might not be forward-compatible if the CRD controller decides to validate validation
and rejects additional top-level properties. Unlikely?
I've read through the issue, but I couldn't find the answer to the following question (sorry for my ignorance if hidden somewhere ;-):
From a CLI perspective who would be responsible for mapping flat key-value options given on the command line to a nested and structure schema for the CR to create ? The client can discover the openapi schema from the registry, but is the mapping also done by the importer registry or is this the duty of the client itself (e.g. by establishing some kind of jsonpath semantics for the keys) ?
How would that work for complex importer CRs, like the one for the Camel importer ? (like the example in https://github.com/knative/eventing-contrib/blob/master/camel/source/samples/source_camel_k.yaml which holds a DSL description in its content:
field).
From a CLI perspective who would be responsible for mapping flat key-value options given on the command line to a nested and structure schema for the CR to create ? The client can discover the openapi schema from the registry, but is the mapping also done by the importer registry or is this the duty of the client itself (e.g. by establishing some kind of jsonpath semantics for the keys) ?
I think the CLI should be responsible for the mapping. There is a proposed CLI experience here. It's slightly outdated with the latest changes in this thread, but the overall idea is kind of the same.
How would that work for complex importer CRs, like the one for the Camel importer ? (like the example in https://github.com/knative/eventing-contrib/blob/master/camel/source/samples/source_camel_k.yaml which holds a DSL description in its
content:
field).
I'm not very aware of the Camel importer. I couldn't find a place in the code where it's actually emitting CloudEvents. Is it? If it is, then one thing to do would be to add in the Camel importer CRD the list of event types it can emit. And same as above, if we properly define the fields in the CRD, what is required, and what not, descriptions of the fields, etc., then I think the CLI should be able to read the openAPIV3Validation schema and do the mapping... Does that make sense?
I think the CLI should be responsible for the mapping. There is a proposed CLI experience here. It's slightly outdated with the latest changes in this thread, but the overall idea is kind of the same.
From the example given it looks to me, this works only nicely for top-level elements of a simple CR. How would you dive into deeper structures, supporting complex CR parts like object and arrays? I'm not sure how this could be kept in a general fashion and still provide a good user experience. Using typeless, comma-separated properties which could grow endlessly is typically not such a good thing from a UX POV.
I believe it would be very helpful if the registry would also contain information about how CLI options (or property keys) should be named and how they map to deeper CR fields. That's the only way how you can provide a satisfactory UX as there will be no 'generic' mapping available which is easy to understand. Also, this would help to give the user a 'flat' key-value experience for the CLI, and helps in creating help messages.
I'm not very aware of the Camel importer. I couldn't find a place in the code where it's actually emitting CloudEvents. Is it? If it is, then one thing to do would be to add in the Camel importer CRD the list of event types it can emit.
/cc @nicolaferraro
And same as above, if we properly define the fields in the CRD, what is required, and what not, descriptions of the fields, etc., then I think the CLI should be able to read the openAPIV3Validation schema and do the mapping... Does that make sense?
I think it's not a big issue for scalar data values which are short (e.g not big DSL like files). A json path like mapping could be used in this case. However, for values with a lot of content or CR using complex data structures like arrays, or deep CRs schemas, this might be not so user-friendly anymore, were people typically expect short, typed options (e.g. --gcsCredential .... --topic ...
) which are also shown in a help page.
From the example given it looks to me, this works only nicely for top-level elements of a simple CR. How would you dive into deeper structures, supporting complex CR parts like object and arrays? I'm not sure how this could be kept in a general fashion and still provide a good user experience. Using typeless, comma-separated properties which could grow endlessly is typically not such a good thing from a UX POV.
I believe it would be very helpful if the registry would also contain information about how CLI options (or property keys) should be named and how they map to deeper CR fields. That's the only way how you can provide a satisfactory UX as there will be no 'generic' mapping available which is easy to understand. Also, this would help to give the user a 'flat' key-value experience for the CLI, and helps in creating help messages.
The main idea of the importer registry as it is right now is to enable the discovery of the event types that the importers can provide, thus that additional registry
property in the CRD. There will be an Importer Spec, anytime soon (cc @n3wscott), which will specify what a "well-behaved" importer should look like, e.g., have this registry
property with eventTypes
that have CloudEvents type
and schema
defined, etc, etc... I do think that if there are CLI guidelines/requirements we should follow, it would be nice if we can capture them in this spec.
But I think you are right, it might be hard to make a CLI out of these CRDs to create the actual importer object. Can you add an example of how do you think the GCS CRD example can be improved to make the CLI experience better? Or maybe some other more complicated one?
Thanks for the comment BTW, it's very useful...
Thanks for response, really very helpful. However, I have a bit of a hard time to wrap my head around this GCS CRD example.
My first assumption this is the CRD that a client can lookup from the importer registry. Is this the case ? Or does the client lookups an instance of this CRD (a CR) ?
If it is the CRD, than IIUR a CR created based on this schema can carry a top-level spec:
and registry:
section. So for instantiation of this CR (which I suppose is the responsibility of the client) both fields can be filled, or ? What would a client put into the registry:
field of the CR to create ? Or is this registry:
field only there that it can be evaluated from the schema but is never supposed to be used in a CR ? (e.g the value to used by the client is actually the sample pattern in the schema)
I guess, I just haven't grokked the whole way how a client should interact with the importer registry to obtain enough information to create a CR for the importer to create.
However, regardless what the following meta-data would be invaluable for a client:
gcpCredsSecret
or gcsCredsSecret
of the generic type object
without further schema. The description just mention that it Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys). How it is supposed for a CLI client to manage this JSON object ? Should this just add as a single line as an argument to a gcpCredsSecret
key ? I don't consider this to be a good UX to provide json objects as CLI arguments (especially as the client has no idea how to validate).Thanks for response, really very helpful. However, I have a bit of a hard time to wrap my head around this GCS CRD example.
My first assumption this is the CRD that a client can lookup from the importer registry. Is this the case ? Or does the client lookups an instance of this CRD (a CR) ?
The client should lookup for the CRDs, not a particular CR...
If it is the CRD, than IIUR a CR created based on this schema can carry a top-level
spec:
andregistry:
section. So for instantiation of this CR (which I suppose is the responsibility of the client) both fields can be filled, or ? What would a client put into theregistry:
field of the CR to create ? Or is thisregistry:
field only there that it can be evaluated from the schema but is never supposed to be used in a CR ? (e.g the value to used by the client is actually the sample pattern in the schema)
The idea is that CRs only populate spec
, as is typically done. We are hacking the CRD to include additional info for registry purposes, but we do not expect users to configure in their CRs the registry
section. I added a description there saying something like: "Internal information for registry purposes. Users should not set this property"
I guess, I just haven't grokked the whole way how a client should interact with the importer registry to obtain enough information to create a CR for the importer to create.
However, regardless what the following meta-data would be invaluable for a client:
- For each field which could be filled by the CLI, regardless how deep it is in the schema, there should be a flat key added as meta-data which provides a mapping to a single key on the command line. I'm not really deep in openAPI schema, but if there would be a possibility to annotate property declaration where to put the key on, that would be super helpful.
- I have no idea how to deal with complex, schemaless properties like
gcpCredsSecret
orgcsCredsSecret
of the generic typeobject
without further schema. The description just mention that it Must be a service account key in JSON format (see https://cloud.google.com/iam/docs/creating-managing-service-account-keys). How it is supposed for a CLI client to manage this JSON object ? Should this just add as a single line as an argument to agcpCredsSecret
key ? I don't consider this to be a good UX to provide json objects as CLI arguments (especially as the client has no idea how to validate).
Maybe the descriptions there are confusing. This is what a GCS CR currently looks like. It is subject to change, maybe making the gcsCredsSecret
to be just a string, instead of an object. We are not passing the json, but rather a ref to a json secret.
@Harwayne and I will be working on an initial CLI prototype, and we'll make sure to keep you in the loop so that we can get your feedback.
apiVersion: sources.aikas.org/v1alpha1
kind: GCSSource
metadata:
name: gcs-source-example
spec:
googleCloudProject: my-project
gcsCredsSecret:
name: my-key
key: key.json
bucket: my-bucket
eventTypes:
- metadataUpdate
- archive
sink:
apiVersion: serving.knative.dev/v1alpha1
kind: Service
name: my-service
Thanks for all your input!
Problem In working through #1381 one of the key things that we have found when targeting the user experience is that discoverability is highly important. For motivations, we look to the Scenarios for Knative Eventing.
To recap the 3 scenarios describe there
FaaS - directly connect an importer (specific instance) to a consumer
Event-Driven - current implementation of Broker + Trigger
Black-box integration - scenario 3 is a specialization of both scenarios 1 and 2. From a developer’s viewpoint, consuming events is just like scenario 2 (Event-Driven) in that if an event is available on my cluster, I am able to trigger off of it.
The current registry implementation is event type centric (the CRD is literally called EventType). This works really well for scenario 2 (on-cluster, event-driven scenario) and scenario 3 (black-box integration).
This works less well for scenario 1 (FaaS) because it makes discovery of events via source more difficult to build. This doesn’t mean that the existing registry is wrong our less useful. There is currently one entry in the registry per event type, this is actually a tuple of Source and Type.
We propose introducing a new importer registry.
This gives the overall registry 2 components working together to facilitate discovery and configuration of events. The importer registry will be responsible for informing a user (and tooling) of which producers they may solicit what events from and how to configure those importers.
When an importer is configured, the importer is responsible for populating the EventType appropriately.
As a concrete example, the importer registry tells me that I can get Google Cloud Storage “finalize” events by instantiating the GCSSource kind. If two GCSSource objects are created for finalize events, the finalize event type should only appear in the EventType registry once. The EventType should reflect which importers are providing that type (can be an annotations, status, etc. to be designed).
Persona: Event Consumer
Exit Criteria A CLI can be written that can programmatically discovery and configure importers.
Time Estimate (optional): 5d