lindydonna commented 5 years ago

I’m a PM at Google, working on the CLI and UI experience for GCP products built on Knative Eventing. The current system has a lot of granular objects that can be combined in very flexible ways. Unfortunately, this makes it quite difficult to create a user experience for some common cases. I think these considerations would apply to most vendors who are building products on top of Knative, so I’d like to get feedback from both the community and especially other vendors doing their own implementations.

I think there are two key developer scenarios that our product needs to support:

[FaaS scenario] (Similar to: AWS Lambda and Google Cloud Functions) As a developer, I want to deliver from a managed cloud service, to a target Service (or Addressable). Managed cloud services are services like AWS S3, Google Cloud Storage, Pub/Sub, etc.
[Event-Driven scenario] (Similar to: Kafka, Google Pub/Sub, AWS SNS, etc)
1. As an event-producer developer, I want to define custom on-cluster events and send them to a central location.
2. As an event-consumer developer, I want to consume these custom on-cluster events, without directly specifying the sender of the event (i.e., the Service generating the event).

The Event-Driven scenario is very easy to express in Knative Eventing. The FaaS scenario is easy as well, but only if you just use Importer directly.

NOTE: Here I am assuming there is value in having a single model for both scenarios (see @sixolet's comment below for options along with the pros and cons). This may not be a valid assumption.

A good UI and CLI for the FaaS experience would involve customers configuring an upstream sender and target Service in one logical step. Since we don’t currently have an object that does exactly this, I’ll use the placeholder name Storm.

Note: Trigger could be modified to have the properties we need in Storm. It just doesn’t have them today.

Storm could be implemented with a number of objects, but the key is that ideally, FaaS customers just think about Storm. If they list all Storms in a namespace, they should have a complete view of what is happening.

Unfortunately, the current model has the following issues:

To create a Storm, you’d have to create an EventSource and a Trigger. Creation works, but after that there’s no clear way to list all Storms in the system. (Trigger could have all the fields that are currently in the EventSource, but then the developer would have to keep both in sync.)
Suppose Alice instantiates an EventSource for PubSub/awesome-topic (exposing event type AwesomeEvent) and a Trigger for AwesomeEvent -> Service1. Bob, not knowing the state of the cluster, also instantiates PubSub/awesome-topic and creates a Trigger for AwesomeEvent -> Service2. Now, Service1 and Service2 will get duplicate events.

Problem 1 makes it very hard to create a UI or CLI that lists all Storms in the system.

Problem 2 could even cause semantic issues if both EventSources publicize the same event type, without realizing that there are other subscribers.

duglin commented 5 years ago

Great topic. First a question, on your 2, why would Service1 and Service2 get dup events? Wouldn't Service1 only see events driven by the first EventSource+Trigger?

I haven't had a chance to fully think through how this would look, but in the times when I did ponder it, I would think about what I would want to say as an end using trying to hook these things up. And it usually comes down to something like:

I want x,y,z events from event producer A sent to my KnService C
e.g. I want the "push" event from github repo myApp sent to my "testPR" KnService

And in order to express that I would probably create a KnEventing equivalent to the KnService. Meaning, some single entity (a Storm) that allowed me to express everything and under the covers it split that data up into smaller (more targeted) resources to get the job done. The way a KnService is used to create revisions, routes, etc...

So, in the above example, it might do this:

create a github event source, and set the sink to the "testPR" KnService, or
create a github event source, set the sink to a broker, and create a trigger for that broker that has the "testPR" as the target/sink, or
based on what's already in the system, it might just need to create a new trigger due to an event source and broker already existing

Whichever the system chooses is an impl detail, but could be influence by optional properties on Storm. For example, an option that indicates the QoS which would control whether things are persisted or not. Or, another one that indicates whether this event is a candidate to be shared with other parts of the system (meaning goes thru a broker that allows for multiple triggers) or just one.

The net of it to me is that I think the user should express what the want as an end result, and then magic happen via internal resources that they shouldn't need to think about.

sixolet commented 5 years ago

Two unrelated and contradictory views of this problem, neither of which introduces a new object:

"Storm" is and has always been the Event Importer. Event importers can send to any target, including directly to a Service. Event Importers should include in their spec whatever users tend to need to subscribe to the right events for their problem domain, including whatever filtering they need. Brokers and triggers are more for your in-cluster event mesh. An importer might say something like "give me Slack events on channel foo matching the regex (eggs)+ spam".

Properties:

Importers can be simple
Triggers can be simple
Importers get complicated when you start trying to share work between requests for the same off-cluster events.
You do similar tasks (asking for a subset of on-cluster events (create a Trigger), asking for a subset of off-cluster events (create an Importer)) in wildly different ways.
You have to understand which category of thing you are doing before doing it. If you try to use the wrong model, ex, you end up firehosing your entire slack server into your broker, just cause you want eggs and spam from one channel.
Two loosely coupled models for eventing

"Storm" is Trigger. An Importer represents the potential to get these events onto the cluster and into the broker. Event Importers should be smart enough to look at their relevant triggers and adjust the events they ask for from off-cluster to only pull in the events that someone's asking for by making a Trigger; the Trigger functions as "here is a request for these events". Creating the Trigger tells the Importer to realize that its potential in a particular way. In this world creating the Importer is a setup task, but you only have to do it once per credential-access-realm. Ex, once per github org, once per slack server, once per GCP project, etc. An Importer might say something like "connect to the Slack server chat.foo.com"; a Trigger might say something like "give me Slack events on channel foo matching the regex (eggs)+ spam".

Properties:

Importers need to look at Triggers if they want to be efficient. Your average importer that does all this properly is a little less simple.
Most users only ever need to interact with Triggers.
You do all eventing tasks that aren't "configure a connection to a service" by creating a Trigger.
An overly-broad trigger ("all events in the broker") may not work as expected.
One model for eventing.

deissnerk commented 5 years ago

@duglin I like the overall vision. A while ago I wrote this document. Terminology is still kind of messy, but the overall idea is to define Broker as something that takes a Trigger (My service wants to get events of type a from source b) as input and configures so-called Routers and Sources (now Importers) accordingly.

mikehelmick commented 5 years ago

/assign mikehelmick

lindydonna commented 5 years ago

@duglin said:

Great topic. First a question, on your 2, why would Service1 and Service2 get dup events? Wouldn't Service1 only see events driven by the first EventSource+Trigger?

We answered this in the WG meeting this week, but so that the discussion is recorded: note that Trigger is always used with Broker. So, if a customer does EventSource -> Trigger it always goes to Broker. If the EventSource/Importer is connected directly to the sink, as mentioned by @sixolet, then the duplicate problem is sidestepped entirely.

I'm going to create a doc so that we can collaborate on the ideas easily, but will not get to that for a few more days.

grantr commented 5 years ago

I tend to generalize the two scenarios thusly:

1) I wish to express interest by specifying the attributes of the event source. 2) I wish to express interest by specifying the attributes of the event.

In some cases, the specification of the event source attributes and the event attributes are identical. For example:

org: knative
repo: eventing
type: com.github.pull.create

Assuming GitHub credentials are identified by some other mechanism, this is sufficient to specify the attributes of the source connection and the attributes of the event to filter.

Other cases may not allow this overlap, mainly when the upstream source supports a filtering language that is not supported downstream. For example:

type: mysql.row.update
query: select * from myapp.users where login_time > date_sub(now(), interval 60 seconds)

In this case the downstream filter does not support querying the event stream via SQL, and cannot reliably derive a supported filter from the query. The query is an event source attribute, not an event attribute.

vaikas commented 5 years ago

For completeness sake, can we jot down why FaaS case is not answered by wiring a Source.directly. I.E. not using Broker/Trigger.

duglin commented 5 years ago

+1 to what I think is the broader point @vaikas-google is making.... we shouldn't assume people are using (or want to use) all of the features of KnEventing. Meaning, they may not want/need brokers, channels, triggers, etc... in all scenarios - they should be used only when they need to be used.

And this may not be possible, or too much magic is involved, but if we can get to the point where a user expresses their interest in what events they want (along with the minimal data they think we should need to achieve that goal for them), we should them be able to put together the wiring under the covers for them. And part of that work means knowing when (or not) to use the more advanced features. And the infrastructure used should change as the user's requirements change (scale up/down in complexity).

lindydonna commented 5 years ago

For completeness sake, can we jot down why FaaS case is not answered by wiring a Source.directly. I.E. not using Broker/Trigger.

I'll answer here, since I haven't yet started a doc.

Some folks on the Google side feel that it is best to have a single user model across both scenarios. So, per @sixolet's comment above, her option # 1 would present a different model, depending on which case you are in.

As she put it:

You do similar tasks (asking for a subset of on-cluster events (create a Trigger), asking for a subset of off-cluster events (create an Importer)) in wildly different ways.

My personal view is that these two scenarios do not necessarily need to have the same objects, but obviously it would be preferable there were just one model (all else being equal).

If we decide that there should be one set of objects, we need a design for how Importer -> Broker -> Trigger can work in a reasonable way.

Great question, and I'll clarify in the original issue.

mikehelmick commented 5 years ago

I am of the opinion that there is value in being able to have a consistent trigger writing experience, without regard to where the events are sourced from (off cluster producer talking directly to broker, off producer through importer, on cluster producers).

There are great building blocks and we shouldn't prevent people from assembling them in whichever way they see fit. However, for building a product out of the components of eventing, there is value in providing consistent, high level abstractions to developers and leaving lower level config to operators.

I think it would be difficult to build a UX over a collection of related, but different abstractions. Since event source is not a type in itself, it makes it extremely difficult to list all of the importers that might be pointing to a function, or event to list the importers one might have. Assuming you could list all importers, you need to cross that with all triggers and all subscriptions.

This is where I think there is value in using trigger as the common abstraction for understanding how events are intended to be delivered in the system. There's a single thing to list, I can envision building a CLI or UI out of it with trigger as an entry point. From there, we can provide deeper analysis of how the events are actually getting there through the broker.

As I said before - people can set up events to flow another way, but for people choosing to use broker + trigger, I would like to open up all scenarios to them.

lindydonna commented 5 years ago

I've started a doc in the shared drive so we can collaborate more easily. See High-level eventing scenarios (June 2019)

mikehelmick commented 5 years ago

I wanted to give an update here and I would be happy to discuss this at the next WG meeting.

We are currently exploring an extension to Trigger to support the scenarios described for the storm object and the scenarios in the high-level eventing scenarios doc.

What we are looking at is

Extending the registry to [optionally] describe how to create an importer for a given source type
Extend trigger so that when creating a trigger, it is possible to instantiate an importer that will feed events to that trigger, again optional.
To enable the previous one, we'll make Trigger implement Addressable contract, the URL will actually point to the broker. something like http://default-broker.default.svc.local/triggers/trigger-uid

This makes supporting the FaaS scenario straight forward, create a trigger for a specific producer. When the trigger is deleted, the importer is deleted.

The Event-Driven scenarios can be solved by adding events to the event registry that don't specific how to create an importer.

This allows a platform provider to have a pre-configured registry (or even an immutable registry).

Developers can interact with the system simply through triggers. This makes listing all triggers or listing triggers that go to a specific service easy to do. I believe this makes CLI/UI scenarios easier to develop.

lionelvillard commented 5 years ago

when you say:

an importer that will feed events to that trigger,

through the broker or directly? I think you mean through the broker, is that correct?

mikehelmick commented 5 years ago

Correct - through the broker.

Harwayne commented 5 years ago

when you say:

an importer that will feed events to that trigger,

through the broker or directly? I think you mean through the broker, is that correct?

The event is sent via the Broker exclusively to that Trigger. The event will not be seen by other Triggers associated with the same Broker. The event will go through the same general Broker ingress validation as all events transiting that Broker (e.g. auth checks from #705). Replies from the Trigger will go back into the Broker, broadcasting to all Triggers associated with that Broker.

For example, I have a Broker with ten Triggers. An importer sends a CloudEvent, 'e', to Trigger 't', via the Addressable contract. 'e' is seen only by Trigger 't'. The other nine Triggers do not see 'e'. Trigger 't' replies to the event with a new CloudEvent, 'e2'. All ten Triggers see 'e2' [1].

[1] - Currently an open question if 't' sees 'e2', see the next section.

Proposed Terminology:

Broadcast events - events that are sent to all Triggers associated with the Broker. This is all events sent to a 0.7 Broker.
Targeted events - events that are sent to a specific Trigger, via the Broker. This is all events sent by importers created by a Trigger.
Broadcast Triggers - Triggers that receive all Broadcast events sent to their associated Broker. This is all 0.7 Triggers
Targeted Triggers - Triggers that create Importers.

There is an open question about whether Targeted Triggers receive only Targeted events, or they receive both Targeted and Broadcast events. Any thoughts?

lionelvillard commented 5 years ago

For me, as an user, it makes more sense to have targeted triggers receiving only targeted events. Then I don't have to worry about receiving events I didn't ask for.

grantr commented 5 years ago

Several of us are working on a prototype to explore the ideas @mikehelmick described in https://github.com/knative/eventing/issues/1381#issuecomment-507942384 with the goal of understanding better how the UX and implementation could actually work. We hope to share our findings next week.

sixolet commented 5 years ago

I'm also nervous (like @lionelvillard ) about having replies sent to every trigger, but perhaps triggers could keep filtering on event type and source, which would allow you to subscribe to only particular classes of replies if you like. This also makes the on-cluster eventing case better.

It also makes

Harwayne commented 5 years ago

I'm also nervous (like @lionelvillard ) about having replies sent to every trigger, but perhaps triggers could keep filtering on event type and source, which would allow you to subscribe to only particular classes of replies if you like. This also makes the on-cluster eventing case better.

Sorry, I elided that information above. Whenever I say that 'a Trigger receives an event', that Trigger will:

Check its filter. If the event does not pass the filter, then immediately drop the event.
Send the event to its subscriber.
If the subscriber replies to the event, then send that reply to the Trigger's Broker as a broadcast event.

So when I wrote that 'replies sent to every trigger', I do expect every Trigger to apply its filter before sending the event to the subscriber.

duglin commented 5 years ago

If the subscriber replies to the event, then send that reply to the Trigger's Broker as a broadcast event.

Keep in mind that if the user sets this up by only saying something like "Send this github's events to this KnService" and under the covers we route things thru Brokers and Triggers, then I think they would be kind of surprised to even have to think about whether replies from the KnService are sent to any other Triggers/Services in the environment - rather than just being dropped on the floor. If under the covers we choose to leverage our building blocks but do not force the user to know about those building blocks during the setup phase, I don't think we want to force them to be in the position of now doing extra work because we (on their behalf) introduced a shared Broker into the flow when we could have just gone straight from the Importer to the KnService.

mikehelmick commented 5 years ago

As mentioned in the working group meeting, i've added some CLI context to the doc

https://docs.google.com/document/d/1DpiSL2dUcYS2n7yXOIG5LJwyIC1lY9q_W8-56U1SvKM/edit?hl=en#heading=h.vf9gotlemjs8

rhuss commented 5 years ago

* Extending the registry to [optionally] describe how to create an importer for a given source type

Does this mean that all supported importer types need to be known in advance or is it possible to plugin a yet-unknown importer with an arbitrary CR schema later on? (e.g. like a TwitterSource CRD) so that the flat list of options can be translated into a specific nested CR instance ?

In other words, is the number of supporter importers planned to be restricted or is it open to extension ?

mikehelmick commented 5 years ago

@duglin - Yes, replies is something that can be hazy in this model. I think that external event, leading to a reply which triggers another service will be a common pattern. This needs some more design thought. A couple of options that readily come to mind

Rethink replies from a trigger going back onto the broker. This is simply an optimization, any function could open a connection to the broker and send events themselves. Replies just make that easier, but it also (potentially) obscures the origin of the event
Require that these reply types are also in the registry, this would allow us to do something similar to events that require importers, the invoker essentially plays the role of the importer in this case.

There's a difference between say, an external Cloud Storage event and an on-cluster reply. When setting up a trigger to Cloud Storage, an importer needs to take action to get those events to start being send and after that the trigger is just the routing. For the on-cluster reply, just the routing part needs to be established and I think we can get to a sane way to do that.

@rhuss - The registry is meant to be extensible and runtime configurable. Any new EventType could be added to the registry at any time.

duglin commented 5 years ago

Definitely needs more design thought.... one other thing to consider is that the replies will most likely not be CloudEvents. How those get converted into CEs so they can leverage the rest of the eventing infrastructure (for things like filtering) will be an interesting nut to crack.

Also, are you assuming the registry is a required component? I've been assuming it's optional - and, in fact, for the usecases that I've been considering it's not present at all.

mikehelmick commented 5 years ago

The trigger centric model on this issue makes the registry a required component. Other than that, I think it can be treated as optional right now.

We do need to do more on the data plane security and any kind of security policy might make the registry or something like the registry required to enforce policies.

duglin commented 5 years ago

The trigger centric model on this issue makes the registry a required component

I think this goes to a point I made in some gdoc this morning. Whatever we come up might be best placed on a new resource so that we don't make a common component (like trigger) be overloaded with additional semantics or force the inclusion of an optional thing, like the registry.

It's also not clear to me that all of the features of what's being proposed requires the registry. For some of the "give me info about this event producer" it would be, but if the user provides all of the necessary info w/o the need for discovery then the registry might not be needed. Which I believe will be the same for at least some percentage of the usecases where the setup is done via scripts that are developed/tested in advance and have no need for any discovery at all.

rhuss commented 5 years ago

@rhuss - The registry is meant to be extensible and runtime configurable. Any new EventType could be added to the registry at any time.

How would the mapping from flat key=value configuration to a nested CR schema happen when a user creates an instance of an importer type ?

mikehelmick commented 5 years ago

Background

Based on the product concerns raised in this issue, we expanded the CLI scenarios and proposed changes to the user model to match. (see previous comments)

The proposed model changes were actually built and we’ve demonstrated this working end to end. The summary of these changes are as follows:

Enhance the events registry (EventType CR) to specify how to create an importer for that specific type
Make Trigger Addressable
Enhance Trigger so that it can automatically create importers, by using the configuration points as published in the registry.

We learned a lot from constructing this experience and are proposing some alternatives here, but still supporting the same CLI experience.

Motivations

For motivations, we look to the Scenarios for Knative Eventing. It is important to think about how our object model for configuration and our data plane for delivery handle each of these modalities.

FaaS Scenarios

One of the key motivations of the work around the origin of this issue is to support the FaaS scenario and make it possible to write side-effect free triggers. With the current combination of {importer, broker, trigger} it is possible for unintended side effects to occur in the system.

For example Alice and Bob both want to consume finalize events for the Google Cloud Storage (GCS) bucket “foo”

They both write a trigger of source: “gcs” and type “finalize”

Then, they both create GCSImporter to pull those events from the GCS bucket called “foo”

In this case, both importers would receive the same events from GCS, deliver them to the broker, and then both triggers would see the same event twice.

Our previous solution has solved this by strongly tying importers to a specific trigger. Multiple importers could be tied to the same trigger, since the trigger is addressable. Anyone who is able to send a message to the broker, can address a trigger directly through the broker.

This also describes the FaaS scenario, where an individual consumer can be configured to consume an exact type of event from a specific producer.

Events produced for the FaaS scenarios tend to be originated off cluster and need to be configured before they are sent.

Event-Driven Scenario

With this model, a developer is originating their own events in response to external factors (user requests or incoming events from the FaaS case). Events in this class are often generated on cluster and do not need to be configured before they are sent.

The ability to discover, trigger on, and secure these events are important aspects of the system.

This is essentially the Enterprise Service Bus pattern.

Black-box Integration Scenario

As described in Scenarios for Knative Eventing, scenario 3 is a specialization of both scenarios 1 and 2. From a developer’s viewpoint, consuming events is just like scenario 2 (Event-Driven) in that if an event is available on my cluster, I am able to trigger off of it.

The producer of these events is either off-cluster or external to the software that the consumer is writing. In this case the configuration of the producer is much like what needs to take place in scenario 1.

Other Factors

User Experience

Developing UX (CLI and/or UI) over a loose collection of sometimes associated objects is a potentially difficult problem to solve. This issue was raised repeatedly early in the early development of Knative serving.
Consumer code (functions developers write) must not have to know how an event reached it in order to be able to process it.
The user experience for importer authors should be as simple as possible. It is desirable to avoid importer authors having to manage fan-out to all subscribers.

Central Design Question

The main design question we must answer is

Should we solve scenarios 1, 2, 3 with the same configuration model and delivery / consumption experience?

There are other aspects of requirements that we also need to consider, where those haven’t been fully articulated yet.

Data plane security. It must be possible for cluster operators to provide a way to secure data as it moves through the system. Our primary concerns here are authenticity, accuracy, and privacy.
Delivery guarantees. Once an event reaches knative, what are the guarantees (assuming a specific configuration) do we want to provide? Certain design decisions have implicit behaviours within the system and could have either unintended consequences for our users, or worse, they could become dependent on system behavior that we haven’t documented, intentionally chosen, or intent to maintain.

Importer Direct to Consumer as an Example

Knative will not prohibit connecting an importer direct to a consumer. If this is done, there are some aspects of the normal system experience that can’t be guaranteed. One of the requirements that we haven’t formalized is delivery guarantees within Knative.

If we were to choose to make this the default experience, there are some pieces of functionality that we inherit and some extra requirements that get pushed back to importer writers and event producers. We should at least enumerate these issues.

These issues arise when we take into consideration delivery guarantees, queuing / retries, fanout, authentication / authorization from producer to consumer.

Delivery Guarantees : By not formalizing delivery guarantees, any complexity here gets pushed to either the event producer or importer author. A producer must manage successful delivery either upstream or through the importer.
Queuing / Retries : A direct connection through the importer causes pushback to be handled at an importer or producer level. If a service fails to scale fast enough or has an upper limit, this request events to be queued and retried over a period of time.
Fanout: Direct connection requires an event producer to manage fanout. This can be costly to some event producers, and is extra complexity that can be pushed into lower levels of the stack.
Authn/authz : Direct connection makes it more difficult to construct a policy enforcement system for securing events and ensuring privacy.
Tracing / Monitoring: Centralized delivery infrastructure will lead to commonality of tracing, leading to a more debuggable system.

Discovery

So far in working on this issue, the importance of event discovery (Source & Type) to the user experience has been highlighted and must be supported. The current registry implementation is event type centric (the CRD is literally called EventType). This works really well for scenario 2 (on-cluster, event-driven scenario) and scenario 3 (black-box integration).

This works less well for scenario 1 (FaaS) because it makes discovery of events via source more difficult to build. This doesn’t mean that the existing registry is wrong or less useful.

We should consider if the concept of the events registry can solve both purposes.

This has been broken out into issue #1550

Data Plane Requirements

Our current focus has been on the developer experience, but we haven't spent as much time formalizing data plane requirements. Here, we propose some data plane requirements for Knative/eventing.

An importer must not ACK any given message from a producer until that message has been acknowledged by the sink of the importer. This is a signal to a producer that the event will be routed, and that the producer doesn’t need to be concerned with retries.
A message that receives an ACK after delivery to the broker takes on the durability properties of the broker’s configuration. Assuming a durable backing channel, an event that successfully reaches the broker must be delivered to any trigger that was known at the time of event receipt and may be delivered to new triggers that are discovered before routing of that event has been completed.
- It must be possible for an operator to config the backing transport that the broker uses in order to get the desired behavior. Any durability guarantees offered by the underlying transport, must persist across restarts.
An operator must be able to configure ingress and egress policies in accordance with the security requirements outlined in issue 705. Only authorized senders of certain event types may ingress events of those types. This is done to ensure the integrity of those events.
The execution timeout provided by the broker must be clearly understood by the operator and developer. Given our two production quality backing channels, we currently have these bounds.
- Cloud PubSub Channel - maximum request execution of 10 minutes. This is is a mismatch from the maximum knative/serving service execution timeout. A message that isn’t ACKed back to the pubsub subscription will be retried.
- Using Kafka, the max.poll.interval.ms value can be set to Java’s MAX_INTEGER which allows for 24.8 days maximum execution before timeout and retry.
- This particular area needs more thought and definition around what guarantees should be provided.

Importer spec has been split off to #1554 and Data plane / broker Improvements have been split off to #1555

Solving 1381

During the development of the prototype of the previously stated direction, there were a few issues that were raised and we want to address those before moving forward.

Issues Discovered

Untyped APIs / Duplicated Definitions

An importer already defines everything that is needed to configure a producer to bring events into a cluster. As previously proposed, the parameters of the importer are redefined in the event registry, where the EventType object tells a user (or program) which importer to use, and the names and descriptions of those types. This creates two issues:

Correct synchronization is required. Changes to the importer CRD must be also copied into the corresponding EventType entries.
We lose type safety which impacts external tooling (that knows how to interpret kubernetes type definitions) and the built in synchronous validation.

Producer / Importer Level Fanout

The previously prosed solution to the FaaS scenario does one of two things. It either pushes fanout concerns back to the producer or complicates authoring importers.

The knative/eventing object model, prior to issue 1381 solved this problem by decoupling producers and consumers (through the broker) and handling fanout within the implementation of the broker.

Because of design and implementation decisions (current implementation of GCSImpoter for example, using the trigger object as modified for our demo) two functions requesting finalize events on the same GCS bucket, require GCS to send that same event twice.

Given the current design, even if we modify the GCSImpoter to dedupe these requests, allowing GCS to only emit the event once, fanout is only pushed down one level, to the importer. Since each trigger is addressable, and any event entering the broker could only be sent to a single trigger, this means that an importer couldn’t acknowledge an event until the broker has accepted all possible fanouts.

Since the broker and our transport infrastructure (i.e. channels) supports fanout, we should aim to leave that fanout implementation as the one that is used.

Recommended Path Forward

Congratulations, you’ve made it this far, let’s talk about how to move forward.

Registry Changes

The registry is important in that event and source discovery is a valuable use case for our customers.

There is currently one entry in the registry per event type, this is actually a tuple of Source and Type. We will introduce a new importer registry. See #1550

This gives the overall registry 2 components working together to facilitate discovery and configuration of events. The importer registry will be responsible for informing a user of which producers they may solicit what events from and how to configure those importers.

When an importer is configured, the importer is responsible for populating the EventType appropriately. As a concrete example, the importer registry tells me that I can get Google Cloud Storage “finalize” events by instantiating the GCSSource kind. If two GCSSource objects are created for finalize events, the finalize event type should only appear in the EventType registry once. The EventType should reflect which importers are providing that type (can be an annotations, status, etc. to be designed later).

This provides better type safety, and solves the synchronization problem described previously.

Importer Spec

The primary design point here is introducing the concept of importers identifying themselves. We introduce a CloudEvents extension attribute called sourceid that allows for events to be filtered based on this attribute if present. This is needed to disambiguate between importers of events that might be connected to the same upstream producer.

The convention here is that a well-behaved importer will populate the sourceid attribute with the metadata.uid value for the object that a trigger writer would select on. This could be the importer itself, or could be a knative service. It is important to note that objects may need to coordinate.

Our GCSSource actually creates a Cloud PubSub importer to bring the events into the broker. The PubSub importer should use the uid of the GCSSource since that is what the user will be configuring against.

This introduces a new data plane contract for importers. When going through an event mesh, this is unavoidable. This solution makes it very important for an importer author, but it is also optional. Not all importers / event producers will have to implement the sourceid contract, but failing to do so, prevents the FaaS scenario.

The importer specification has been split off to #1554

Broker + Trigger Changes

The selector option will be added to the TriggerFilter portion of the Trigger spec. The selector relates back to the metadata.uid of the object being selected against.

If a trigger contains a selector filter, this filter will be applied with exact match before delivery. The selector is an ObjectReference and will cause the trigger to have an exact match filter on the UID of the referenced object in the sourceid attribute.

When using the selector filter, we can update the status of the trigger with a failed condition if the expected importer / event source is not present and ready.

See issues #1554 and #1555 for discussion and resolution of these issues.

CLI Changes / Leverage Idempotency and the Kubernetes Resource Model

One of the points of our previous iteration was to utilize a user’s interaction with a single API object (Trigger) in order to configure the entire end-to-end flow: importer through broker and the trigger to the consumer. This helps consolidate failure modes to on-cluster controllers, rather than with client side orchestration.

This design decision was also made to avoid issues with ensuring the eventing configuration was fully declarative.

We are able to get the desired CLI experience here when the CLI makes 2 API calls to persist two objects. This works because the two calls are idempotent, and can be repeated (as long as the same configuration is used) in the event that the creation of the second object fails. This implies naming conventions that will need to be documented for well behaved clients.

Here, we walk through the CLI commands and describe what will happen.

kn events triggers create trigger-name \
  --cluster=gkecluster \
  --namespace default \
  --type com.google.gcs.object.finalize \
  --bucketId myBucket \
  --service-account-secret secret/gcskey \
  --target-service file-function

When this command is executed, the client creates a trigger and populates the selector with an object reference back to the importer (which isn’t yet created).

Next, the client creates an importer (with an OwnerReference to the trigger). Upon creation, the trigger's reference will now exist, and the Selectable contract will be fulfilled. The client adds an owner reference on the importer. This enables automatic deletion of an importer when the associated trigger is deleted.

The client can list triggers. When describing a particular trigger, the client can add information from the importer in the object reference.

Summary

Overall, in Knative, this solves all three target scenarios with the same object model.

FaaS Scenario

Solved by allowing triggers to select events from a single importer. Note that anything can implement this dataplane. We should consider enhancing knative service to surface the uid of the service into revisions from that Service.

Event-Driven Scenario

Custom, on-cluster events, can choose to implement the sourceid contract, or choose not to. In that case, triggers can be written using the existing sourceAndType filters.

Black-box integration scenario

Applies the same as the event-driven scenario. Trigger authors can select any events available in the broker. Again, integrators are able to implement the Selectable interface to enable the FaaS scenario over any type.

Broker Improvements Needed

The current implementation of the broker uses a single channel from ingress to delivery and the fanout is managed on that channel. This design is not scalable in terms of throughput. Redesigning the broker is out of scope for the document, but successful execution of this plan depends on it.

See #1555

Alternatives Considered

Go Back to Pre-1381 solutions

This is not a viable alternative, as without modification, the core issues raised in 1381 and user scenarios can’t meet the requirements. In particular, triggers aren’t “side-effect free.”

Continue with Enhanced Triggers

This is the previous proposal on this object and the issues with this approach have already been detailed in this comment.

Introduce an actual 'Storm' object to the API surface

This is similar to what we did with the Service object on Knative/serving. A lot of the same issues that have been discussed thus far in this document also apply to some designs for a storm object. In our current prototype, trigger is the storm object. It contains everything needed to instantiate the other objects in the system.

The other problem we were solving with Service was the “bounding box” problem identified through user research. Having a single thing to list is important to a user’s understanding of the system, and ability of CLI and UI authors to present information to users in a coherent manner.

Introduction of a storm object would likely cause future divergence between the 3 scenarios highlighted in the issue. In particular, it could cause drift between scenario 1 and scenarios 2 and 3. For on cluster events and black-box events, we envision usage of a broker.

While it is not an explicit requirement that the experience be aligned between the 3 scenarios. We believe that it is in the best interest of our users to make the experience as similar as possible. In particular, it will be an advantage for UX builders to have a single primary object to interact with.

Adding a ‘Selectable’ Duck Type

In order to determine what value to select on, we considered introducing a new duck type for importers (or anything else that sends events through a Broker) called Selectable. The status contract of selectable, indicates the value that will be present in the sourceid attribute. Resources that emit events and do not satisfy this contract cannot be assumed to be selectable. The sourceid should be unique (to avoid collisions), and it may make sense for the sourceid will be the kubernetes object’s uid in many cases. In other cases, it make be some other unique value. An importer that implements selectable guarantees that the sourceid attribute will be populated with this value.

This allows triggers to reference an importer by object reference, without having to know the exact sourceid to select on, this can be synthesized by the broker infrastructure.

The reason that this alternative wasn’t favored over a simple ObjectRef is because there are other types that might originate events (kubernetes service for example) that are able to implement the data plane contract, but wouldn’t be able to satisfy the control plane duck type contract.

sixolet commented 5 years ago

Thing to consider: Ready state of Trigger

Before the Importer is created (should be Unknown)
If the importer goes away (should be False)

sixolet commented 5 years ago

The gitops view of this might lack a little in terms of ergonomics — you have to create two objects to get your one concept. This might be the best we can do, though.

mikehelmick commented 5 years ago

The gitops view of this might lack a little in terms of ergonomics — you have to create two objects to get your one concept. This might be the best we can do, though.

I think the gitops scenario is ok, because everything here can be fully declarative, and we don't need output from one object to go back and update the other in the client side.

I think the same can be said for other pieces of using kubernetes more directly like putting a service in front of a deployment.

duglin commented 5 years ago

The proposed model changes were actually built and we’ve demonstrated this working end to end

Where? I'd like to play with it

duglin commented 5 years ago

In this case, both importers would receive the same events from GCS, deliver them to the broker, and then both triggers would see the same event twice.

This actually isn't a 'side effect' or a bad thing. This is exactly what the user specified should happen. And it's worth pointing out that they are not the same events from a CloudEvents perspective. They will have different ids. Additionally, I would think GCS sees them as different events too since each one is associated with a different user. Which means it's very possible that each user might see a different set of events based on their authz. Which means it might not be appropriate to assume that if ANY subscription to a producer exists, that the events from it satisfy all possible users asking for events from that producer.

Events produced for the FaaS scenarios tend to be originated off cluster and need to be configured before they are sent.

Not sure what you mean by "configured before they are sent". Can you elaborate?

This works less well for scenario 1 (FaaS) because it makes discovery of events via source more difficult to build

Can you elaborate on why this is true? It seems to me that even in the on-cluster case you may not know what events are going to be generated unless someone goes thru the task of adding them to the registry - same as the FaaS scenario.

sourceId

I would call it importerId since it is not the source of the event and will confuse people. Also, it's not clear what this value is used for - aside as an FYI of who the importer is. What do we expect people to do with this info? You talk about disambiguating importers but it's not clear to me why.

Later on you talk about using it for a filter... but do we really expect users to specify this value? Or do we even expect people to know about the importers directly since I think this proposal is trying to hide them from people.

The convention here is that a well-behaved importer will populate the sourceid attribute with the metadata.uid value for the object that a trigger writer would select on.

Setting it to the uuid of the importer makes sense, but I didn't follow what "the object that a trigger writer would select on" means.

In the sample flow you talk about the CLI creating two objects (the trigger and then the importer). Which conceptually makes sense, but I'm wondering why all CLIs are expected to do this. If all of the information is available in the registry, why not have this be done server side?

It's not clear to me that the 3 scenarios are different. In all cases there are events coming into Knative and they need to be sent someplace, possibly with some filtering. But I feel like I'm missing something.

duglin commented 5 years ago

--type com.google.gcs.object.finalize

@n3wscott I believe you would claim this is incorrect and should be knative namespaces, right? This is a pretty big decision point. I'll open an issue to discuss.

grantr commented 5 years ago

Where? I'd like to play with it

Instructions here: https://github.com/nachocano/eventing/blob/wat-demo/DEMO.md

mikehelmick commented 5 years ago

In this case, both importers would receive the same events from GCS, deliver them to the broker, and then both triggers would see the same event twice.

This actually isn't a 'side effect' or a bad thing. This is exactly what the user specified should happen. And it's worth pointing out that they are not the same events from a CloudEvents perspective. They will have different ids. Additionally, I would think GCS sees them as different events too since each one is associated with a different user. Which means it's very possible that each user might see a different set of events based on their authz. Which means it might not be appropriate to assume that if ANY subscription to a producer exists, that the events from it satisfy all possible users asking for events from that producer.

The duplication here would be if essentially the same importer was configured twice, covering a well written filter to match both. The requirement from @lindydonna was to allow for binding a trigger to a specific importer, but not prohibit the existing behavior.

I agree that in general what might be seen as "over-subscription" isn't in itself a bad thing as it leads to more flexible systems.

Events produced for the FaaS scenarios tend to be originated off cluster and need to be configured before they are sent.

Not sure what you mean by "configured before they are sent". Can you elaborate?

Using Google Cloud Storage as an example, which by default bucket operations to not generate events. A user must configure a bucket to send those events.

This works less well for scenario 1 (FaaS) because it makes discovery of events via source more difficult to build

Can you elaborate on why this is true? It seems to me that even in the on-cluster case you may not know what events are going to be generated unless someone goes thru the task of adding them to the registry - same as the FaaS scenario.

Thinking of scenario 2 (message bus setup), and using the example of a knative service for new user registration, that produces com.example.registration events. I agree that these should also be added to the registry, by creating an EventType record, otherwise discovery is limited.

The difference is related to the previous point, with off cluster events needing external configuration before events can enter the broker.

The current registry is based on event type and one thing that we've found in talking with users is users think of FaaS scenarios as event source first. As in, I want events from GitHub, ok, what event types are there? Not starting with I want pull requests, where can I get those from?

sourceId

I would call it importerId since it is not the source of the event and will confuse people. Also, it's not clear what this value is used for - aside as an FYI of who the importer is. What do we expect people to do with this info? You talk about disambiguating importers but it's not clear to me why.

Fine w/ importerid as name.

This is to drive the FaaS scenario to allow a trigger to be written against a specific importer in case there are multiple importers that happen to be bringing "duplicate" events on to the cluster.

Later on you talk about using it for a filter... but do we really expect users to specify this value? Or do we even expect people to know about the importers directly since I think this proposal is trying to hide them from people.

See my reply here: https://github.com/knative/eventing/issues/1554#issuecomment-516525933

I would expect a CLI to be something like the examples where the tooling would create the link between the objects.

The convention here is that a well-behaved importer will populate the sourceid attribute with the metadata.uid value for the object that a trigger writer would select on.

Setting it to the uuid of the importer makes sense, but I didn't follow what "the object that a trigger writer would select on" means.

See example yaml here

In the sample flow you talk about the CLI creating two objects (the trigger and then the importer). Which conceptually makes sense, but I'm wondering why all CLIs are expected to do this. If all of the information is available in the registry, why not have this be done server side?

Our first prototype did this, @grantr provided a link above.

One of the concerns raised here was that solution relied on multiple versions of the config (CRD + registry) of an importer. Having an untyped map would make it so that this didn't integrate well with existing k8s tooling.

It's not clear to me that the 3 scenarios are different. In all cases there are events coming into Knative and they need to be sent someplace, possibly with some filtering. But I feel like I'm missing something.

From a consumption side 2 and 3 are exactly identical. 1 (as defined by @lindydonna ) is different in that the user is creating the trigger in the context of an external system where those events need to be configured for sending.

duglin commented 5 years ago

Am I correct to say ?

this is trying to allow people to specify a Trigger for a Broker w/o any knowledge of the event source (what they can do today)
this is trying to allow people to specify the creation of a new event source at the same time as their Trigger definition
this is trying to allow people to specify an existing event source of interest at the same time as their Trigger definition

mikehelmick commented 5 years ago

Am I correct to say ?

this is trying to allow people to specify a Trigger for a Broker w/o any knowledge of the event source (what they can do today)

Yes - this is scenarios 2 and 3

this is trying to allow people to specify the creation of a new event source at the same time as their Trigger definition

Yes, and to bind to that event source only, scenario 1

this is trying to allow people to specify an existing event source of interest at the same time as their Trigger definition

Wasn't specifically trying to solve that, but yes this can also be done by binding to an existing source by objectRef. A well behaved clients should add an additional owner reference to that importer.

grantr commented 5 years ago

We introduce a CloudEvents extension attribute called sourceid that allows for events to be filtered based on this attribute if present.

I propose using "producer" for this concept instead of "source" or "importer". See the CloudEvents definition of producer:

The "producer" is a specific instance, process or device that creates the data structure describing the CloudEvent.

I think that definition fits the filtering concept being introduced better than importer or source. I like that it refers to the producing component immediately upstream of the consumer, regardless of whether it's the originator of the event or not.

mikehelmick commented 5 years ago

/milestone v0.8.0

was originally approved for 0.8, never tagged.

grantr commented 5 years ago

The convention here is that a well-behaved importer will populate the sourceid attribute with the metadata.uid value for the object that a trigger writer would select on.

I think the value should actually be a qualified object name, e.g. gcppubsubsources.sources.eventing.knative.dev/<namespace>/<name>. This makes implementation easier (the value of the attribute is deterministic from the ObjectReference value) and eliminates temporal gotchas.

If the importer is deleted and replaced, then the uid will be different even if the name is the same. Now the trigger can only receive events from the previous importer or the new importer but not both. I doubt this is what the user wants or expects.

mikehelmick commented 5 years ago

On Addendum that came out of discussions with @rgregg

The three use cases that we've identified, are all supported by a common solution (as already proposed).

This design is also useful in supporting graduated complexity. It is possible for a system to start using the FaaS scenario and graduate to the entirely event-driven scenario. In order to avoid some of the duplicate delivery problems, this requires some discipline from the system operator to ensure proper configuration.

knative / eventing

Make it easier to consume events directly from specific sources #1381

Background

Motivations

FaaS Scenarios

Event-Driven Scenario

Black-box Integration Scenario

Other Factors

User Experience

Central Design Question

Importer Direct to Consumer as an Example

Discovery

Data Plane Requirements

Solving 1381

Issues Discovered

Untyped APIs / Duplicated Definitions

Producer / Importer Level Fanout

Recommended Path Forward

Registry Changes

Importer Spec

Broker + Trigger Changes

CLI Changes / Leverage Idempotency and the Kubernetes Resource Model

Summary

FaaS Scenario

Event-Driven Scenario

Black-box integration scenario

Broker Improvements Needed

Alternatives Considered

Go Back to Pre-1381 solutions

Continue with Enhanced Triggers

Introduce an actual 'Storm' object to the API surface

Adding a ‘Selectable’ Duck Type