tvkitchen / meta

A meta repository for discussions and tasks that span the project
Apache License 2.0

Are ingestion engines appliances? #14

Closed chriszs closed 3 years ago

chriszs commented 4 years ago

Discussion

What do you want to talk about?

Ingestion engines are located in the TV Kitchen repo right now, but they might be better factored as appliances, given they can come with OS-level dependencies and we want to make it easy for other people to add their own. I haven't spent enough time with appliances to know, but if appliances have inputs and outputs, one could imagine ingestion engines are special cases, where the input is unmanaged and external. In part, I'm wondering this because it's become clear to me we'll need flow monitoring and retry logic for these, and it occurs to me that might be better factored outside the engine itself and applied universally.

chriszs commented 4 years ago

I think a good way to try to answer this question is to attempt to refactor them as appliances and learn more about the pain points for both appliances and implementing an ingestion engine as one.

slifty commented 4 years ago

Agreed -- for now (as in, the next week or two) I think moving forward with the current model is the way to go (they work as they currently stand), but this does feel like a potentially natural mid-term progression plan.

chriszs commented 4 years ago

Yeah, I'm just worried that by then we'll be locked in further with a bunch of ingestion engines and appliances, not to mention implementation repos.

chriszs commented 4 years ago

Like I want to add flow monitoring, which would make my life substantially easier, but the way I'd do that depends on this design decision.

slifty commented 4 years ago

Got it.

I'll explore a bit below:

How Ingestion Engines are the same as Appliances

An ingestion engine could take in a new type of payload (idk, something like JSON) which would be the configuration data for that ingestion engine. That would specify the details of the stream it's creating according to a TBD spec.

The engine appliance would take in that config payload once, and then be invoked (once) to start the stream.

It would emit payloads the same way any other appliance would emit.

It would have an audit that specifies dependencies like any other appliance.

The countertop would handle the Kafka interaction the same way as for any other appliance.

How ingestion engines are special

We don't want the countertop coordinator to clone ingestion appliances automatically

We had talked about the countertop coordinator doing some kind of magic to detect when new streams are added and spin up copies of appliances so there was a 1:1 relationship between a stream and a given appliance. This was trivial when stream creation occurred outside of appliances.

QUESTION: How would the countertop know NOT to spin up multiple copies of the ingestion appliance?

ANSWER: We could still have the countertop detect new streams (payloads will be decorated with a stream name, for instance), but that decoration would be added BY the ingestion engine appliance, and therefore it would NOT ever appear on the payload type that the ingestion appliance consumes.

In short, the countertop's appliance clone logic should be as follows:

  1. Detect a payload
  2. Check the stream and type of the payload
  3. Get the set of distinct appliances that CONSUME THAT TYPE.
  4. If there is not already an appliance for that stream for each of the appliance types found in step 3, create the new appliance instances.

Since the payloads that are consumed by ingestion engines will never have stream decorators (rather, stream info will be inside the OBJECT.CONFIG payload's data attribute and used by the appliance to decorate the output), we won't risk duplicating ingestion engine appliances.
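The four steps above can be sketched roughly as follows. This is only an illustration of the idea, not the countertop's actual code: the `Payload` and `Countertop` classes, the `on_payload` method, and the `STREAM.DATA` type name are all made up for the example. Only `OBJECT.CONFIG` comes from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Payload:
    type: str                     # e.g. "OBJECT.CONFIG"
    stream: Optional[str] = None  # stream decorator; None when undecorated


@dataclass
class Countertop:
    # Hypothetical registry: appliance name -> payload types it consumes.
    consumers: Dict[str, Set[str]]
    # (appliance name, stream) pairs that already have a running instance.
    instances: Set[Tuple[str, str]] = field(default_factory=set)

    def on_payload(self, payload: Payload) -> List[Tuple[str, str]]:
        """Return the (appliance, stream) instances created for this payload."""
        # Step 2: a payload without a stream decorator never triggers cloning.
        if payload.stream is None:
            return []
        # Step 3: distinct appliances that consume this payload type.
        wanted = sorted(
            name for name, types in self.consumers.items()
            if payload.type in types
        )
        # Step 4: create only the instances that do not exist yet.
        created = []
        for name in wanted:
            key = (name, payload.stream)
            if key not in self.instances:
                self.instances.add(key)
                created.append(key)
        return created


countertop = Countertop(consumers={
    "ingestion": {"OBJECT.CONFIG"},  # consumes undecorated config payloads
    "captions": {"STREAM.DATA"},     # hypothetical downstream appliance
})
```

Because the config payload carries no stream decorator, `countertop.on_payload(Payload("OBJECT.CONFIG"))` creates nothing, while the first `STREAM.DATA` payload seen for a given stream creates exactly one `captions` instance for that stream.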

They are "constantly" collecting data

Most appliances are invoked many times as new payloads arrive; ingestion engines would be invoked once (or maybe not at all, as we explore later in this thread).

Question(s): What happens if the stream stops? Should it re-start? Should it throw an error (which we do already have in appliances)?

Answer: First, how should the countertop coordinator handle errors from appliances? If the coordinator generally runs teardown and then startup on an appliance whenever that appliance emits an error, then we are in OK shape here, because it would re-start the stream in the event of a stream error.
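A rough sketch of that restart behavior, assuming the coordinator is told about each error and simply cycles the appliance. The `Coordinator` class, `handle_error`, and the `max_restarts` budget are hypothetical names and additions for illustration. Only `startup` and `teardown` are taken from this thread.

```python
class RecordingAppliance:
    """Stand-in appliance that records lifecycle calls (illustration only)."""

    def __init__(self):
        self.calls = []

    def startup(self):
        self.calls.append("startup")

    def teardown(self):
        self.calls.append("teardown")


class Coordinator:
    """Hypothetical coordinator that restarts an appliance on error."""

    def __init__(self, appliance, max_restarts=3):
        self.appliance = appliance
        # Assumed retry budget so a permanently broken stream does not loop forever.
        self.max_restarts = max_restarts
        self.restarts = 0

    def handle_error(self, error):
        """Cycle the appliance; return False once the retry budget is spent."""
        if self.restarts >= self.max_restarts:
            return False
        self.appliance.teardown()
        self.appliance.startup()
        self.restarts += 1
        return True


appliance = RecordingAppliance()
coordinator = Coordinator(appliance, max_restarts=2)
restarted = coordinator.handle_error(RuntimeError("stream stopped"))
```

With this shape, an ingestion appliance needs no retry logic of its own: a stream that stops just has to surface an error, and the same policy applies to every other appliance.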

chriszs commented 4 years ago

Well, would ingestion engines even consume payloads? Considering they originate them.

slifty commented 4 years ago

I was thinking they would be kicked off with a single config payload, but you're right, we could have invoke called directly with an empty PayloadBuffer. (We could even have this always happen for every single appliance regardless of type as part of Appliance registration, or we could just put it on the developer to invoke their ingestion appliances to kick them off.)

Or it could just start ingestion on setup (we should just think about whether having setup begin the production of data would be unexpected).

chriszs commented 4 years ago

That kind of design question for appliances is what I was hoping to deal with when appliances are still potentially changeable.

slifty commented 4 years ago

OK I'm pretty convinced at this point that ingestion engines should be appliances and we should do the following:

  1. Port AbstractIngestionEngine over to AbstractIngestionAppliance

This will involve ditching AbstractIngestionEngine.ingestPayload since that will be handled by the countertop. Almost everything else about it will stay the same, just some basic renaming of things like start => startup and stop => teardown and adding an audit.

  2. Move the ingestion engine implementations to their own appliance packages in the appliances repo.
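To make the port in the first step above a bit more concrete, here is a minimal sketch of what the renamed base class might look like. It is not the real TV Kitchen class: `read_chunks`, `on_payload`, `emit`, and the `ListIngestionAppliance` subclass are invented for the example, and the blocking loop in `startup` is a simplification of what would really be an asynchronous stream.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List


class AbstractIngestionAppliance(ABC):
    """Sketch of an ingestion engine reshaped as an appliance."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[bytes], None]] = []
        self.running = False

    def on_payload(self, listener: Callable[[bytes], None]) -> None:
        # Hypothetical hook: the countertop subscribes here. There is no
        # ingestPayload equivalent on the appliance itself.
        self._listeners.append(listener)

    def emit(self, data: bytes) -> None:
        for listener in self._listeners:
            listener(data)

    @abstractmethod
    def audit(self) -> bool:
        """Report whether the appliance's dependencies are available."""

    @abstractmethod
    def read_chunks(self) -> Iterable[bytes]:
        """Hypothetical source of raw stream data."""

    def startup(self) -> None:  # formerly start
        self.running = True
        for chunk in self.read_chunks():
            if not self.running:
                break
            self.emit(chunk)

    def teardown(self) -> None:  # formerly stop
        self.running = False


class ListIngestionAppliance(AbstractIngestionAppliance):
    """Toy implementation that 'ingests' from an in-memory list."""

    def __init__(self, chunks: List[bytes]) -> None:
        super().__init__()
        self._chunks = chunks

    def audit(self) -> bool:
        return True  # no OS-level dependencies to check

    def read_chunks(self) -> Iterable[bytes]:
        return iter(self._chunks)


received: List[bytes] = []
appliance = ListIngestionAppliance([b"first", b"second"])
appliance.on_payload(received.append)
appliance.startup()
```

After `startup` returns, `received` holds both chunks in order, and calling `teardown` flips `running` back to `False`.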

The rest of these are just musings I wanted to write down.

  1. I think we should consider changing the appliance ready event to started, which is a subtle shift, but it indicates that a payload event could be created at any point after started is emitted. (Since we already published READY, we might not actually want to do that, as it would be a breaking change.)

  2. This fits with the direction we have been moving, which is that really this tv-kitchen repository is just countertop. It should have an API that an implementation would interact with, to essentially register appliances and to register event handlers.

chriszs commented 4 years ago

Great. Now, are appliances cake?

slifty commented 3 years ago

Closing this now, since indeed ingestion engines are actual appliances!