Closed chriszs closed 4 years ago
I think a good way to try to answer this question is to attempt to refactor them as appliances and learn more about the pain points for both appliances and implementing an ingestion engine as one.
Agreed -- for now (as in, the next week or two) I think moving forward with the current model is the way to go (they work as they currently stand), but this does feel like a potentially natural mid-term progression plan.
Yeah I'm just worried by then we'll be locked further in with a bunch of ingestion engines and appliances, not to mention implementation repos.
Like I want to add flow monitoring, which would make my life substantially easier, but the way I'd do that depends on this design decision.
Got it.
I'll explore a bit below:
Ingestion engine could take in a new type of payload (idk, something like JSON) which would be the configuration data for that ingestion engine. That would specify the details of the stream it's creating according to a TBD spec.
The engine appliance would take in that config payload once, and then invoke (once) to start the stream.
It would emit payloads the same way any other appliance would emit.
It would have an audit that specifies dependencies like any other appliance.
The countertop would handle the Kafka interaction the same way as any other appliance.
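To make the list above a bit more concrete, here is a rough TypeScript sketch of what such an appliance might look like. It is not the actual TV Kitchen code: the class name, the STREAM.DATA type, the onPayload hook, and the config fields (streamName, chunks) are all made up for illustration, and an in-memory array stands in for a real external source.

```typescript
// Sketch only: apart from OBJECT.CONFIG, invoke, and audit, every name here is hypothetical.
type Payload = { type: string; data: string; stream?: string };

// Assumed shape of the TBD config spec.
interface StreamConfig {
  streamName: string;
  chunks: string[]; // stand-in for a real external source
}

class SketchIngestionAppliance {
  private started = false;
  private listeners: Array<(payload: Payload) => void> = [];

  // The countertop would subscribe here and handle the Kafka side.
  onPayload(listener: (payload: Payload) => void): void {
    this.listeners.push(listener);
  }

  // A real audit would check OS-level dependencies; this one always passes.
  audit(): boolean {
    return true;
  }

  // Consumes the config payload once and starts the stream once.
  invoke(configPayload: Payload): boolean {
    if (this.started || configPayload.type !== "OBJECT.CONFIG") {
      return false;
    }
    const config = JSON.parse(configPayload.data) as StreamConfig;
    this.started = true;
    for (const chunk of config.chunks) {
      // The stream decoration is added here, by the ingestion appliance.
      this.emit({ type: "STREAM.DATA", data: chunk, stream: config.streamName });
    }
    return true;
  }

  private emit(payload: Payload): void {
    for (const listener of this.listeners) {
      listener(payload);
    }
  }
}
```

A second invoke is a no-op in this sketch, which is one way to express the "invoked once" idea; whether that should be an error instead is still an open question.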
We had talked about the countertop coordinator doing some kind of magic to detect when new streams are added and spin up copies of appliances so there was a 1:1 relationship between a stream and a given appliance. This was trivial when stream creation occurred outside of appliances.
QUESTION: How would the countertop know NOT to spin up multiple copies of the ingestion appliance?
ANSWER: We could still have the countertop detect new streams (payloads will be decorated with a stream name, for instance), but that decoration would be added BY the ingestion engine appliance, and therefore it would NOT ever appear on the payload type that the ingestion appliance consumes.
In short, the countertop's appliance clone logic should be as follows:
Since the payloads that are consumed by ingestion engines will never have stream decorators (rather, stream info will be inside the OBJECT.CONFIG payload's data attribute and used by the appliance to decorate the output), we won't risk duplicating ingestion engine appliances.
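As a rough illustration of that reasoning (a guess at the shape, not the coordinator's real code), the clone decision could be keyed on the appliance plus the payload's stream decoration. The names SketchCloneTracker, Registration, and STREAM.DATA are hypothetical.

```typescript
// Sketch only: names other than OBJECT.CONFIG are hypothetical.
type Payload = { type: string; stream?: string };

interface Registration {
  name: string;      // appliance identifier
  inputType: string; // payload type the appliance consumes
}

class SketchCloneTracker {
  private running: { [key: string]: boolean } = {};

  constructor(private registrations: Registration[]) {}

  // Returns the keys of the appliance instances that had to be spun up for this payload.
  handle(payload: Payload): string[] {
    const created: string[] = [];
    for (const registration of this.registrations) {
      if (registration.inputType !== payload.type) {
        continue;
      }
      // Undecorated payloads (such as OBJECT.CONFIG) always map to the same single instance.
      const streamKey = payload.stream === undefined ? "(no stream)" : payload.stream;
      const key = registration.name + "@" + streamKey;
      if (this.running[key] !== true) {
        this.running[key] = true;
        created.push(key);
      }
    }
    return created;
  }
}
```

Because the config payload never carries a stream name, the ingestion appliance only ever gets one key, while downstream appliances get one key per stream.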
Most appliances are invoked many times as new payloads arrive; ingestion engines would be invoked once (or maybe not at all, as we explore later in this thread).
Question(s): What happens if the stream stops? Should it re-start? Should it throw an error (which we do already have in appliances)?
Answer: First, how should the countertop coordinator handle errors from appliances? If the coordinator generally runs teardown and startup on appliances whenever the appliance emits an error, then we are in OK shape here, because it would restart the stream in the event of a stream error.
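A minimal sketch of that restart-on-error idea, assuming the coordinator can subscribe to appliance errors. The onError hook, the superviseAppliance function, the maxRestarts cap, and the FakeStream test double are all hypothetical; the cap is my own addition so a permanently broken stream doesn't loop forever.

```typescript
// Sketch only: the interface and function names are hypothetical.
interface RestartableAppliance {
  startup(): void;
  teardown(): void;
  onError(handler: (err: Error) => void): void;
}

// Starts the appliance and cycles teardown/startup on each error, up to maxRestarts times.
function superviseAppliance(
  appliance: RestartableAppliance,
  maxRestarts: number
): { getRestarts: () => number } {
  let restarts = 0;
  appliance.onError(() => {
    if (restarts >= maxRestarts) {
      return; // give up; a real coordinator would probably surface this
    }
    restarts += 1;
    appliance.teardown();
    appliance.startup();
  });
  appliance.startup();
  return { getRestarts: () => restarts };
}

// Test double that records lifecycle calls and can simulate a stream failure.
class FakeStream implements RestartableAppliance {
  calls: string[] = [];
  private handlers: Array<(err: Error) => void> = [];

  startup(): void {
    this.calls.push("startup");
  }

  teardown(): void {
    this.calls.push("teardown");
  }

  onError(handler: (err: Error) => void): void {
    this.handlers.push(handler);
  }

  fail(): void {
    for (const handler of this.handlers) {
      handler(new Error("stream stopped"));
    }
  }
}
```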
Well, would ingestion engines even consume payloads? Considering they originate them.
I was thinking they would be kicked off with a single config payload, but you're right, we could have invoke called directly with an empty PayloadBuffer (we could even have this always happen for every single appliance regardless of type as part of Appliance registration; or we could just put it on the developer to invoke their ingestion appliances to kick them off).
Or it could just start ingestion on setup (should just think about whether having setup begin the production of data would be unexpected).
That kind of design question for appliances is what I was hoping to deal with when appliances are still potentially changeable.
OK I'm pretty convinced at this point that ingestion engines should be appliances and we should do the following:
AbstractIngestionEngine over to AbstractIngestionAppliance
This will involve ditching AbstractIngestionEngine.ingestPayload since that will be handled by the countertop.
Almost everything else about it will stay the same, just some basic renaming of things like start => startup and stop => teardown and adding an audit.
appliances repo.
The rest of these are just to write musings.
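For what it's worth, here is a guess at the rough shape after the rename. This is a sketch and not the real AbstractIngestionEngine: openSource, closeSource, the emit callback, and ArrayIngestionAppliance are invented for illustration. The point is only that start/stop become startup/teardown, audit is added, and there is no ingestPayload.

```typescript
// Sketch only: method names other than startup, teardown, and audit are hypothetical.
type Emit = (data: string) => void;

abstract class SketchAbstractIngestionAppliance {
  private running = false;

  // The countertop supplies emit, so the appliance no longer needs ingestPayload.
  constructor(protected emit: Emit) {}

  // Subclasses report whether their dependencies are available.
  abstract audit(): boolean;

  // Subclasses open and close their external source.
  protected abstract openSource(): void;
  protected abstract closeSource(): void;

  // Was `start`. Refuses to run twice or with a failing audit.
  startup(): boolean {
    if (this.running || !this.audit()) {
      return false;
    }
    this.openSource();
    this.running = true;
    return true;
  }

  // Was `stop`.
  teardown(): boolean {
    if (!this.running) {
      return false;
    }
    this.closeSource();
    this.running = false;
    return true;
  }
}

// Toy subclass that "ingests" from an in-memory array.
class ArrayIngestionAppliance extends SketchAbstractIngestionAppliance {
  constructor(emit: Emit, private items: string[]) {
    super(emit);
  }

  audit(): boolean {
    return true;
  }

  protected openSource(): void {
    for (const item of this.items) {
      this.emit(item);
    }
  }

  protected closeSource(): void {
    // Nothing to release for an in-memory source.
  }
}
```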
I think we should consider changing the appliance ready event to started, which is a subtle shift but it indicates that a payload event could be created at any point after started is emitted (but since we published READY we might not actually want to do that as it would be a breaking change).
This fits with the direction we have been moving, which is that really this tv-kitchen repository is just countertop. It should have an API that an implementation would interact with, to essentially register appliances and to register event handlers.
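Something like the following is what I have in mind for that API surface, as a sketch only. SketchCountertop, registerAppliance, on, publish, and the TEXT.* payload types are hypothetical names, and an in-memory loop stands in for Kafka.

```typescript
// Sketch only: none of these names are taken from the countertop codebase.
type Payload = { type: string; data: string };
type Handler = (payload: Payload) => void;

interface RegisteredAppliance {
  inputTypes: string[];
  invoke(payload: Payload): Payload[];
}

class SketchCountertop {
  private appliances: RegisteredAppliance[] = [];
  private handlers: { [payloadType: string]: Handler[] } = {};

  // An implementation registers the appliances it wants to run.
  registerAppliance(appliance: RegisteredAppliance): void {
    this.appliances.push(appliance);
  }

  // An implementation registers handlers for the payload types it cares about.
  on(payloadType: string, handler: Handler): void {
    const existing = this.handlers[payloadType];
    if (existing === undefined) {
      this.handlers[payloadType] = [handler];
    } else {
      existing.push(handler);
    }
  }

  // Notifies handlers, then routes the payload to every appliance that consumes its type.
  // Appliance outputs are published in turn.
  publish(payload: Payload): void {
    const handlers = this.handlers[payload.type];
    if (handlers !== undefined) {
      for (const handler of handlers) {
        handler(payload);
      }
    }
    for (const appliance of this.appliances) {
      if (appliance.inputTypes.indexOf(payload.type) === -1) {
        continue;
      }
      for (const output of appliance.invoke(payload)) {
        this.publish(output);
      }
    }
  }
}
```

With this shape, an implementation repo would only construct a countertop, register appliances, attach handlers, and let payloads flow.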
Great. Now, are appliances cake?
Closing this now, since indeed ingestion engines are actual appliances!
Discussion
What do you want to talk about?
Ingestion engines are located in the TV Kitchen repo right now, but they might be better factored as appliances, given they can come with OS-level dependencies and we want to make it easy for other people to add their own. I haven't spent enough time with appliances to know, but if appliances have inputs and outputs, one could imagine ingestion engines are special cases, where the input is unmanaged and external. In part, I'm wondering this because it's become clear to me we'll need flow monitoring and retry logic for these, and it occurs to me that might be better factored outside the engine itself and applied universally.