Create the abstract ingestion engine

slifty commented 4 years ago

Task

Description The architecture outlined in #1 talks about the idea of a "ingestion engines" which will be responsible for converting external media sources into a standardized video stream for internal use.

We anticipate a few types of inputs over time, so we should create an abstract class that defines the general shape of what an "Ingestion Engine" is responsible for.

Location of this TBD but maybe put it in a new directory like src/ingestionEngines/?

Relevant resources / research

Issue #1 has some architecture diagrams (which will eventually be put into the documentation repository)

Related Issues

14 Is an issue for the exploration of ffmpeg and stream formats that we might want to use

reefdog commented 4 years ago

Dropping in quick notes following a call w/@slifty.

Hoping to use ffmpeg directly w/no wrapper library (just calling it directly), at least for MVP or until we realize why it's a bad idea.
After doing whatever transformation (#14), the input engine will output a stream and then notify the sous chef of the new stream location.
With this architecture, new ingestions can come online and the sous chef notified without being preconfigured.
Stream will likely be a network stream to allow ingester and sous chef to be on separate machines, but will also explore a filesystem stream once we understand what that actually means.
Not talked about on the phone but I was just thinking about it: obvs the ingester should be able to have new inputs added without stopping/starting any currently-running ingestions.

reefdog commented 4 years ago

What does an individual ingester do?

In dense summary, an ingester (or "ingest engine" if we prefer) receives a source string (file path, network address, YouTube URL, etc.), identifies the source type, probes the source for composition details, chooses from the available A/V streams (if multiple), decides if we need to normalize/transcode them, decorates it with metadata, packages it for streaming, reserves an output address, notifies the countertop coordinator, and begins streaming.

Example time!

Let's assume we have an RTP stream like rtp://192.168.1.100:5000. To begin ingesting it, we run a command like yarn ingest:add rtp://192.168.1.100:5000.

1. Ingest

Determine the source type. Is this a local file? Network stream? YouTube link? (In our case, a network stream.)
Verify that we know how to deal with this source type.
Probe the source for its composition. If there are multiple A/V streams, determine and select the best one here. Is there a caption stream?
Decide on normalizing/transcoding. Ideally, the entire system should work with a variety of codecs (tbd in #14). But here is where we would decide if the available streams can be packaged for output as-is, or need transcoding.

2. Prepare for output

Reserve an available output address from the configured pool. If there are none available, exit 🏃‍♂️🐻.
Notify the Countertop Coordinator that a new stream is coming online, and provide the address.
Possible: Await confirmation from the Countertop Coordinator. If confirmation times out, exit 🏃‍♂️🐻. (Alternatively, we could just start streaming even if we don't have confirmation that the countertop is listening. This risks streaming unnecessarily, but in combination with the "periodically notify the countertop of current streams" behavior below, might be the better model.)
If confirmation arrives, package the stream for output. This is where we take the choices above and actually compose them into a stream command, as well as decorate with any metadata like timecoding.

3. Start streaming!

🌊

4. Respond to end-of-input

If input ends (end of file, connection closed), finish streaming the current buffer, and then die.

Q: Should Ingester notify Coordinator, or should Coordinator have its own version of this behavior?
Q: What should we do if the input stream times out? (I.e., received no official end-of-file / connection closed message, but haven't received data in a while?)

Do we need an Ingestion Coordinator?

Most of the above can be handled by the individual ingesters themselves. However, a few things may be nice to have at a higher / meta level.

Periodically notify the Countertop Coordinator of open streams. Once an ingestor notifies the Countertop Coordinator that a stream is available, there is no further communication from the countertop. It might be good to periodically let the Countertop Coordinator know "hey, just FYI, we're still streaming [this list of streams] to you", with the idea being if the Countertop has somehow dropped some of them, that's the Countertop Coordinator's chance to pick them back up. TBD is whether the Countertop Coordinator can respond "not listening and don't care" and let the ingester shut the stream down — I say no, but mentioning it for discussion. In any case if we do this, it would be an Ingestion Coordinator, not any individual ingester, who should communicate.

Okay that's all I really have right now for an Ingestion Coordinator.

slifty commented 4 years ago

This is a great exploration! Some thoughts / feedback:

I think there may be merit to designing the relationship between ingestion coordinator and countertop so that it is fully decoupled.

Put another way, ideally the IC wouldn't know the countertop exists to begin with / they wouldn't require a back and forth.
I see a key role of the ingestion coordinator as figuring out which "ingestion engine" to invoke for a given input stream.
I'm trying to think of a use case for a situation where a configured stream would be mounted and not consumed. I think that it might be best for the IC to assume that if the stream exists and it has been told to ingest it that the consumers should be assumed to be there / not worry about if they stop.

Likewise, countertop should probably watch for new streams and assume that their existence implies the command to consume it.

Does that all make sense?

An open question is "what is the best way to convey to the countertop that a stream is ready to process" -- this might be best done via a kakfa queue (or pair of queues: one to indicate closed streams and one to indicate open ones)

reefdog commented 4 years ago

A whole lot of the above just changed! Mostly, numbers 2–4.

The ingestion engines will still receive input (file, UDP stream, etc.), verify their composition, make re-encoding decisions, package the stream metadata, etc. But rather than make a network stream available for the countertop, we're exploring blindly streaming the data into Kafka for consumption by the countertop (either by the sous chef or, potentially, directly by the line cooks if it makes enough sense for the ingester to essentially be an appliance).

In some sandbox scripts, we were able to successfully stream a local video file in chunks, create Kafka messages with the chunk data (each ~70-100K, well under the 1MB default/recommended Kafka message limit) on a specific topic, and have a consumer that listened to that topic and reassembled the chunks into an identical file. (Actually, we are still working to ensure the chunks can be reassembled in order; we suspect the message producer is adding them to the queue out of order, probably because that's currently an async process.)

Still to explore:

For the demo, we streamed the data directly off the filesystem with Node; normally, we'll be routing through FFmpeg first for demuxing and stream composition exploration (to make transcoding decisions, etc.). So we haven't yet proven our FFmpeg → Node → Kafka pipeline.
What's the exact composition of the payload that the ingestion engine outputs?
Are ingestion engines going to actually be appliances that can fit right into processing topologies?

Couple of useful links:

Anything I missed / mischaracterized, @slifty?

slifty commented 4 years ago

This captured it I think!

We had another solid working session today which started diving into our ffmpeg integration. There are a series of open questions that I think are safe to call "optimization" level questions (e.g. the best wrapper format, do we actually wanna normalize the codec, etc), but we can punt on that a bit for the short term.

We now have a draft of the abstract ingestion engine as well as a FileIngestionEngine implementation.

The abstract engine handles setting up ffmpeg, piping data to and from it, and writing packets to kafka.

The big question remaining is around the structure of the payload that is sent to kafka, including how we handle the concept of time (do we calculate the duration of a payload? Do we track the total duration so far, etc etc)?

It may well be that the ingestion engines can either track time via data duration OR choose to function in "real time".

slifty commented 4 years ago

One of the big questions we're facing (somewhat related to #42) is how to keep track of the duration of the video we've ingested so far. I asked this StackOverflow question with the hopes to glean some wisdom from the crowd.

tvkitchen / countertop