tvkitchen / countertop

The entry point for developers who want to set up a TV Kitchen.
https://tv.kitchen
GNU Lesser General Public License v3.0

Where should Payload timestamps be generated? #122

Closed slifty closed 3 years ago

slifty commented 3 years ago

Discussion

What do you want to talk about?

Let's talk about the Payload.timestamp attribute.

In particular, how / where should it get populated?

What is timestamp?

It is documented as "the ISO 8601 timestamp that specifies when, in absolute terms, the Payload's data starts."

This means that the timestamp of a payload is supposed to represent the actual, context-based point in time that the payload is associated with. This is different from position -- which is defined as an offset relative to the start of the stream. It is also different from createdAt, which is defined as the time that the payload was created.

Indeed, if I process a pre-recorded television program that originally aired at 12:30 AM EST on January 14th, 2021, the timestamp of each payload should indicate exactly when on January 14th, 2021 that payload's original content was broadcast.

This is a long way of saying that timestamp is determined by position + a-currently-unspecified-origin-time. The origin time is what would lock in the temporal context of the content.
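As a minimal sketch of that relationship (not the project's actual code), the calculation could look like the following. The function name `computeTimestamp` is hypothetical, and the sketch assumes position is expressed in milliseconds from the start of the stream.

```typescript
// Assumption: position is an offset in milliseconds from the start of the stream.
// computeTimestamp is a hypothetical helper, not part of the TV Kitchen API.
function computeTimestamp(originTime: string, positionMs: number): string {
  const originMs = Date.parse(originTime);
  if (isNaN(originMs)) {
    throw new Error("Invalid origin time: " + originTime);
  }
  // timestamp = origin time + position
  return new Date(originMs + positionMs).toISOString();
}

// The broadcast from the example above: 12:30 AM EST is 05:30 UTC.
const broadcastOrigin = "2021-01-14T00:30:00-05:00";
console.log(computeTimestamp(broadcastOrigin, 90000)); // 2021-01-14T05:31:30.000Z
```

Here a payload 90 seconds into the program gets a timestamp 90 seconds after the original air time, regardless of when the recording is actually processed.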

So how should we add timestamp to a payload?

slifty commented 3 years ago

Where to populate timestamp

Option 1: Appliances populate timestamp

We could make it the responsibility of the Appliance itself to populate timestamp. Appliances are already responsible for creating the Payload in the first place and determining payload position.

The problem with this idea is that properly calculating timestamp requires stream-level context (the stream's configured "start" or "origin" time -- name TBD). Stream context is explicitly the domain of the Countertop and that alone is a pretty strong architectural argument for why timestamp should probably be populated somewhere in the Countertop.

Let's say we decided to make that origin time context available to Appliances on initialization. This would lead to another set of concerns:

  1. All appliances would have to implement the same logic that combines position + origin context.
  2. It would be very easy for an appliance author to perform that calculation incorrectly. This would lead to timestamp + position pairs that are not consistent with the rest of the stream (e.g. position 50 in a given stream should ALWAYS have the same timestamp).

Option 2: Countertop populates the timestamp

Appliances are transformation streams already, so it won't break any APIs to have the CountertopWorker insert a TimestampDecorator transformation stream that adds timestamps to payloads generated by appliances.

This feels right, but it raises the question: how does the origin time get defined?

slifty commented 3 years ago

Where does origin time get defined?

First let's remember the complexity here: a countertop might be set up to process multiple sources of video at once. In fact, some of those sources might be live video streams / URLs while others are a corpus of files.

Some use cases:

  1. Live TV: The origin time should be determined from the createdAt time and the MPEGTS position of the first payload processed.

  2. Video File: This one is tricky because it's possible that the origin time is (A) not externally important (so we could just use something like now() as the origin time), (B) embedded in the MPEG stream somehow (I forget if this is a thing or not), or (C) known by the implementation / some ad hoc external spec (e.g. the implementer has decided to embed the timestamp in the filename).


A Brief Interlude

Is a stream always going to have the same origin time, or can origin time change? For instance, might we have a video-folder-ingestor appliance that processes disparate videos as they arrive, with each video having completely different context and recorded at an arbitrary time?

Right now we have been working as though Payload positions are sequential -- with SEGMENT payloads demarcating breaks. This means positions can repeat but must never decrease (from a given appliance) -- so you wouldn't have an appliance emit a payload stream of positions 1 3 7 100 52, because 52 is before 100.

We should codify that assumption (or codify the rejection of that assumption).
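If the assumption were codified, the check itself could be quite small. The following is only a sketch: `firstOutOfOrderIndex` is a hypothetical helper, and it deliberately ignores SEGMENT payloads, since how they interact with position ordering is exactly what is still undecided here.

```typescript
// Hypothetical helper: returns the index of the first position that is lower
// than the one before it, or -1 if the positions never decrease.
// Repeated positions are allowed; SEGMENT handling is intentionally left out.
function firstOutOfOrderIndex(positions: number[]): number {
  let previous = -Infinity;
  let index = 0;
  for (const position of positions) {
    if (position < previous) {
      return index;
    }
    previous = position;
    index += 1;
  }
  return -1;
}

console.log(firstOutOfOrderIndex([1, 3, 3, 7, 100])); // -1 (repeats are fine)
console.log(firstOutOfOrderIndex([1, 3, 7, 100, 52])); // 4 (52 is before 100)
```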


Really what it seems is that the ingestor appliance (which is in the position of knowing the temporal context / configuration for a given source) should be the place that determines the origin for a given stream.

Whether that ingestor calculates it from some ingestor-specific-logic or some ingestor-specific-configuration-parameter should be the decision of the appliance implementation.

Since ingestors are appliances, this is sure making it seem like appliances should populate timestamp...

slifty commented 3 years ago

OK so


Option 3: Both.

CountertopWorkers should construct a TimestampDecorator transform stream which decorates payloads generated by the appliance.

However, that transform stream would ONLY decorate timestamp if timestamp isn't already present. If the payload already has a timestamp (i.e. if the appliance populated a timestamp), that timestamp is not changed.

This means that appliances with enough context and need to calculate timestamp have the option to do so (and are of course responsible for any bugs they introduce if they do it wrong).

If there is no timestamp in a payload, the TimestampDecorator would apply one by adding the payload's position to the origin time for its stream.

Origin time for a stream would be determined as follows:

  1. It would default to now() upon worker creation.
  2. It would, however, be updated as payloads are ingested. Fun fact: this override wouldn't happen for source appliances such as video-file-ingestor since they don't accept input payloads.
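To make Option 3 concrete, here is a rough sketch of the decorating logic. It is written as a plain class rather than a real transform stream so that it stays self-contained. The `SketchPayload` shape, the method names, and the rule used to update the origin (ingested timestamp minus ingested position, in milliseconds) are all assumptions for illustration, not settled design.

```typescript
// Hypothetical payload shape; position is assumed to be in milliseconds.
interface SketchPayload {
  position: number;
  timestamp?: string; // ISO 8601, absent until someone populates it
}

class TimestampDecorator {
  private originMs: number;

  // Rule 1: the origin defaults to now() when the worker is created.
  constructor(nowMs: number = Date.now()) {
    this.originMs = nowMs;
  }

  // Rule 2 (assumed mechanics): each ingested payload that carries a
  // timestamp updates the origin as timestamp - position.
  observeInput(payload: SketchPayload): void {
    if (payload.timestamp === undefined) {
      return;
    }
    this.originMs = Date.parse(payload.timestamp) - payload.position;
  }

  // Only fills in timestamp when the appliance did not set one itself.
  decorate(payload: SketchPayload): SketchPayload {
    if (payload.timestamp !== undefined) {
      return payload;
    }
    const timestamp = new Date(this.originMs + payload.position).toISOString();
    return { ...payload, timestamp };
  }
}

const decorator = new TimestampDecorator(Date.parse("2021-01-14T05:30:00.000Z"));
console.log(decorator.decorate({ position: 5000 }).timestamp); // 2021-01-14T05:30:05.000Z
```

A source appliance never calls `observeInput` in this sketch, so its payloads keep the creation-time default, which matches the "fun fact" above.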

I don't think there is need for a universal appliance registration configuration option for origin time -- the appliance should be able to determine if a user should care about configuring that based on the context of the appliance -- but we'll see if that opinion changes as I implement this.

slifty commented 3 years ago

I keep feeling torn about option 1 vs option 3.

Appliances will be able to figure out timestamp if needed without being given the stream context explicitly, because that context is already baked into the payloads they ingest (if a payload they ingest has position + timestamp then it's already there).

Most appliances probably won't be generating new positions / new timestamps. For instance, the SRT generator will be using the position / timestamp of the first payload whose content is included in a given line.

On the other hand, there will be appliances that generate completely novel positions. For instance, an appliance that creates time-aligned transcripts. Should those appliances be responsible for reverse-calculating the origin time using ingested payloads?

slifty commented 3 years ago

What I'm going to do for starters is update AbstractAppliance to calculate a streamOriginTimestamp based on the first payload (calculating the diff between timestamp and position).
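Roughly, that first-payload calculation could look like the sketch below. `OriginTrackingAppliance` is a hypothetical stand-in, not the real AbstractAppliance, and position is again assumed to be in milliseconds.

```typescript
interface TimedPayload {
  position: number; // assumed: milliseconds from the start of the stream
  timestamp: string; // ISO 8601
}

// Hypothetical stand-in for the relevant part of AbstractAppliance.
class OriginTrackingAppliance {
  private streamOriginTimestamp: string | null = null;

  ingest(payload: TimedPayload): void {
    if (this.streamOriginTimestamp === null) {
      // origin = timestamp - position, taken from the first payload only
      const originMs = Date.parse(payload.timestamp) - payload.position;
      this.streamOriginTimestamp = new Date(originMs).toISOString();
    }
  }

  getStreamOriginTimestamp(): string | null {
    return this.streamOriginTimestamp;
  }
}

const appliance = new OriginTrackingAppliance();
appliance.ingest({ position: 90000, timestamp: "2021-01-14T05:31:30.000Z" });
appliance.ingest({ position: 95000, timestamp: "2021-01-14T09:00:00.000Z" }); // ignored
console.log(appliance.getStreamOriginTimestamp()); // 2021-01-14T05:30:00.000Z
```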

This doesn't preclude the countertop from doing this some day as well.

slifty commented 3 years ago

Hey you know what might make even more sense.

Not calculating timestamp at all in order to generate a payload.

Specifically: modifying Payload so that it has originTimestamp instead of timestamp.

This way the only time a calculation is made is when someone cares about timestamp, and they can easily do it by adding position to originTimestamp.

originTimestamp can be trivially copied from one payload to the next without the need for any math or date conversion.

This is going to be faster as well (since otherwise the "origin + position" math would have to be calculated for every payload created in a topology).
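A small sketch of what that might look like in practice. The `OriginPayload` shape and the helper names `derivePayload` and `absoluteTime` are hypothetical, and position is assumed to be in milliseconds.

```typescript
// Hypothetical payload shape after the change: no timestamp, just the origin.
interface OriginPayload {
  data: string;
  position: number; // assumed: milliseconds from the start of the stream
  originTimestamp: string; // ISO 8601, identical for every payload in the stream
}

// Creating a payload from another one: the origin is copied as-is,
// with no date parsing or arithmetic.
function derivePayload(source: OriginPayload, data: string): OriginPayload {
  return {
    data,
    position: source.position,
    originTimestamp: source.originTimestamp,
  };
}

// Only code that actually needs the absolute time pays for the conversion.
function absoluteTime(payload: OriginPayload): Date {
  return new Date(Date.parse(payload.originTimestamp) + payload.position);
}

const source: OriginPayload = {
  data: "HELLO",
  position: 90000,
  originTimestamp: "2021-01-14T05:30:00.000Z",
};
const derived = derivePayload(source, "hello");
console.log(absoluteTime(derived).toISOString()); // 2021-01-14T05:31:30.000Z
```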

slifty commented 3 years ago

Closing this because the concept of timestamp is gone now.