Interpolation for data synchronised with video

Background

Sensor data are sampled at discrete times on a continuous timeline so interpolation is required to determine values at intermediate times for synchronisation with video.

Interpolation schemes should be simple, succinct and flexible enough to include most common use cases. Source data should be retained to enable correct postprocessing without introducing interpolation artefacts, unnecessary complexity should be avoided to reduce bloat and processing overheads minimised for resource-limited devices.

Supported interpolation schemes should include:

None: values are only valid instanteously at the sample time, e.g. headcount in a video frame;
Step: values remain constant until the next sample time, e.g. vehicle gear selection;
Linear: values are linearly interpolated to the next sample time, e.g. temperature.

The animate subcommand was originally conceived to support WebVMT animations, which has since been overtaken by a wider requirement for interpolated values and identified design improvements.

Proposal

The following design is proposed.

A WebVMT interpolation changes an object attribute from a start value to an end value during a given interval.

A WebVMT interpolation consists of:

An interpolation object The parent object of the attribute.
An interpolation attribute The attribute of the object to change.
An interpolation start value The initial value of the attribute.
An interpolation end value The final value of the attribute.
An interpolation start time The time at which to begin changing the attribute. By default, the interpolation start time is set to the cue start time. The interpolation start time may be defined as an absolute value, or calculated relative to the cue start time using a delay.
An interpolation end time The time at which to finish changing the attribute. By default, the interpolation end time is set to the cue end time. The interpolation end time may be defined as an absolute value, or calculated relative to the interpolation start time using a duration.

A WebVMT interpolation subcommand consists of one or more WebVMT interpolations with all interpolation objects set to the preceding WebVMT command.

Examples

1. No Interpolation

NOTE No interpolation
     headcount = 12 at 4 secs

00:00:04.000 --> 00:00:04.000
{"sync":
  {"type": "org.webvmt.example", "id": "sensor1", "data":
    {"headcount": "12"}
  }
}

NOTE No interpolation
     headcount = 34 at 6 secs

00:00:06.000 --> 00:00:06.000
{"sync":
  {"id": "sensor1", "data":
    {"headcount": "34"}
  }
}

2. Step Interpolation

NOTE Step interpolation
     gear = 4 after 2 secs until 6 secs

00:00:02.000 --> 00:00:06.000
{"sync":
  {"type": "org.webvmt.example", "id": "sensor2", "data":
    {"gear": "4"}
  }
}

NOTE Step interpolation
     gear = 5 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{"sync": 
  {"id": "sensor2", "data":
    {"gear": "5"}
  }
}

3. Linear Interpolation

NOTE Linear interpolation
     temperature = 14 -> 16 after 4 secs until 6 secs

00:00:04.000 --> 00:00:06.000
{"sync":
  {"type": "org.webvmt.example", "id": "sensor3", "data":
    {"temperature": "14"}
  }
}
{"interp": {"to":
  {"data": {"temperature": "16"}}
}}

NOTE Linear interpolation
     temperature = 16 -> 19 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{"sync":
  {"id": "sensor3"}
}
{"interp": {"to":
  {"data": {"temperature": "19"}}
}}

Live Streaming

Live streams can be recorded with interpolation using unbounded cues.

Note: Values may not be interpolated during capture as future data are unknown, e.g. for linear interpolation, but values will be correctly interpolated during subsequent playbacks.

Examples

1. No Interpolation (Live) - as above

2. Step Interpolation (Live)

NOTE Step interpolation - live
     gear = 4 after 2 secs until next update (6 secs)

00:00:02.000 -->
{"sync":
  {"type": "org.webvmt.example", "id": "live2", "data":
    {"gear": "4"}
  }
}

NOTE Step interpolation - live
     gear = 5 after 6 secs until next update (9 secs)

00:00:06.000 -->
{"sync": 
  {"id": "live2", "data":
    {"gear": "5"}
  }
}

NOTE No (step) interpolation - live
     gear = 5 at 9 secs

00:00:09.000 --> 00:00:09.000
{"sync": 
  {"id": "live2", "data":
    {"gear": "5"}
  }
}

3. Linear Interpolation (Live)

NOTE Linear interpolation - live
     temperature = 14 after 4 secs until next update (-> 16 at 6 secs)

00:00:04.000 -->
{"sync":
  {"type": "org.webvmt.example", "id": "live3", "data":
    {"temperature": "14"}
  }
}
{"interp": {"end": "00:00:06.000", "to":
  {"data": {"temperature": "16"}}
}}

NOTE Linear interpolation - live
     temperature = 16 after 6 secs until next update (-> 19 at 9 secs)

00:00:06.000 -->
{"sync":
  {"id": "live3"}
}
{"interp": {"end": "00:00:09.000", "to":
  {"data": {"temperature": "19"}}
}}

NOTE No (linear) interpolation - live
     temperature = 19 at 9 secs

00:00:09.000 --> 00:00:09.000
{"sync": 
  {"id": "live3", "data":
    {"temperature": "19"}
  }
}

Hi, For representing interpolation, there are generally two approaches: sliced representation and sequence representation. The one mentioned here is aligned with the sliced representation, that is, every entry represents the interpolation over a time interval. An interval would start from the ending of the previous one. The sequence representation, on the other hand, would mark a block of entries as a single sequence, with a single interpolation option (none, step, or linear) that would apply to all entries. Being inside a sequence block implies that every entry starts from the end time of the previous entry, without a need to repeat it. Using sequence representation has many advantages over sliced representation (see the examples below):

the representation is more readable, and follows the same structure, no matter the interpolation
it uses less characters, since the repetition of timestamps is avoided.
Within a sequence, we know that all intervals are contiguous. There is no need to check for this in the runtime. This comparison is elaborated in our paper: https://docs.mobilitydb.com/pub/TODS.pdf

The examples (non-live) would be expressed more or less as follows:

{"interpolation": none NOTE No interpolation headcount = 12 at 4 secs

00:00:04.000 {"sync": {"type": "org.webvmt.example", "id": "sensor1", "data": {"headcount": "12"} } }

NOTE No interpolation headcount = 34 at 6 secs

00:00:06.000 {"sync": {"id": "sensor1", "data": {"headcount": "34"} } } }

=======================

{"interpolation"= "step" NOTE Step interpolation gear = 4 after 2 secs until 6 secs

00:00:02.000 {"sync": {"type": "org.webvmt.example", "id": "sensor2", "data": {"gear": "4"} } }

NOTE Step interpolation gear = 5 after 6 secs until 9 secs

00:00:06.000 {"sync": {"id": "sensor2", "data": {"gear": "5"} } }

NOTE For Step interpolation last two instants must have same values

00:00:09.000 {"sync": {"id": "sensor2", "data": {"gear": "5"} } } }

=======================

{"interpolation": "linear" NOTE Linear interpolation temperature = 14 -> 16 after 4 secs until 6 secs

00:00:04.000 {"sync": {"type": "org.webvmt.example", "id": "sensor3", "data": {"temperature": "14"} } }

NOTE Linear interpolation temperature = 16 -> 19 after 6 secs until 9 secs

00:00:06.000 {"sync": {"id": "sensor3" "data": {"temperature": "16"} } }

00:00:9.000 {"sync": {"id": "sensor3" "data": {"temperature": "19"} } } }

Thanks for your analysis and feedback, and for highlighting the value of the live streaming use cases.

the representation is more readable, and follows the same structure, no matter the interpolation

I agree that your suggestion is more human readable, though in this case the aim is to balance machine readability for the web with human readability to make metadata more accessible. However, your use case assumes that a single interpolation scheme applies to all data, which breaks down when multiple interpolation schemes are required concurrently, e.g. a dashcam which records speed (linear interpolation), gear (step interpolation) and vehicle count (no interpolation) - see Mixed Interpolation Scheme example below.

it uses less characters, since the repetition of timestamps is avoided.

I agree. WebVMT cues are based on Web Video Text Track (WebVTT) cues including the WebVTT cue timing definition. WebVMT has already extended this definition to unbounded cues where the end time is unspecified, e.g. 00:00:02.000 -->, which is a common use case. A similar extension could be proposed for instantaneous cues where the start and end times are identical if we can identify common use cases and a suggested syntax.

Within a sequence, we know that all intervals are contiguous. There is no need to check for this in the runtime.

This is covered by the live streaming examples - see above. Note that cue end time in the original examples is not redundant as it determines the interpolation end time, c.f. live streaming examples which represent identical data using unbounded cues with interp subcommands that include an end attribute.

Design Analysis

WebVMT is designed to record all data necessary for display (including interpolation) prior to the current playback time for efficiency reasons. Your suggestion requires the playhead to additionally look ahead (and possibly parse to the end of the file) to calculate current intermediate values by identifying future end values of ongoing interpolations which incurs an extra processing overhead on the web browser.

In summary:

Interpolation calculation is synchronised with the media playhead and requires no look ahead;
Mixed interpolation schemes are supported, e.g. a dashcam may record vehicle count (no interpolation), gear (step) and speed (linear) for a vehicle - see example below;
Multiple sensor sample rates are supported, e.g. geolocation sampling may be slow (1Hz) but sensor data may be fast (100Hz) - see GPMF examples;
Temporal gaps (missing data value) can be discriminated from static values (no change in value) - see OGC Testbed-16 FMV which aims to track moving objects with periodic data dropouts.

Mixed Interpolation Scheme Example

This example aggregates data similar to the three original examples into a single file.

NOTE No interpolation
     vehiclecount = 12 at 4 secs

00:00:04.000 --> 00:00:04.000
{"sync":
  {"type": "org.webvmt.example", "id": "dashcam1", "data":
    {"vehiclecount": "12"}
  }
}

NOTE Step interpolation
     gear = 4 after 4 secs until 6 secs
     Linear interpolation
     speed = 14 -> 16 after 4 secs until 6 secs

00:00:04.000 --> 00:00:06.000
{"sync":
  {"id": "dashcam1", "data":
    {"gear": "4", "speed": "14"}
  }
}
{"interp": {"to":
  {"data": {"speed": "16"}}
}}

NOTE No interpolation
     vehiclecount = 34 at 6 secs

00:00:06.000 --> 00:00:06.000
{"sync":
  {"id": "dashcam1", "data":
    {"vehiclecount": "34"}
  }
}

NOTE Step interpolation
     gear = 5 after 6 secs until 9 secs
     Linear interpolation
     speed = 16 -> 19 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{"sync": 
  {"id": "dashcam1", "data":
    {"gear": "5"}
  }
}
{"interp": {"to":
  {"data": {"speed": "19"}}
}}

I think consistency is important, so the slice representation would be better expression that the snapshot representation. But I wonder how to map the values of several objects. Does each object have each element of the sensing values?

00:00:06.000 --> 00:00:09.000 {"sync": {"id": "dashcam1", "data": {"gear": "5"} } } {"interp": {"to": {"data": {"speed": "19"}} }}

00:00:06.000 --> 00:00:09.000 {"sync": {"id": "dashcam2", "data": {"gear": "6", "speed":"10"} } } {"interp": {"to": {"data": {"speed": "20"}} }}

I agree that consistency would be beneficial, and I'm open to suggested improvements.

Thanks for your question about mapping values to multiple objects. Yes, each object has each element for the sensor type and your file excerpt is correct.

Handling Multiple Sensors

Each sensor has a type, e.g. org.mydata.example, which defines its attributes in a similar way to an XML schema.

The first sync command for each identified sensor object should include:

type to define the attributes;
id to identify the sensor;
values of all attributes.

Subsequent sync commands should only include:

id to identify the sensor;
values of any attribute that has changed.

I presume that the type is defined prior to time 6 in your excerpt, as in the original examples. In addition, the two cues have identical start and end times so they could be combined:

NOTE dashcam1 - gear 5, speed 16 at 6 secs
                        speed ->19 at 9 secs
     dashcam2 - gear 6, speed 10 at 6 secs
                        speed ->20 at 9 secs

00:00:06.000 --> 00:00:09.000
{"sync": 
  {"id": "dashcam1", "data":
    {"gear": "5", "speed": "16"}
  }
}
{"interp": {"to":
  {"data": {"speed": "19"}}
}}
{"sync": 
  {"id": "dashcam2", "data":
    {"gear": "6", "speed": "10"}
  }
}
{"interp": {"to":
  {"data": {"speed": "20"}}
}}

WebVMT interpolation is designed to supersede WebVMT animation. This is illustrated by an updated version of the Safe Drone example from the Editor's Draft which shows how the interp subcommand can replace the animate subcommand with syntax that is more modular and less verbose.

Safe Drone Animation Example

NOTE Drone starts at (51.0130, -0.0015)

00:00:05.000 -->
{ "panto":
  { "lat": 51.0070, "lng": -0.0020, "end": "00:00:25.000" }
}
{ "moveto":
  { "lat": 51.0130, "lng": -0.0015, "path": "drone1" }
}
{ "lineto":
  { "lat": 51.0090, "lng": -0.0017, "path": "drone1",
    "end": "00:00:10.000" }
}

NOTE Safety zone

00:00:05.000 --> 00:00:10.000
{ "circle":
  { "lat": 51.0130, "lng": -0.0015, "rad": 10 }
}
{ "interp":
  { "to": {"lat": 51.0090, "lng": -0.0017 }}
}

NOTE Drone arrives at (51.0090, -0.0017)

00:00:10.000 -->
{ "lineto":
  { "lat": 51.0070, "lng": -0.0020, "path": "drone1", "end": "00:00:25.000" }
}
{ "circle":
  { "lat": 51.0090, "lng": -0.0017, "rad": 10 }
}
{ "interp":
  { "end": "00:00:25.000", "to": { "lat": 51.0070, "lng": -0.0020 }}
}

NOTE Drone ends at (51.0070, -0.0020)

This proposal was included in the W3C Group Note published on 19 September 2023.

webvmt / community-group

Interpolation for data synchronised with video #2