w3c / sdw

Repository for the Spatial Data on the Web Working Group
https://www.w3.org/2020/sdw/

Breakout discussion: Video search with location #1130

Closed rjksmith closed 4 years ago

rjksmith commented 5 years ago

Summary

Emerging markets in 'mobile video devices', e.g. drones, dashcams, body-worn video and smartphones, are creating a new online community that can share video with metadata content. Displaying metadata, e.g. geolocation, synchronised with video adds valuable context which can significantly improve the viewer's understanding of the associated images. The same information can also be used to create a metadata index of the footage that offers quick and easy access for a search engine user to find relevant content in a video archive, though online search engines do not currently provide this facility.

Would video metadata search be valuable to users and, if so, how best could this be integrated with current services?

Use Cases

  1. Accident Investigation/Motor Insurance Automatically identify vehicle collisions in dashcam footage to provide forensic evidence for a police investigation or motor insurance claim.

  2. Remote Maintenance Visually monitor inaccessible assets, e.g. wind turbines, using autonomous drones to create a historical video archive that enables remote expert diagnosis of operational issues.

  3. Flood Monitoring Aggregate video footage from disparate sources to create a historical video archive that allows water levels to be monitored at different locations over time to help predict flooding.

More details of these use cases are given in OGC Ideas issues #91 and #92.

Gap Analysis

The search process can be divided into two parts: acquisition and publication.

  1. Acquisition A web crawler searches files and updates a search engine database.

    1. Source file Metadata can be embedded within the media file (in band) or contained separately in a linked file (out of band). For example, MISB is an in-band solution and WebVMT is out-of-band - both approaches have pros and cons which should be evaluated.
    2. Web crawler The web supports three video container formats - MPEG, WebM and Ogg - and any search solution should handle all of them. The source file containing the metadata must be identified and parsed correctly so that relevant details can be added to the search database.
  2. Publication A search engine responds to a user request.

    1. User agent A user requests a search from a web page or using a search API, e.g. RESTful interface. The user agent should be able to present the video and metadata content from the search response through a common API in the HTML Document Object Model (DOM). Work is currently underway in the WICG DataCue activity to investigate and address identified video metadata issues affecting the user agent, including feedback from this discussion.
    2. Search engine Users should be able to filter content by location and other metadata constraints, and results should be returned with associated location information.

Goals

The aim of this discussion is to:

  1. Capture key requirements where user or market needs are not yet met;
  2. Analyse critical gaps that prevent or impair end-to-end metadata search functionality;
  3. Propose solutions with clear explanation;
  4. Agree future activities to investigate outstanding issues and develop working prototypes.

An agenda item has been scheduled to discuss this at the SDW IG meeting on 25th June 2019 in Leuven.

rjksmith commented 5 years ago

File formats have been compared in issue #1120, including a comparison of video metadata format features and a mapping between MISB tags and WebVMT attributes.

rjksmith commented 5 years ago

Many thanks to all those who participated in the breakout discussion at the Spatial Data on the Web meeting on 25th June 2019 in Leuven. I've now collated the feedback with the goals and minutes.

Conclusions

  1. Captured key missing requirements

    1. Video metadata requires a balance between accuracy and bandwidth, e.g. for per-frame camera orientation information, to avoid overcomplexity.
    2. Three-dimensional location is a requirement for the drone market in particular, as these cameras are airborne.
    3. Other interpolation algorithms, e.g. cubic splines, may be required in addition to linear interpolation.
    4. Metadata should include dynamic camera attributes, e.g. orientation and zoom, especially for cameras that are not static relative to their mobile platform, e.g. drone cameras can often tilt and pan, and for Augmented Reality (AR) applications.
  2. Analysed critical gaps in search

    1. A distributed search use case should be considered, where there is no central server and processing is distributed among several peers with no single control point.
    2. An interval model may be suitable for displaying a (text) subtitle with video, but may not be well-suited to representing location at a particular instant.
  3. Proposed solutions

    1. Accuracy versus bandwidth balance is addressed by WebVMT's keyframe and interpolation approach, e.g. to track moving objects with WebVMT paths. Data can be recorded at arbitrary intervals and interpolated to produce interim results with the desired level of accuracy, without imposing a high bandwidth overhead. Consideration should be given to the range of interpolation algorithms available.
    2. Adding an (optional) altitude attribute to WebVMT locations would address the identified 3D location requirement. Omitting altitude would imply a ground-level location.
    3. An interval model for video subtitles, e.g. WebVTT, displays a text phrase for a duration, but also includes the concept of instantaneous position within that interval. Start and end times of a text cue correspond to the start and end of the associated audio content, and the cue advances as individual words are spoken, so instantaneous representation is implicit in the design. The WebVMT path concept takes the same approach, so an object moves from the start to the end location during the cue interval, and its instantaneous location can be calculated in the interim by interpolation.
  4. Agreed future activities

    1. Investigate how video metadata search can assist the OGC Disasters Pilot.
    2. Investigate how video metadata search can assist the OGC Smart City Interoperability Reference Architecture.
    3. Identify use cases and requirements for non-linear interpolation of video metadata in a separate GitHub issue.
    4. Identify suitable camera attributes and an AR use case in a separate GitHub issue.
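The keyframe-and-interpolation approach in 3.1, with the optional altitude from 3.2, can be sketched as follows. The function name and the `(lat, lon, alt)` tuple layout are illustrative assumptions, not part of WebVMT itself:

```python
def lerp_location(t, t0, loc0, t1, loc1):
    """Linearly interpolate a (lat, lon, altitude) location between two keyframes.

    t0/t1 are keyframe times in seconds; loc0/loc1 are (lat, lon, alt) tuples,
    with altitude in metres.
    """
    if not t0 <= t <= t1:
        raise ValueError("time outside keyframe interval")
    f = (t - t0) / (t1 - t0)
    return tuple(a + f * (b - a) for a, b in zip(loc0, loc1))

# Keyframes recorded 10 s apart; interim locations computed on demand,
# so the file only stores the keyframes, keeping bandwidth low.
start = (51.0000, 4.0000, 120.0)
end   = (51.0010, 4.0020, 100.0)
print(lerp_location(5.0, 0.0, start, 10.0, end))
```

Other interpolation algorithms (e.g. cubic splines, as raised in 1.3) would slot in by replacing the linear blend with a different function of `f`.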
rjksmith commented 5 years ago

Separate issues raised to discuss:

Comments and feedback welcome.

chris-little commented 5 years ago

@rjksmith An important aspect of interpolation is whether a scheme is reversible. After many years/decades, researchers look at not very nice looking maps of weather data because linear interpolation was adopted, which allows you to recalculate the original values from the interpolated values. Other schemes, such as fitting various splines, look prettier, but 'lose' the original data/positions. At the other extreme of the interpolation algorithm spectrum is using the full dynamical equations to constrain the possible values, e.g. an interpolated position may not be physically possible because the drone, or wind, could not move or turn fast enough. Perhaps you need an attribute to state whether the interpolation scheme satisfies local, or global, or both, constraint types. HTH, Chris
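The dynamical-constraint idea above can be applied as a post-hoc sanity check: reject tracks whose implied speed exceeds what the platform can physically achieve. A rough sketch, using an equirectangular distance approximation and a hypothetical `max_speed` parameter (neither is anything WebVMT or MISB defines):

```python
import math

def implied_speed(t0, lat0, lon0, t1, lat1, lon1):
    """Approximate ground speed (m/s) between two timestamped positions.

    Uses an equirectangular approximation, adequate over short distances.
    """
    R = 6_371_000.0  # mean Earth radius in metres
    mean_lat = math.radians((lat0 + lat1) / 2)
    dx = math.radians(lon1 - lon0) * math.cos(mean_lat) * R
    dy = math.radians(lat1 - lat0) * R
    return math.hypot(dx, dy) / (t1 - t0)

def physically_possible(track, max_speed):
    """True if no segment of a (time, lat, lon) track exceeds max_speed (m/s)."""
    return all(
        implied_speed(t0, la0, lo0, t1, la1, lo1) <= max_speed
        for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:])
    )

# A drone with a 20 m/s top speed cannot cover ~1.1 km in 10 s.
track = [(0.0, 51.000, 4.000), (10.0, 51.010, 4.000)]
print(physically_possible(track, max_speed=20.0))
```

A fuller treatment would also bound turn rate and vertical speed, which is where the full dynamical equations come in.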

rjksmith commented 5 years ago

@chris-little Thanks. Response posted to the interpolation discussion.

rjksmith commented 5 years ago

Added search use cases to WebVMT Editor's Draft:

rjksmith commented 4 years ago

Added AR use case to WebVMT Editor's Draft:

rjksmith commented 4 years ago

Thanks to all who contributed to this discussion