pthopesch opened this issue 5 years ago
@skynavga You put a ttml3 label on this issue. Do you see ttml3 as a document that will still be published in 2019?
I'd like to provide some further details on this requirement.
The use cases described here are still valid: https://github.com/immersive-web/proposals/issues/39
The major aspect of this requirement is to standardize how TTML subtitles (defined in a 2D coordinate system) are referenced into a 3D environment. Spatial information could be described with two angles (azimuth, elevation) plus a depth value, or as a 3D vector (x, y, z).
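The two representations are interchangeable. A minimal sketch (function name and axis conventions are my own assumptions, not taken from TTML or ImAc) of converting (azimuth, elevation, depth) into a 3D vector:

```typescript
// Convert spherical subtitle coordinates to a Cartesian 3D vector.
// Assumed conventions: azimuth 0° = straight ahead, positive = left;
// elevation 0° = horizon, positive = up; depth = distance in metres.
// Right-handed axes: +x right, +y up, -z forward (WebXR/WebGL style).

const DEG = Math.PI / 180;

function sphericalToVector(
  azimuthDeg: number,
  elevationDeg: number,
  depth: number
): [number, number, number] {
  const az = azimuthDeg * DEG;
  const el = elevationDeg * DEG;
  const x = -depth * Math.cos(el) * Math.sin(az); // positive azimuth → left → -x
  const y = depth * Math.sin(el);
  const z = -depth * Math.cos(el) * Math.cos(az); // forward is -z
  return [x, y, z];
}
```

For example, `sphericalToVector(0, 0, 2)` places the subtitle 2 m straight ahead of the viewer, i.e. approximately (0, 0, -2) in these axes.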
Possible solutions: The current implementation in the ImAc project (where the use cases were investigated) puts a 2D plane into a 3D scene and uses that plane as the root container region for IMSC subtitles.
```xml
<tt:p xml:id="p11" region="R1" style="S2" begin="00:01:50.080" end="00:01:51.160"
      imac:equirectangularLong="20">
  <tt:span style="S3">(David) Can you hear us?</tt:span>
</tt:p>
```
The value of "imac:equirectangularLong" describes the longitude angle of a speaker, where "0" refers to the center of the corresponding equirectangular video. Value range: [-180, 180]; positive values are left of the center.
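Under that convention, the attribute maps directly onto a horizontal pixel column of the equirectangular frame. A hypothetical helper (the function name is mine; the mapping assumes +180° is the left edge of the frame):

```typescript
// Map an imac:equirectangularLong value to a pixel column in an
// equirectangular video frame (helper name is hypothetical).
// long = 0 → frame centre; long = 180 → left edge; long = -180 → right edge.
function longToPixelX(longDeg: number, frameWidth: number): number {
  if (longDeg < -180 || longDeg > 180) throw new RangeError("longitude out of range");
  return (0.5 - longDeg / 360) * frameWidth;
}
```

For a 3600-pixel-wide frame, the example value of 20 lands at column 1600, i.e. 200 pixels left of centre.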
```xml
<tt:p xml:id="p11" region="R1" style="S2" begin="00:01:50.080" end="00:01:51.160"
      imac:equirectangularLong="20" imac:equirectangularLat="5">
  <tt:span style="S3">(David) Can you hear us?</tt:span>
</tt:p>
```
In this case, the two values provided by "imac:equirectangularLong" and "imac:equirectangularLat" describe the latitude and longitude of the center of the 2D subtitle plane.
Note: An exact definition of such spatial values is needed. For instance, they could refer to the speaker, to a position where the subtitle should be rendered (which would not be the speaker's face), or to a point (e.g. the center) of a 2D rendering plane. In our current implementation, the distinction was irrelevant.
The Timed Text Working Group just discussed Support 3D space (360°/VR/XR) as target presentation environment tt-reqs#8, and agreed to the following:
SUMMARY: Group discussed this, generally supportive, contingent on effort being available to make it happen.
Update: @tairt has volunteered to take the lead on this and offers editor capacity.
Recent discussions with stakeholders and experts have shown that the requirements reflected in this issue are important and need to be addressed, but also that their scope and details need further discussion in a broader context. @TrevorFSmith has scheduled a call of the W3C Immersive Web Community Group on May 21st where requirements for subtitles in immersive environments will be discussed (see https://github.com/immersive-web/proposals/issues/40 for the discussion).
In the light of this development I am confident that some first specification text can still be drafted this year, but it will be too early for a final publication by the end of 2019.
Thank you for the update @tairt - very helpful.
See the discussion on the issue proposal of the WebXR community group (https://github.com/immersive-web/proposals/issues/40#issuecomment-496124420 and downwards) for the discussion after the remote meeting with the XR community group.
The latest research has shown that the most urgent requirement is a display of subtitles that is always in the user's field of view (see https://github.com/immersive-web/proposals/issues/40#issuecomment-511315747). This would require static rendering of subtitles on a 2D plane and is largely already supported by IMSC. What may be missing is to define this as a requirement for XR devices, where the perspective of the content changes (through user movement) but the subtitles need to stay "fixed to the screen".
Another requirement that is not yet met is the addition of metadata locating the position of the audio source on the horizontal radius. This is needed to guide the user in which direction they need to turn to see the audio source/speaker of a displayed subtitle.
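That guidance logic can be sketched as a pure function (names and the field-of-view parameter are my own, not from any proposal): given the viewer's current yaw and the source's longitude, compute the signed angular difference and decide whether to show a left arrow, a right arrow, or none.

```typescript
// Decide which guidance arrow (if any) to show for an off-screen speaker.
// Both angles use the convention of the ImAc example above: degrees in
// [-180, 180], positive = left of centre. halfFovDeg is half the
// horizontal field of view (an assumed parameter).
function guidanceArrow(
  viewerYawDeg: number,
  sourceLongDeg: number,
  halfFovDeg: number
): "left" | "right" | "none" {
  // Signed shortest-path difference, wrapped to (-180, 180].
  let diff = sourceLongDeg - viewerYawDeg;
  while (diff > 180) diff -= 360;
  while (diff <= -180) diff += 360;
  if (Math.abs(diff) <= halfFovDeg) return "none"; // speaker is on screen
  return diff > 0 ? "left" : "right"; // positive = source is to the viewer's left
}
```

The wrap-around step matters: with the viewer at +170° and the source at -170°, the shortest path is 20° further left, not 340° to the right.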
If there is an agreement on these requirements I think that they are now sufficiently scoped and the work on an explainer and the module can start.
@tairt have you considered using `tta:pan` to specify "the position of the audio source on the horizontal radius"?
Thanks @skynavga for the pointer. If I understand `tta:pan` correctly, it is used to position audio from full left to full right pan.
The requirement we have is actually to express the geographical position of an object in a 360° video environment that relates to the subtitle. It could be expressed, for example, as a longitude value (borrowing from the geographical coordinate system).
With this data a presentation processor could render help icons (e.g. arrows) to point the user to the audio source of a subtitle when it is not visible in the picture. See below two examples from the ImAc project of how this could be rendered:
`tta:pan` uses a 2D stereoscopic pan function based on the Web Audio `StereoPannerNode` interface; a more generalised `PannerNode` interface with 3D positional coordinates is also available.
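For reference, the equal-power algorithm behind `StereoPannerNode` (as specified for a mono input in the Web Audio specification) reduces to two gain factors; here it is sketched as plain math rather than an actual audio graph:

```typescript
// Equal-power stereo pan gains for a mono source, following the
// StereoPannerNode algorithm in the Web Audio specification.
// pan ∈ [-1, 1]: -1 = full left, 0 = centre, +1 = full right.
function stereoPanGains(pan: number): { left: number; right: number } {
  const x = (Math.min(1, Math.max(-1, pan)) + 1) / 2; // map pan to [0, 1]
  return {
    left: Math.cos((x * Math.PI) / 2),
    right: Math.sin((x * Math.PI) / 2),
  };
}
```

At `pan = 0` both gains are cos(π/4) ≈ 0.707, so the total power stays constant as the source sweeps across the stereo field; note this is a purely left/right control, which is why it cannot express a full 360° position on its own.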
@nigelmegitt Do you think that this vocabulary would best fit the requirement? Keep in mind that the actual audio would be in a lot of cases not "spatial". The information will be needed to locate the part of the video image a subtitle relates to. This will be in most cases the placement of the speaker in the video picture. So in that sense the term "audio source" may be ambigious as it not really about the position of the audio but the graphical representation of an object that in the context of the "story" is assumed to be the originator of the audio.
@tairt I was including it for completeness, thinking that it may be something we want to add as an additional feature later. My mental model here is a two stage process:
In other words, once you have resolved 2, then using `PannerNode` would be appropriate, possibly in combination with a `GainNode` to simulate the effect of distance. I agree it is unlikely to be adequate by itself if you want to defer the presentation decisions to presentation time (which I think we must do).
Note also that there is a new W3C community group proposal for immersive captions (https://www.w3.org/community/groups/proposed/).
The Timed Text Working Group just discussed 360º Subtitles, and agreed to the following:
SUMMARY: This issue on hold for the time being; pick it up again when there's a concrete proposal e.g. from a CG.
In the past years, more and more applications have appeared that show media content in 3D space, such as 360° videos (stereoscopic or not), VR experiences, etc. Subtitles (if present) are mostly shown at the bottom center of the current field of view. Another option sometimes used is to burn them into the video at three different positions (bottom, evenly spaced) so that one of the three subtitles is always at least partly visible to the viewer.
I think that there is more to subtitle representation in 360°/VR/XR than that. We investigated subtitles for 360° in the ImAc project (imac-project.eu).
As of today, an established way of presenting subtitles in 360°/VR/XR does not exist. This requirement is still a very general one and needs further study.
A standardized solution would be great. There is already some activity in MPEG (MPEG-I, OMAF: https://mpeg.chiariglione.org/standards/mpeg-i/omnidirectional-media-format). The topic was discussed during the last TPAC meeting, and as a follow-up action I created two issues in the W3C XR Community Group: 1) a use case description for subtitles in 360° videos (https://github.com/immersive-web/proposals/issues/39) and 2) an overview of requirements (https://github.com/immersive-web/proposals/issues/40).