scottschiller / SoundManager2

A JavaScript Sound API supporting MP3, MPEG4 and HTML5 audio + RTMP, providing reliable cross-browser/platform audio control in as little as 12 KB. BSD licensed.
http://www.schillmania.com/projects/soundmanager2/

Support for .vtt subtitles/captions #94

Open exside opened 9 years ago

exside commented 9 years ago

Is there any way (or plan) to add support for captions/subtitles to SoundManager2? It would be nice, for example, for text transcripts of spoken audio, and could be added to the

scottschiller commented 9 years ago

Howdy! Sorry for the late follow-up...

Interesting to note the draft status of that spec, and that it's disabled by default in Gecko (Mozilla/Firefox). Reading further, that's likely old information, as the feature is enabled for me at the time of writing in Firefox Aurora (Nightly) 43. It looks like it has been enabled as of Firefox 31, released in July 2014. https://www.mozilla.org/en-US/mobile/31.0/releasenotes/

The assumption is that the primary use case is for <video> - subtitles, karaoke lyrics, and so on. However, I could see it being useful for showing lyrics alongside <audio> as well (or spoken word, or name your use case here).

If I can append a <track> as a child node (or object) to <audio> or Audio(), great.

It sounds like some aspects of this do not apply to <audio> elements, however? Maybe I'm missing something here.

"In HTML, audio elements don't have a visual rendering area and therefore, this algorithm will abort for audio elements. When authors do create WebVTT captions or subtitles for audio resources, they need to publish them in a video element for rendering by the user agent."
http://dev.w3.org/html5/webvtt/#webvtt-rules-for-extracting-the-chapter-title

SM2 would need to change in a few non-trivial ways for this to work, as I understand it.

The biggest change, I think, would be that SM2 would need to accept parameters with URL(s) for .vtt files and the like in createSound(), and then append them as <track> child nodes (via document.createElement('track')) to an <audio> element or an Audio() object, if that can work in this context.
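For illustration, a rough sketch of what that could look like - the tracks option shown here is hypothetical (it is not part of the current createSound() API), and the DOM side is just plain createElement('track') / appendChild():

```js
// Hypothetical 'tracks' option passed alongside the usual createSound() parameters.
var sound = soundManager.createSound({
  id: 'aTalk',
  url: '/audio/talk.mp3',
  // NOT a real SM2 option today - illustration only.
  tracks: [
    { kind: 'captions', src: '/audio/talk.en.vtt', srclang: 'en', label: 'English', isDefault: true }
  ]
});

// Internally (or in user code), those descriptors could become <track> children
// of the underlying <audio> element.
function appendTracks(audioEl, tracks) {
  tracks.forEach(function(t) {
    var trackEl = document.createElement('track');
    trackEl.kind = t.kind || 'captions';
    trackEl.src = t.src;
    trackEl.srclang = t.srclang || 'en';
    trackEl.label = t.label || '';
    if (t.isDefault) {
      // bracket access avoids the reserved word 'default' in very old parsers
      trackEl['default'] = true;
    }
    audioEl.appendChild(trackEl);
  });
}
```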

If an Audio() object cannot have a native DOM 'track' element / object applied, I would have to change SM2 to create an <audio> element instead - which, while it seems trivial, is a big change for the library because the base object for all sounds would then have changed. I would want to test that one a lot. ;)
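For what it's worth, since Audio() just returns an HTMLAudioElement, a quick capability check along these lines might help scope that testing. This is only a sketch; whether cues actually load and fire on an element that isn't in the document is the part that needs per-browser verification:

```js
var audio = new Audio();
var track = document.createElement('track');

// If these APIs exist, the browser at least exposes the TextTrack machinery on
// media elements and <track>; real cue loading behavior still needs testing.
var looksSupported = ('textTracks' in audio) &&              // TextTrackList on media elements
                     ('track' in track) &&                   // HTMLTrackElement -> TextTrack
                     (typeof audio.appendChild === 'function');

console.log('TextTrack API present:', looksSupported);
```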

Furthermore, the API would need to expose the raw Audio() or <audio> object so that users can get at the .vtt track object(s) and read their data - or listen for whatever events fire on the native audio element/object with the relevant metadata.
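Assuming the raw element were exposed (audioEl below just stands in for the underlying <audio> / Audio() object, however it ends up being surfaced), the native TextTrack interface already provides both the data and the events:

```js
// 'audioEl' is an assumed reference to the raw media element behind an SM2 sound.
var track = audioEl.textTracks[0];

if (track) {
  track.mode = 'hidden'; // load cues and fire events without native rendering
  track.addEventListener('cuechange', function() {
    var cues = track.activeCues;
    for (var i = 0; i < cues.length; i++) {
      // VTTCue.text holds the caption text for the current time range;
      // display it however the page sees fit.
      console.log('caption:', cues[i].text);
    }
  });
}
```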

From the W3C link above, it looks like the .vtt file is parsed and a DOM tree is formed for access to the data.

Given that, a JS utility could be written to walk the parsed cues, pull out the time codes for the relevant subtitles/captions, and then use SM2's whileplaying handler or time-based functions such as onPosition() to show captions at the appropriate timestamps.
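Something along these lines, as a sketch - onPosition() and whileplaying are existing SM2 APIs, while scheduleCaptions() and showCaption() are made-up names for the utility and the page's own display code:

```js
// Once the track's cues are available (e.g. after the 'load' event on the <track>
// element, or once textTrack.cues is populated), register each cue's start time
// with SM2's onPosition() so the caption appears at the right moment.
function scheduleCaptions(sound, textTrack, showCaption) {
  var cues = textTrack.cues;
  if (!cues) {
    return;
  }
  for (var i = 0; i < cues.length; i++) {
    (function(cue) {
      sound.onPosition(Math.floor(cue.startTime * 1000), function() {
        showCaption(cue.text);
      });
    })(cues[i]);
  }
}

// Usage sketch: dump each caption into a page element as it becomes active.
scheduleCaptions(sound, audioEl.textTracks[0], function(text) {
  document.getElementById('captions').textContent = text;
});
```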