samvera-labs / ramp

Interactive, IIIF powered audio/video media player React components library. Styleguidist Docs: https://samvera-labs.github.io/ramp/
https://ramp.avalonmediasystem.org/

Support .srt formats for transcripts #441

Closed: elynema closed this issue 7 months ago

elynema commented 8 months ago

Is your feature request related to a problem? Please describe.

Currently, Ramp supports only .vtt, plain text, and .docx formats for transcripts. Many caption files are produced as .srt and could also serve as transcripts.

Describe the solution you'd like

Parse .srt files provided to the transcript component and display them as interactive timed text.
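To make the request concrete, here is a minimal TypeScript sketch of the parsing step. It is not Ramp's actual implementation: the `SrtCue` shape and the `parseSrt` name are hypothetical, and it assumes a plain SubRip file made of blank-line separated blocks, each with an optional counter line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, and one or more text lines.

```typescript
// Hypothetical cue shape for illustration; times are in seconds.
interface SrtCue {
  index: number;
  begin: number;
  end: number;
  text: string;
}

// Matches an SRT timing line. A "." before the milliseconds is also accepted,
// since some tools write it instead of ",".
const TIMING =
  /^\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})/;

function parseSrt(content: string): SrtCue[] {
  const cues: SrtCue[] = [];
  // Drop a leading byte-order mark, normalize line endings,
  // then split the file into blank-line separated blocks.
  const blocks = content
    .replace(/^\uFEFF/, "")
    .replace(/\r\n?/g, "\n")
    .trim()
    .split(/\n\s*\n/);

  for (const block of blocks) {
    const lines = block.split("\n");
    const timingAt = lines.findIndex((line) => TIMING.test(line));
    if (timingAt === -1) continue; // no timing line: not a cue, skip it

    const match = TIMING.exec(lines[timingAt] ?? "");
    if (match === null) continue;

    // Read capture group i as a number.
    const part = (i: number): number => Number(match[i] ?? "0");
    const begin = part(1) * 3600 + part(2) * 60 + part(3) + part(4) / 1000;
    const end = part(5) * 3600 + part(6) * 60 + part(7) + part(8) / 1000;

    // Use the counter line above the timing line when present,
    // otherwise fall back to the cue's position in the file.
    const counter = timingAt > 0 ? (lines[timingAt - 1] ?? "").trim() : "";
    const index = /^\d+$/.test(counter) ? Number(counter) : cues.length + 1;

    const text = lines.slice(timingAt + 1).join("\n").trim();
    cues.push({ index, begin, end, text });
  }
  return cues;
}
```

Each returned cue carries `begin` and `end` in seconds, so a transcript view could render `text` per cue and seek the player to `begin` on click. Inline formatting tags such as `<i>` are left in `text` untouched in this sketch.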

Additional context

One motivating factor is that the .vtt files produced by YouTube's auto-generated captions appear to be invalid, and so don't work as transcripts in Ramp. The .srt files, however, appear to be valid and could be an alternative. It's unclear whether this is only a YouTube issue or whether .vtt files produced by other external systems might cause similar problems.

An alternative is to manually fix the .vtt files downloaded from YouTube, which is easy to do: it only requires deleting several lines at the start of the file.
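The manual fix could also be scripted. The sketch below is a guess at what that might look like, not a confirmed recipe: the issue does not say which lines were deleted, so this assumes they are extra header lines sitting directly under the `WEBVTT` signature, before the first blank line. The `stripVttHeaderLines` name is hypothetical.

```typescript
// Hypothetical helper: keep the "WEBVTT" signature line and drop any other
// lines that directly follow it, up to (but not including) the first blank line.
// Assumption: those extra header lines are what made the file fail to parse.
function stripVttHeaderLines(content: string): string {
  const lines = content
    .replace(/^\uFEFF/, "")
    .replace(/\r\n?/g, "\n")
    .split("\n");

  const signature = lines[0] ?? "";
  // Not a WebVTT file: return the input unchanged.
  if (!signature.startsWith("WEBVTT")) return content;

  // Find the blank line that ends the header block.
  let firstBlank = lines.findIndex((line) => line.trim() === "");
  if (firstBlank === -1) firstBlank = lines.length;

  return [signature, ...lines.slice(firstBlank)].join("\n");
}
```

Cues and everything else after the first blank line pass through unchanged, so a file that has no extra header lines comes out the same apart from normalized line endings.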

Note that if we end up treating captions as transcripts in Avalon for search/display purposes, we'll need to reconcile the formats we allow for captions with those we allow for transcripts; this could be another reason to support .srt for transcripts.

youtube-webvtt-example.vtt.txt

youtube-srt-example.srt.txt

joncameron commented 7 months ago

Works great. The example used for testing is at https://avalon-dev.dlib.indiana.edu/media_objects/8s45q877v.