usnationalarchives / OPAProd

Tracking enhancements to OPAProd

Subtitling of Video and Audio Files (& Amara) #80

Open mereastew opened 9 years ago

mereastew commented 9 years ago

We need to explore public contributions of subtitles created for video and audio files, i.e., records that would contain timestamp information. Using the Amara platform, we have timestamped subtitles that we could try to import. This crowdsourcing is happening on the Amara platform and with our files on YouTube, but what is the intersection with the online catalog?

Questions:

Amara is a really good tool and well designed - I would hate for us to build something inferior in our catalog.

This issue is worth exploring so we can expand our Section 508 compliance within the catalog for audio and video records.

DominicBM commented 9 years ago

Can users transcribe embedded videos from within the catalog site? This would be ideal if it's even possible, but we'd have to look into how accounts are managed across the sites. We also have to ask if we want to turn off non-timed subtitling, or how we allow both kinds of transcription to co-exist.

I think the first step is just to enable the catalog to understand captioning data. We'd need the catalog to understand not just blobs of transcription text, but captions tied to times in the recordings, and this would also export differently.
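To make the difference between a blob of transcription text and time-tied captions concrete, here is a minimal Python sketch. It assumes the captions arrive as SubRip (.srt) text; the `Cue` class and the `parse_srt` and `plain_transcript` helpers are hypothetical names made up for illustration, not anything the catalog or Amara actually provides.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One caption tied to a time range in the recording (hypothetical model)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp parts (as strings) to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(srt_text):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_to_ms(*parts[:4]), _to_ms(*parts[4:]), text))
                break
    return cues


def plain_transcript(cues):
    """Flatten timed cues into a single untimed transcription string."""
    return " ".join(" ".join(cue.text.split()) for cue in cues)
```

Under these assumptions, the untimed transcription can always be derived from the timed cues but not the other way around, which may be one way to let both kinds of transcription co-exist.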

WaxCylinderRevival commented 9 years ago

Are we exporting the SubRip text (.srt) files from Amara and YouTube and then importing them into the NARA catalog, or just plain text files (.txt)?

mereastew commented 9 years ago

@WaxCylinderRevival - We'd need to figure this out. We're just beginning this discussion now. Do you have any recommendations?

WaxCylinderRevival commented 9 years ago

I'd have to do more reading, but here are some initial comments:

I imported and exported .srt files to caption the Nixon Tapes YouTube offerings. Once created, they were easy to use and easy to edit within the YouTube platform (and Amara, I'm sure) and great to save as our work product/record copy. The resulting files seemed potentially easy to manipulate through regular expressions and the like.
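As a rough illustration of the kind of regular-expression manipulation I mean, here is a small Python sketch that shifts every timestamp in an .srt file by a fixed offset, for example when a caption file is out of sync with a re-cut video. The function name `shift_srt` is hypothetical, and the sketch assumes well-formed `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
import re

# One SRT timestamp, e.g. "00:01:02,345".
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text, offset_ms):
    """Return srt_text with all cue times moved by offset_ms (clamped at zero)."""

    def shift(match):
        hours, minutes, seconds, millis = (int(part) for part in match.groups())
        total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
        total = max(0, total)  # never produce a negative timestamp
        hours, rest = divmod(total, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

    shifted_lines = []
    for line in srt_text.split("\n"):
        # Only rewrite timing lines, so timestamp-like text inside a caption is untouched.
        if "-->" in line:
            line = TIMESTAMP.sub(shift, line)
        shifted_lines.append(line)
    return "\n".join(shifted_lines)
```

Caption text that itself contains `-->` would still be rewritten, so this is a starting point and not a robust tool.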

At first glance, it looks like SRT converted to WebVTT could be a way to go for HTML5: https://developer.mozilla.org/en-US/Apps/Build/Audio_and_video_delivery/Adding_captions_and_subtitles_to_HTML5_video
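For simple files the conversion looks small. A minimal Python sketch, assuming plain SRT input with no styling tags or positioning data: WebVTT needs a `WEBVTT` header line followed by a blank line, and uses a period where SRT uses a comma before the milliseconds. The function name `srt_to_webvtt` is hypothetical.

```python
import re

# An SRT timing line start, e.g. "00:00:01,000 --> 00:00:04,000".
TIMING_LINE = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})[ \t]+-->[ \t]+(\d{2,}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_webvtt(srt_text):
    """Convert simple SRT text to WebVTT text."""
    # Normalize line endings and drop a leading byte-order mark if present.
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # Swap the comma for a period in timing lines only, leaving caption text alone.
    body = TIMING_LINE.sub(r"\1.\2 --> \3.\4", body)
    # The numeric SRT counters are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"
```

SRT files that carry markup such as `<font>` tags would need extra handling that this sketch does not attempt.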

Most of the plug-ins and commercial products support SRT.

Just to add to the discussion: captions vs. subtitles - http://screenfont.ca/learn/ I assume we'll have both, yes? Captions for as-heard transcription (geared towards accessibility) as well as subtitling for foreign-language variants?