sciencehistory / chf-sufia

sufia-based hydra app

Audio file support master ticket #1231

Closed jrochkind closed 5 years ago

jrochkind commented 5 years ago

We are going to add support for audio file ingest.

The current plan is for files to be ingested into Digital Collections app as either FLAC or MP3 (usually FLAC, MP3 for some legacy files with MP3 originals).

We plan to provide an HTML5 audio element to play files in the browser. (At least for the first implementation; simplest thing that might work.)

Different browsers support different file formats for playing in browser. According to my research, WebM seems to be the currently preferred format supported by modern browsers. That plus MP3 seems to hit almost any browser still in use.

We also need to decide whether we want to offer additional derivative types for download. For example, do we offer a download option for an MP3 version even if the original was FLAC? If we're going to make and store an MP3 as a derivative for download anyway, we obviously might as well supply it as an alternate format to the HTML5 audio tag.
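As a rough sketch of what the playback markup could look like, here's a plain-Ruby illustration of an HTML5 audio element offering WebM plus an MP3 fallback. The helper name and URL paths are hypothetical; in the Rails app this would presumably live in a view or helper:

```ruby
# Hypothetical helper sketch: builds an HTML5 <audio> element with a
# WebM source (preferred by modern browsers) plus MP3 as a fallback,
# and a download link for browsers with no audio element support.
# Method name and paths are illustrative only.
def audio_player_html(webm_url, mp3_url, original_url)
  <<~HTML
    <audio controls preload="none">
      <source src="#{webm_url}" type="audio/webm">
      <source src="#{mp3_url}" type="audio/mpeg">
      Your browser does not support the audio element.
      <a href="#{original_url}">Download the audio file</a>
    </audio>
  HTML
end

puts audio_player_html("/derivatives/123.webm", "/derivatives/123.mp3",
                       "/originals/123.flac")
```

The `preload="none"` attribute keeps the page from fetching audio until the user hits play, which seems reasonable for a collections list page.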

Order of implementation

  1. Get it so you can add a FLAC or MP3 audio file to the Sufia app at all.

    • Right now trying to upload one will possibly result in errors, as ingest-related code that assumes everything is an image might try to treat the audio file as an image, and raise.
    • Make sure that doesn't happen. Start with no derivatives being created for audio files.
    • This step is done when you can complete the ingest steps with an audio file, then view that audio file in the front-end UI, all without getting any 500s. "Download original" for the audio file should return the original. At this step, the space for a thumbnail may be blank or show a missing-image placeholder; that's fine and expected.
  2. Add derivatives creation for audio files.

    • Start with WebM. Maybe MP3 too.
    • Not sure of the best way to convert a FLAC or MP3 to these other formats. Perhaps the ffmpeg command line? We may need to make sure ansible installs ffmpeg on deploy machines, if it doesn't already.
    • We need to consider appropriate bitrate and compression level for derivatives. Assuming human conversation rather than music, we want to minimize file size without a noticeable drop in perceived quality. 64 kbps may be enough. For MP3s, the "V2" VBR quality level is probably good; we'd want an equivalent setting for the WebM derivative.
    • This step is done when the derivatives have been created and exist; they won't yet show up in any UI.
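For step 2, assuming we do go the ffmpeg route, the invocations might look like the sketch below. This just builds the commands rather than running them; codec names (`libopus`, `libmp3lame`) are standard ffmpeg encoders, but the bitrate/quality values are the provisional guesses from above, not settled decisions:

```ruby
# Sketch: build (but don't run) ffmpeg commands for audio derivatives.
# Assumes an ffmpeg build with libopus and libmp3lame available.
def derivative_commands(input_path, output_base)
  {
    # WebM container with Opus audio at ~64 kbps (speech-oriented guess)
    webm: ["ffmpeg", "-y", "-i", input_path,
           "-c:a", "libopus", "-b:a", "64k", "#{output_base}.webm"],
    # MP3 at LAME VBR quality 2 (the "V2" preset discussed above)
    mp3:  ["ffmpeg", "-y", "-i", input_path,
           "-c:a", "libmp3lame", "-q:a", "2", "#{output_base}.mp3"]
  }
end

# In the app these would presumably be run from a background job,
# e.g. system(*cmd), with error handling around the exit status.
cmds = derivative_commands("original.flac", "derivatives/123")
```

Building the command as an array (rather than a single shell string) avoids shell-injection issues with filenames when it's eventually passed to `system` or similar.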

  3. Now we need an actual UI that allows playback and download.

  4. We'll presumably have another round of front-end UX tweaks after we have a basic demo as per 3 above; it's premature to spec that now.

1-3 should be done as separate PRs. (Splitting into even more individual PRs is okay too.)

New App

We need to do these things in parallel in New App. At least we need to do 1 and 2 above; we don't really have a 3 in the new app yet. @eddierubeiz I'd like to propose trying out a workflow this time where you do 1 in the old app, then 1 in the new app; then 2 in the old app, then 2 in the new app, instead of "finishing" the old app first and then moving to the new app. This ensures we don't have the option of neglecting the new app (the feature is not done until it's in both), and should help minimize context-switching cost.

sanfordd commented 5 years ago

Quick note: WebM is a container that supports at least two different audio codecs (Vorbis and Opus, I think), and we'll want to decide which one to use.

jrochkind commented 5 years ago

Some audio/visual oral history presentation examples/models, all using the "Oral History Metadata Synchronizer" (OHMS), which we are not currently using. (Thanks @yonyitz)

More OHMS examples