mickleness / pumpernickel

This Java project includes classes related to desktop applications, Swing, performance, image processing, data structures, and other misc topics.
https://mickleness.github.io/pumpernickel/
MIT License

QuickTime audio support #17

Open simonrob opened 6 years ago

simonrob commented 6 years ago

I've used your QuickTime writing code for many years - it has been incredibly useful.

Many years ago I adapted your MovWriter to use an EditAtom / elst (as you allude to in a comment), rather than padding with silence. This was a tradeoff of file size against compatibility, as audio at the end of a file has been a common requirement for me. It was a fairly straightforward addition, and has worked well for many years as the QuickTime Player has no problems interpreting these atoms.
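For anyone following along, the edit list idea can be sketched roughly as below. This is an illustration based on the atom layout in the QuickTime File Format specification, not the actual MovWriter change: the class and method names are hypothetical, and a real writer would still need to wrap the result in an edts atom inside the track. An entry whose media time is -1 is an empty edit, which is what delays the audio without storing any silence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of pumpernickel.
final class EditListSketch {
    private static final int RATE_NORMAL = 0x00010000; // 16.16 fixed-point 1.0

    /**
     * Builds an 'elst' atom with two entries: an empty edit lasting
     * delay units, then the media itself for mediaDuration units.
     * Both durations are expressed in the movie time scale.
     */
    static byte[] buildDelayedEditList(int delay, int mediaDuration) {
        int entryCount = 2;
        // 8-byte atom header + 4 bytes version/flags + 4 bytes entry count
        // + 12 bytes per entry
        int size = 8 + 4 + 4 + entryCount * 12;
        ByteBuffer b = ByteBuffer.allocate(size); // big-endian by default
        b.putInt(size);
        b.put("elst".getBytes(StandardCharsets.US_ASCII));
        b.putInt(0);          // version (1 byte) and flags (3 bytes)
        b.putInt(entryCount);

        b.putInt(delay);      // track duration of the empty edit
        b.putInt(-1);         // media time -1 marks an empty edit
        b.putInt(RATE_NORMAL);

        b.putInt(mediaDuration); // track duration of the real audio
        b.putInt(0);             // start at the beginning of the media
        b.putInt(RATE_NORMAL);
        return b.array();
    }
}
```

The atom stays 40 bytes however long the delay is, which is the file-size advantage over padding with silence.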

Recently, however, I'm encountering more and more problems with audio playback using this method, as I need to support a wider range of players and services. For example, VLC cannot understand the elst atom, and simply stops playing audio entirely. YouTube sometimes works, but this is not always the case. I'm reluctant to switch back to the silence padding method, as this has a large impact on file sizes, especially when not compressing the output.

Looking into this issue recently, I noticed a todo note in your code suggesting a better way using chunk lookup tables (stbl?). However, my understanding of the atom structure is relatively basic. I wondered if you'd ever gone back and tried this method? (Could you outline how one might go about this?)

Thanks for a fantastically useful project!

mickleness commented 6 years ago

Bad news: I looked into that TODO note this weekend and it didn't work.

It took a few hours to brush up on the QT file format again, but there's lots of documentation online. Theoretically I still think the idea makes sense, but in reality QuickTime couldn't play back the movie -- which makes it a nonstarter. Specifically what I wanted to do was look at the stco atom. For example, I produced a sample file whose stco atom resembled: ChunkOffsetAtom[ version=0, flags=0, sizeTable=[ 16, 863216, 1676697, 2106422, 3367722, 4503527, 4922740, 6137239, 7376241, 7849431, 9008412, 10258364, 10832858, 11844895, 13111338, 13799834, 14638705, 15941951, 16787919, 17476042, 18764404, 19710317, 20268634 ]]

Long story short: each value in the size table is a file pointer that points to where 1 second of audio starts in my file. (at 16 bytes, 863,216 bytes, etc.). The theory was: can I just make several chunks point to the same file pointer, such as: ChunkOffsetAtom[ version=0, flags=0, sizeTable=[ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 20268634 ]]

If I made 1 second of silence start at p=16 bytes, that should just loop that silence several times in a row to achieve the desired result, right? Unfortunately this broke the video track. It actually appeared to work for the audio, but as soon as the first loop started the video track went haywire.
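To make the experiment concrete, here is a rough sketch of how a table like that could be built and serialized. The class and method names are hypothetical (this is not the project's ChunkOffsetAtom), and it assumes the standard stco layout from the QuickTime File Format specification: atom header, version/flags, entry count, then one 32-bit file offset per chunk. It only illustrates the experiment described above, which did not play back correctly, so it should not be read as a working fix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of pumpernickel.
final class ChunkOffsetSketch {
    /** Every chunk except the last points at the same block of silence. */
    static int[] repeatSilence(int chunkCount, int silenceOffset, int lastOffset) {
        int[] offsets = new int[chunkCount];
        Arrays.fill(offsets, silenceOffset);
        offsets[chunkCount - 1] = lastOffset;
        return offsets;
    }

    /** Serializes the offsets as an 'stco' atom. */
    static byte[] buildStco(int[] offsets) {
        // 8-byte atom header + 4 bytes version/flags + 4 bytes entry count
        // + 4 bytes per offset
        int size = 16 + 4 * offsets.length;
        ByteBuffer b = ByteBuffer.allocate(size); // big-endian by default
        b.putInt(size);
        b.put("stco".getBytes(StandardCharsets.US_ASCII));
        b.putInt(0);              // version (1 byte) and flags (3 bytes)
        b.putInt(offsets.length); // number of chunk offsets
        for (int offset : offsets) {
            b.putInt(offset);
        }
        return b.array();
    }

    public static void main(String[] args) {
        // Same shape as the table above: 22 chunks at offset 16, then the real audio.
        int[] offsets = repeatSilence(23, 16, 20268634);
        System.out.println(Arrays.toString(offsets));
        System.out.println(buildStco(offsets).length + " bytes");
    }
}
```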

Re-reading the specification didn't really offer me any clever insights into how to adjust the start time (except for the edit list atom, which we've both found to be problematic). It's 100% possible I'm missing something, and a more senior QT developer could fix this in a heartbeat, but I don't see an easy path forward.

My gut reaction is: what we're really doing is designing a complex work-around for the basic problem of uncompressed audio. Do we have any decent options for embedding compressed audio in a mov file? It's been several years since I looked into this. Possible leads include:

  1. WAV files can contain compressed data. Last I read that was poorly supported in Java, but I wonder if that's been updated in the last 5 years?
  2. JLayer is awesome, but IIRC MP3 is plagued with legal ... murkiness. Same for ffmpeg.
  3. Which brings us to Ogg Vorbis. Does QT ship with a standard Ogg decoder yet? That might be a well-compressed, 100% safe option.
  4. What about all those other obscure formats QT claims to support? Things like "ulaw"? For our primary use case: we aren't interested in amazing compression. Anything that can compress a run of consecutive zeroes will give us significant gains, so if there's a rusty old format from a decade or two ago that we can use, that might suffice.

What are your thoughts?

My original use case for this encoder never worried about file size. We wrote a huge file, and then we used a separate tool to re-save the file as another format (usually mp4).

mickleness commented 6 years ago

Last night I also tried using multiple sample description entries in the same track. (So if my WAV was recorded as 44.1 kHz, 2 channel, maybe I could write each second of silence leading up to my sounds as 8 kHz, 1 channel?) But still no dice.
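A back-of-the-envelope sketch of what that would have saved, had it worked. It assumes 16-bit PCM samples at both rates, which is an assumption for illustration (the bit depth isn't stated above), and the class name is hypothetical.

```java
// Hypothetical helper, not part of pumpernickel.
final class SilenceSizeSketch {
    /** Size in bytes of uncompressed PCM audio of the given length. */
    static long pcmBytes(double seconds, int sampleRate, int channels, int bytesPerSample) {
        long frames = Math.round(seconds * sampleRate);
        return frames * channels * bytesPerSample;
    }

    public static void main(String[] args) {
        long full = pcmBytes(1.0, 44100, 2, 2);   // 44.1 kHz, 2 channels, 16-bit
        long reduced = pcmBytes(1.0, 8000, 1, 2); // 8 kHz, 1 channel, 16-bit
        System.out.println("per second of silence: " + full + " vs " + reduced + " bytes");
    }
}
```

Under those assumptions each second of silence drops from 176,400 bytes to 16,000 bytes, roughly an 11x reduction.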

QT seems to acknowledge some of my changes (for example, I can make some chunks play back at a lower rate), but then chunks just overlap in simultaneous playback. I need the chunks to play consecutively for this to have the intended effect.

simonrob commented 6 years ago

Thanks for taking the time to look into this, and sorry to hear it isn't working as straightforwardly as hoped.

Mixing multiple sampling rates is one of the things that has often caused me issues in the past, even when the edit list approach was working: some players just aren't built to support this sort of thing.

Related to this, though, I wondered whether it would be possible to mix formats within the same audio track? (Perhaps what you mean in your option 4, above). So, for example, have raw audio where we actually want it, then compressed silence, using one of the supported formats (i.e., these). This wouldn't then need to be looped, as it could just be inserted wherever necessary.

Does this sound feasible?

simonrob commented 6 years ago

Just to update this, I recently spent some time looking into these problems in more depth myself, exploring a few different ways to solve the issues I've been seeing. Some potentially useful findings:

This second finding in particular seems to be a similar approach to what you were trying with audio. Do you have a branch with this experimental code in that I could take a look at?

mickleness commented 6 years ago

Sorry for the delay; I had to go consult my backups. You can see this hardwires in a delay of 15.5 seconds