ruuda / hound

A wav encoding and decoding library in Rust
https://codeberg.org/ruuda/hound
Apache License 2.0

Possible issue when writing a mono float32 WAV file #54

Open eyeplum opened 2 years ago

eyeplum commented 2 years ago

Hi there,

I noticed an issue when I tried to write a .wav file with a single channel and float32 as the sample format.

The issue is when the file is played back by certain applications on macOS, instead of producing identical sound on both output channels, only the left channel is audible (the right channel is silent).

Applications where this issue can be observed:

All other applications I tried seem OK (e.g. macOS Music.app, Audacity, Ableton Live), which makes me think it may be an issue with the said macOS built-in applications rather than with hound.


However, to make things a bit more complicated, if I import the .wav file into Audacity and then immediately export it as a mono float32 .wav file (i.e. run it through Audacity's encoder), the issue seems to go away.

Here is a comparison of the file after (left) and before (right) Audacity's re-export:

[screenshot: Screen Shot 2022-03-18 at 2 50 01 PM]

It seems Audacity uses 0x0003 (WAVE_FORMAT_IEEE_FLOAT) as the format code while hound uses 0xFFFE (WAVE_FORMAT_EXTENSIBLE).

Reading a bit more about the format codes (http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html), I noticed:

The WAVE_FORMAT_EXTENSIBLE format should be used whenever:
    PCM data has more than 16 bits/sample.
    The number of channels is more than 2.
    The actual number of bits/sample is not equal to the container size.
    The mapping from channels to speakers needs to be specified.

Which I suspect is the reason hound used WAVE_FORMAT_EXTENSIBLE in this case (since float32 means 32 bits/sample). However, looking around the page, I suspect it's possible that the word PCM here means integer samples only, speculating from the format codes WAVE_FORMAT_PCM and WAVE_FORMAT_IEEE_FLOAT. For example, maybe WAVE_FORMAT_EXTENSIBLE should only be used when 24-bit integer samples are used? This is purely my speculation at this point though.


I'm able to produce an example file with this issue by modifying the append.rs example as well:

use std::f32::consts::PI;
use std::path::Path;

extern crate hound;

fn main() {
    let spec = hound::WavSpec {
        channels: 1,
        sample_rate: 44100,
        bits_per_sample: 32,
        sample_format: hound::SampleFormat::Float,
    };

    let path: &Path = "sine.wav".as_ref();

    let mut writer = match path.is_file() {
        true => hound::WavWriter::append(path).unwrap(),
        false => hound::WavWriter::create(path, spec).unwrap(),
    };

    // We should not append blindly, we should make sure that the existing file
    // has the right spec, because that is what we assume when writing.
    assert_eq!(spec, writer.spec());

    println!(
        "Old duration is {} seconds.",
        writer.duration() / spec.sample_rate
    );

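    // Write one second of a 440 Hz sine wave. Float samples are written as-is,
    // in the range [-1.0, 1.0], so no integer amplitude scaling is needed.
    for t in (0..44100).map(|x| x as f32 / 44100.0) {
        let sample = (t * 440.0 * 2.0 * PI).sin();
        writer.write_sample(sample).unwrap();
    }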

    println!(
        "New duration is {} seconds.",
        writer.duration() / spec.sample_rate
    );

    writer.finalize().unwrap();
}

ruuda commented 2 years ago

Thank you for taking the time to open such an extensive report.

It seems Audacity uses 0x0003 (WAVE_FORMAT_IEEE_FLOAT) as the format code while hound uses 0xFFFE (WAVE_FORMAT_EXTENSIBLE).

I think that Audacity by default tries to write the oldest format that still supports all the required parameters, to maximize compatibility. Hound also does this to some extent, but it implements fewer formats, so it goes straight to WAVE_FORMAT_EXTENSIBLE here. That format is not limited to integer PCM: there is a SubFormat GUID, and Hound sets it to KSDATAFORMAT_SUBTYPE_IEEE_FLOAT. I suspect the issue is instead the other property of WAVE_FORMAT_EXTENSIBLE that you pointed out, the channel-to-speaker mapping:

The issue is when the file is played back by certain applications on macOS, instead of producing identical sound on both output channels, only the left channel is audible (the right channel is silent)

It is ambiguous what it means to play back a file with a single channel on an output device with multiple channels. For this, WAVE_FORMAT_EXTENSIBLE contains a dwChannelMask field that describes how channels map to speakers. Hound does not support customizing it at the moment; it enables as many speakers as there are channels, starting with front left, then front right, front center, etc.

According to this MSDN page, the number of bits set must match the number of channels:

The channels specified in dwChannelMask must be present in the prescribed order (from least significant bit up). For example, if only SPEAKER_FRONT_LEFT and SPEAKER_FRONT_RIGHT are specified, then the samples for the front left speaker must come first in the interleaved stream. The number of bits set in dwChannelMask should be the same as the number of channels specified in WAVEFORMATEX.nChannels.

So it looks like QuickTime respects the channel mask — there is just no way for a WAVE_FORMAT_EXTENSIBLE file to express “play back this single channel on all speakers”.
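
To make the mask rule concrete, here is a minimal sketch of the two ideas discussed above: enabling the lowest N speaker bits for N channels, and the MSDN consistency check that the number of set bits equals the channel count. The function names `default_channel_mask` and `mask_matches_channels` are hypothetical and are not hound's API; the sketch only assumes the standard bit order (bit 0 is front left, bit 1 front right, bit 2 front center).

```rust
/// Hypothetical helper (not hound's API): enable the lowest `channels`
/// speaker bits, i.e. front left first, then front right, front center, ...
fn default_channel_mask(channels: u16) -> u32 {
    if channels >= 32 {
        // A u32 mask cannot describe more than 32 speakers.
        u32::MAX
    } else {
        (1u32 << channels) - 1
    }
}

/// The rule quoted from MSDN: the number of bits set in dwChannelMask
/// should equal the number of channels.
fn mask_matches_channels(mask: u32, channels: u16) -> bool {
    mask.count_ones() == u32::from(channels)
}

fn main() {
    let mono = default_channel_mask(1);
    let stereo = default_channel_mask(2);
    println!("mono mask = {:#06b}, stereo mask = {:#06b}", mono, stereo);

    // A mono file whose mask names only the front-left speaker is consistent.
    assert!(mask_matches_channels(mono, 1));
    // Naming both front speakers for a single channel violates the rule,
    // which is why "play mono on both speakers" cannot be expressed this way.
    assert!(!mask_matches_channels(0b11, 1));
}
```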

I’m not sure what the best way to fix this is ... you could try toggling more or fewer bits in dwChannelMask and see how QuickTime interprets that, but according to MSDN that would be invalid to do. Maybe the only way forward is to add support for WAVE_FORMAT_IEEE_FLOAT, because that one leaves the channel to speaker mapping undefined, so the player is free to play back the single channel on multiple speakers. Or perhaps this is something you can configure in QuickTime, similar to how MPV offers --audio-channels to customize the channel to speaker mapping.
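
For illustration, a legacy `fmt ` chunk tagged WAVE_FORMAT_IEEE_FLOAT could look roughly like the sketch below. It is not hound's code: `ieee_float_fmt_chunk` is a hypothetical helper, and it assumes the plain 16-byte layout with 32-bit samples. Stricter readings of the format expect non-PCM files to also carry a cbSize field and a `fact` chunk, which this sketch omits.

```rust
/// Hypothetical helper (not hound's API): build a complete `fmt ` chunk
/// (header + 16-byte payload) for 32-bit float samples, using format tag
/// 0x0003 (WAVE_FORMAT_IEEE_FLOAT). There is no channel mask in this layout,
/// so the channel-to-speaker mapping is left to the player.
fn ieee_float_fmt_chunk(channels: u16, sample_rate: u32) -> Vec<u8> {
    const WAVE_FORMAT_IEEE_FLOAT: u16 = 0x0003;
    let bits_per_sample: u16 = 32;
    let block_align: u16 = channels * (bits_per_sample / 8);
    let byte_rate: u32 = sample_rate * u32::from(block_align);

    let mut out = Vec::with_capacity(24);
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // payload size
    out.extend_from_slice(&WAVE_FORMAT_IEEE_FLOAT.to_le_bytes());
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out
}

fn main() {
    let chunk = ieee_float_fmt_chunk(1, 44100);
    // Print the bytes as hex for comparison with a hex dump of a real file.
    let hex: Vec<String> = chunk.iter().map(|b| format!("{:02x}", b)).collect();
    println!("{}", hex.join(" "));
}
```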

eyeplum commented 2 years ago

Thanks for the reply.

Maybe the only way forward is to add support for WAVE_FORMAT_IEEE_FLOAT, because that one leaves the channel to speaker mapping undefined, so the player is free to play back the single channel on multiple speakers.

Yeah, I'm leaning towards this option as well. I can try to find some time to give it a go. Would it be a good approach to start from https://github.com/ruuda/hound/blob/master/src/write.rs#L137 and gradually work out the rest of the format writing flow?

ruuda commented 2 years ago

Would it be a good approach to start from https://github.com/ruuda/hound/blob/master/src/write.rs#L137 and gradually work out the rest of the format writing flow?

There are multiple open issues related to dealing with some very specific use cases, and I’ve come to the conclusion that the best way forward would be to offer two APIs. One would be a low-level “build/read your own wav file toolkit” with functions to write and parse the headers and the various chunks, but where you can possibly create an invalid file if you combine the functions incorrectly. The other would be a higher-level API compatible with the current one, that is safe to use, and that fits 95% of the use cases where you just want to read/write the samples without too much hassle.
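
As a rough idea of what one building block of such a low-level toolkit might look like, here is a sketch of a generic RIFF chunk writer. `write_chunk` is a hypothetical name, not something hound offers; the sketch only relies on the general RIFF layout (4-byte id, little-endian 32-bit size, payload, and a pad byte after odd-sized payloads). Nothing stops a caller from emitting chunks in a nonsensical order, which is the trade-off described above.

```rust
use std::io::{self, Write};

/// Hypothetical low-level building block (not hound's API): write one RIFF
/// chunk. It does not check that the chunk makes sense in context.
fn write_chunk<W: Write>(w: &mut W, id: &[u8; 4], payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "chunk too large"));
    }
    let len = payload.len() as u32;
    w.write_all(id)?;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)?;
    // RIFF chunks are word-aligned: odd-sized payloads get one pad byte,
    // which is not counted in the size field.
    if len % 2 == 1 {
        w.write_all(&[0])?;
    }
    Ok(())
}

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    write_chunk(&mut buf, b"data", &[1, 2, 3]).unwrap();
    println!("{:02x?}", buf);
}
```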

To be very honest, this is something I have wanted to do for a long time, but I never make the time to sit down and go and do it. I appreciate that you want to give it a go, but there is already some unreleased stuff on master that I don’t want to release in this state, but I also never find the time to properly prepare a release. If you make a pull request, I don’t think I could review or release it soon, I’m sorry about that.

ArtemGr commented 5 months ago

Might be related...

Porting from https://crates.io/crates/wav,

let mut wavᶠ = fs::File::create (wavᵖ)?;
let wavʰ = wav::Header::new (3, 1, 44100, 32);
wav::write (wavʰ, &wav::BitDepth::ThirtyTwoFloat (wavᵃ), &mut wavᶠ)?;

plays in Total Commander default viewer (F3), whereas

let wavˢ = hound::WavSpec {
  channels: 1,
  sample_rate: 44100,
  bits_per_sample: 32,
  sample_format: hound::SampleFormat::Float};  // WAVE_FORMAT_IEEE_FLOAT 0x0003
let mut wavʷ = hound::WavWriter::create (wavᵖ, wavˢ)?;
for &sample in &wavᵃ {wavʷ.write_sample (sample)?}

does not (UI reports playing the file, but there is no sound).

$ file 'the-wav-crate.wav'
the-wav-crate.wav: RIFF (little-endian) data, WAVE audio, IEEE Float, mono 44100 Hz
$ file 'the-hound-crate.wav'
the-hound-crate.wav: RIFF (little-endian) data, WAVE audio, mono 44100 Hz

the-wav-crate.wav

00000000: 5249 4646 68ce 5600 5741 5645 666d 7420  RIFFh.V.WAVEfmt
00000010: 1000 0000 0300 0100 44ac 0000 10b1 0200  ........D.......
00000020: 0400 2000 6461 7461 44ce 5600 ecef 4a34  .. .dataD.V...J4

the-hound-crate.wav

00000000: 5249 4646 80ce 5600 5741 5645 666d 7420  RIFF..V.WAVEfmt
00000010: 2800 0000 feff 0100 44ac 0000 10b1 0200  (.......D.......
00000020: 0400 2000 1600 2000 0100 0000 0300 0000  .. ... .........
00000030: 0000 1000 8000 00aa 0038 9b71 6461 7461  .........8.qdata
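
To read the two dumps above field by field, here is a small sketch that decodes the `fmt ` payload bytes copied from them. The struct and function names are made up for this example. The offsets assume the usual WAVEFORMATEX layout followed, for tag 0xFFFE, by cbSize, valid bits, dwChannelMask and the SubFormat GUID, whose first two bytes carry the underlying format code.

```rust
/// Fields of interest from a `fmt ` chunk payload (names are illustrative).
#[derive(Debug, PartialEq)]
struct FmtInfo {
    format_tag: u16,
    channels: u16,
    bits_per_sample: u16,
    channel_mask: Option<u32>,    // only present for WAVE_FORMAT_EXTENSIBLE
    sub_format_code: Option<u16>, // first two bytes of the SubFormat GUID
}

fn u16_at(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_at(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

fn parse_fmt(payload: &[u8]) -> Option<FmtInfo> {
    if payload.len() < 16 {
        return None;
    }
    let format_tag = u16_at(payload, 0);
    let extensible = format_tag == 0xFFFE && payload.len() >= 40;
    Some(FmtInfo {
        format_tag,
        channels: u16_at(payload, 2),
        bits_per_sample: u16_at(payload, 14),
        channel_mask: if extensible { Some(u32_at(payload, 20)) } else { None },
        sub_format_code: if extensible { Some(u16_at(payload, 24)) } else { None },
    })
}

// Payload bytes copied from the hex dumps above (after the size field).
const WAV_CRATE_FMT: [u8; 16] = [
    0x03, 0x00, 0x01, 0x00, 0x44, 0xac, 0x00, 0x00,
    0x10, 0xb1, 0x02, 0x00, 0x04, 0x00, 0x20, 0x00,
];
const HOUND_FMT: [u8; 40] = [
    0xfe, 0xff, 0x01, 0x00, 0x44, 0xac, 0x00, 0x00,
    0x10, 0xb1, 0x02, 0x00, 0x04, 0x00, 0x20, 0x00,
    0x16, 0x00, 0x20, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00,
    0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71,
];

fn main() {
    let a = parse_fmt(&WAV_CRATE_FMT).unwrap();
    let b = parse_fmt(&HOUND_FMT).unwrap();
    println!("wav crate: tag {:#06x}, mask {:?}", a.format_tag, a.channel_mask);
    println!(
        "hound:     tag {:#06x}, mask {:?}, sub format {:?}",
        b.format_tag, b.channel_mask, b.sub_format_code
    );
}
```

Both files describe mono 32-bit data. The hound file additionally pins the single channel to the front-left speaker bit and stores the float format code inside the GUID instead of in the tag.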

djmaze commented 2 months ago

I stumbled upon a problem when building my (wasm-based) audio editor, and it turns out to be for the same reason.

I am using hound to load and save wav files in the editor, in float32 format. In Chrome, everything works fine. In Firefox, the sound is completely distorted. (For testing, you can also just try to play a file generated by the above append.rs example directly in Firefox. But turn your speakers down first.)

I just made the following change to hound locally in order to prevent the WAVE_FORMAT_EXTENSIBLE from being used for 32 bit wav files:

diff --git a/src/write.rs b/src/write.rs
index db04287..3ac5e49 100644
--- a/src/write.rs
+++ b/src/write.rs
@@ -374,7 +374,7 @@ impl<W: io::Write + io::Seek> ChunksWriter<W> {
         // more widely supported. For more than two channels or more than 16
         // bits per sample, the newer WAVEFORMATEXTENSIBLE is required. See also
         // https://msdn.microsoft.com/en-us/library/ms713497.aspx.
-        let fmt_kind = if spec.channels > 2 || spec.bits_per_sample > 16 {
+        let fmt_kind = if spec.channels > 2 {
             FmtKind::WaveFormatExtensible
         } else {
             FmtKind::PcmWaveFormat

Now, in Firefox everything sounds (and looks) nice. Here is a picture of a file saved using the current hound version vs. my patched version below – as read and rendered in Firefox:

[image: file saved with the current hound version vs. the patched version, as rendered in Firefox]

So apparently Firefox interprets the WaveFormatExtensible version as 16 bit audio, while the PcmWaveFormat version is correctly recognized as 32 bit float.

Maybe hound should at least allow choosing the FmtKind, if not even change the behaviour as shown above.
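
One possible shape for such an option, purely as a sketch: the `FmtKind` variant names are taken from the diff above, but the enum here is a local stand-in, and `FmtChoice` and `select_fmt_kind` are hypothetical names that do not exist in hound. The automatic branch reproduces the condition from the removed line of the diff, and the override lets a caller force the older header.

```rust
/// Local stand-in mirroring the variant names seen in the diff above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FmtKind {
    PcmWaveFormat,
    WaveFormatExtensible,
}

/// Hypothetical caller-facing option; hound does not expose this.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FmtChoice {
    /// Keep the current automatic rule.
    Auto,
    /// Always write the given header kind.
    Force(FmtKind),
}

fn select_fmt_kind(channels: u16, bits_per_sample: u16, choice: FmtChoice) -> FmtKind {
    match choice {
        FmtChoice::Force(kind) => kind,
        // Same condition as the line removed in the diff.
        FmtChoice::Auto if channels > 2 || bits_per_sample > 16 => {
            FmtKind::WaveFormatExtensible
        }
        FmtChoice::Auto => FmtKind::PcmWaveFormat,
    }
}

fn main() {
    // Mono float32 with the automatic rule ends up extensible...
    println!("{:?}", select_fmt_kind(1, 32, FmtChoice::Auto));
    // ...while an explicit override selects the older header.
    println!(
        "{:?}",
        select_fmt_kind(1, 32, FmtChoice::Force(FmtKind::PcmWaveFormat))
    );
}
```

Keeping `Auto` as the default would leave existing callers unaffected, while players that mishandle the extensible header could be served with the forced variant.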