pdeljanov / Symphonia

Pure Rust multimedia format demuxing, tag reading, and audio decoding library
Mozilla Public License 2.0
2.37k stars 138 forks source link

FLAC and WAV files being decoded into AudioBufferRef::S32 regardless of the file's bit depth #95

Open tesselode opened 2 years ago

tesselode commented 2 years ago

I wrote a script that decodes the first packet of a variety of audio files and reports what kind of AudioBufferRef the decoder returns:

use std::{error::Error, fs::File, path::Path};

use symphonia::core::{audio::AudioBufferRef, io::MediaSourceStream};

fn load_audio_file(path: impl AsRef<Path>) -> Result<(), Box<dyn Error>> {
    let path = path.as_ref();
    println!();
    println!("{}", path.file_name().unwrap().to_str().unwrap());
    let codecs = symphonia::default::get_codecs();
    let probe = symphonia::default::get_probe();
    let mss = MediaSourceStream::new(Box::new(File::open(path)?), Default::default());
    let mut reader = probe
        .format(
            &Default::default(),
            mss,
            &Default::default(),
            &Default::default(),
        )?
        .format;
    let track = reader.default_track().unwrap();
    println!("Bit depth: {:?}", track.codec_params.bits_per_sample);
    let mut decoder = codecs.make(&track.codec_params, &Default::default())?;
    match decoder.decode(&reader.next_packet()?)? {
        AudioBufferRef::U8(_) => println!("Decoding as U8"),
        AudioBufferRef::U16(_) => println!("Decoding as U16"),
        AudioBufferRef::U24(_) => println!("Decoding as U24"),
        AudioBufferRef::U32(_) => println!("Decoding as U32"),
        AudioBufferRef::S8(_) => println!("Decoding as S8"),
        AudioBufferRef::S16(_) => println!("Decoding as S16"),
        AudioBufferRef::S24(_) => println!("Decoding as S24"),
        AudioBufferRef::S32(_) => println!("Decoding as S32"),
        AudioBufferRef::F32(_) => println!("Decoding as F32"),
        AudioBufferRef::F64(_) => println!("Decoding as F64"),
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn Error>> {
    for entry in std::env::current_dir()?.read_dir()? {
        let path = entry?.path();
        if let Some(extension) = path.extension() {
            if extension == "wav" || extension == "flac" {
                load_audio_file(path)?;
            }
        }
    }
    Ok(())
}

Here is the output I get with test files of various formats:

16.flac
Bit depth: Some(16)
Decoding as S32    

24.flac
Bit depth: Some(24)
Decoding as S32    

f32.wav
Bit depth: None
Decoding as S32

i16.wav
Bit depth: Some(16)
Decoding as S32

i24.wav
Bit depth: Some(24)
Decoding as S32

i32.wav
Bit depth: Some(32)
Decoding as S32

u8.wav
Bit depth: Some(8)
Decoding as S32

From what I understand, these files should be decoded into a type corresponding to the audio file's format.

I haven't tested with ogg and mp3 because I'm not sure if those file formats even support multiple bit depths.

I've attached the script and the test files.

symphonia-test.zip

pdeljanov commented 2 years ago

Hi @tesselode,

This is expected and is also an item on my to-do list. All lossy decoders output F32 (no such thing as a bit depth), and all lossless decoders output S32.

In terms of audio quality, this should be fine since all the other sample formats, excluding F64, can be losslessly converted to and from S32. The symphonia::core::conv::From/IntoSample trait provides automatic conversions if dealing with individual samples, and the symphonia::core::audio::RawSampleBuffer/SampleBuffer helpers also do automatic sample format conversion.

Ideally, the ALAC, FLAC, and WAV decoders would output buffers of the exact sample bit-depth they were encoded with, but right now they just provide S32 for simplicity.

Is this causing a big problem for you?

tesselode commented 2 years ago

I'm making a game audio library, and I was hoping to save memory on loaded sounds. Games may have a number of sounds loaded at a time, so I figured it would be good to reduce the RAM usage when possible. That being said, if reducing the bit depth isn't feasible for lossy decoders, then it may not be worth implementing, since I expect games to mainly use lossy file formats.

pdeljanov commented 2 years ago

I'm making a game audio library, and I was hoping to save memory on loaded sounds. Games may have a number of sounds loaded at a time, so I figured it would be good to reduce the RAM usage when possible.

That's fair, and something that could be improved. It's not an ABI breaking change so I can likely release it as a point release down the line.

Although, Symphonia doesn't generally hold too many samples in a decoder's audio buffer at a time. For PCM (WAV) it's 1152 audio frames. Wouldn't you ideally convert immediately to the audio mixer's sample format to save CPU time during playback anyways?

That being said, if reducing the bit depth isn't feasible for lossy decoders, then it may not be worth implementing, since I expect games to mainly use lossy file formats.

Yeah, there's no getting around using 4 bytes/sample here. Even if the lossy decoders were implemented in fixed point, the minimum bit-depth required is generally 24-bits, which ends up being stored in a 32-bit integer anyways!

tesselode commented 2 years ago

Although, Symphonia doesn't generally hold too many samples in a decoder's audio buffer at a time. For PCM (WAV) it's 1152 audio frames. Wouldn't you ideally convert immediately to the audio mixer's sample format to save CPU time during playback anyways?

For sounds that are being decoded on the fly, yes. Game audio libraries typically also allow you to load the whole sound into memory at once and keep it there for low-latency playback. In that case, it may be worth it to save on memory usage at the cost of CPU time.

pdeljanov commented 1 year ago

I've addressed this for files decoded using the PCM decoder (generally WAV files), but haven't decided yet if I'll modify the FLAC and ALAC decoders.

For these decoders, we'll always need an i32 scratch buffer to do the decoding work (unless the entire decoder is duplicated for the different sample formats), so outputting to an audio buffer with a differing sample type means we'll need to have two buffers in memory to do the conversion as a final step.

Either way, I think I'll need to extend and modify the audio module API to better support decoders that need to handle multiple sample format audio buffers. So I think I'll defer the rest of this ticket to atleast v0.6 where I will be making other major API changes.

mikesbytes commented 10 months ago

This feature would be useful for me. I'm writing a music player application, and ideally I should be configuring my audio output to match the audio format. As it stands currently, I am using the bits_per_sample field of CodecParameters and switching my SampleBuffer format based on that result, but being able to avoid that extra step would provide a nice little QoL improvement.