ruuda / hound

A wav encoding and decoding library in Rust
https://codeberg.org/ruuda/hound
Apache License 2.0
491 stars 65 forks source link

Bulk sample writing #84

Open kleinesfilmroellchen opened 1 month ago

kleinesfilmroellchen commented 1 month ago

In the old wav library, writing all of a file's samples in bulk was as easy as wav::write(header, i16_sample_buffer, write_impl). Even though this is not a flexible API (and hound even has the i16 specializations), in hound it's much more awkward:

let writer = WavWriter::new(&mut output_file, header)?;
let mut writer = writer.get_i16_writer(samples.len() as u32);
for s in samples {
    writer.write_sample(s);
}
writer.flush()?;

This also feels very inefficient to me. It's not easy for the compiler to see that on little-endian systems, the data can just be memcpy'd into WavWriter's buffer, or even directly into the output stream (i.e. this entire for loop should literally just be one system call on Linux).

I'd therefore like there to be a write_samples API for the WavWriter (and the 16-bit specialization) that takes &[Sample]. Straightforward enough, I hope.

A streaming design is less straightforward for many libraries, and they will always benefit from a bulk writing function since they already have all of the sample data available.

ruuda commented 1 month ago

WavWriter::write_sample is easy to use but inefficient. The SampleWriter16 was actually born as the result of careful profiling, when sample writing showed up as a bottleneck. When I wrote it initially, Rustc/LLVM was able to optimize it away completely, though I haven’t confirmed on recent Rustc versions.

i.e. this entire for loop should literally just be one system call on Linux

It is, it happens in writer.flush() in the final line.

kleinesfilmroellchen commented 1 month ago

Thanks for the swift response. I should clarify that my main concern is not performance, but usability. hound should be able to support both streaming and non-streaming users, and I don't think it would be too troublesome.

Besides, both f32 and i8 also can directly perform sample memcpy's, so why not generalize the optimization.

It is, it happens in writer.flush() in the final line.

Had already forgotten about this when I wrote my reply, even though I was thinking about it while writing the code. I bring up another problem, however: with this method, the code first needs to perform a copy from the user buffer into the WavWriter buffer, and then a second IO-related copy (performed by the kernel during the system call). Using a bulk write method, one can skip the intermediate buffer, which results in the minimal amount of copies achievable on normal operating systems.