liamappelbe / wav

Dart package for reading and writing wav files
Apache License 2.0
18 stars 5 forks source link

Raw file support #11

Closed jamescrosswell closed 11 months ago

jamescrosswell commented 1 year ago

Hey @liamappelbe ,

This isn't ready for merging... just wanted to bounce the high level approach off you before proceeding. Let me know what you think. If this works for you, I can add some tests and a sample project.

Cheers, James

liamappelbe commented 1 year ago

Yep, this approach looks good

jamescrosswell commented 1 year ago

Hi @liamappelbe , sorry I've been distracted by various other things the past few weeks so haven't had a chance to work on this.

I just got back to looking at it today and noticed something a bit odd, that you can maybe help explain.

I figured to run a basic test I might take one of the test files you have (e.g. 400Hz-8bit.wav), save that as a raw file using Audacity, read it back in using raw_file.dart and run through more or less the same expect clauses you did with the wav reader tests.

The very first two values your tests expect are Sin(0) == 0 and Cos(0) == 1:

    for (int i = 0; i < wav.channels[0].length; ++i) {
      final t = i * 2 * math.pi * 400 / 8000;
      expect(wav.channels[0][i], closeTo(math.sin(t), epsilon));
      expect(wav.channels[1][i], closeTo(math.cos(t), epsilon));
    }

When I open the file in Audacity that checks out: image

The first two bytes that Audacity is writing to the raw file (using Raw Signed 8 bit PCM format) are 0x00 and 0x7F. So 0 and 127 in decimal...

For an 8bit PCM file that the wav reader uses the readS8 method to read in the bytes... readS8 interprets a byte as an 8 bit unsigned int transforms via the u2f function:

  // [0, 2] => [0, 2 ^ bits - 1]
  static double wScale(int bits) => (1 << (bits - 1)) * 1.0;
  static double rScale(int bits) => wScale(bits) - 0.5;

  int readUint8() => read(1).getUint8(0);
  int readU8() => readUint8();
  double u2f(int x, int b) => (x / WavUtils.rScale(b)) - 1;
  double readS8() => u2f(readU8(), 8);

What the Wav library interprets those two numbers as then are u2f(0,8) == -1 and u2f(127,8) == -0.0039215686274509665

More generally, the purpose of the u2f method appears to be to convert an integer representation of amplitude (e.g. a number between -127 and 127) into a floating point representation of that amplitude (e.g. a number between -1.0 and 1.0). There are two things I'm not understanding the reasoning behind though... and that's the couple of magic numbers in the utility functions:

  1. Why subtract 0.5 in the rScale function?
  2. Why subtract 1 in the u2f function?
liamappelbe commented 1 year ago

So firstly I have to apologize for these functions being unreadable and poorly documented. They were internal utils optimized for brevity, and I wasn't expecting anyone to use them 😅. Now that you're making them more visible, I'm planning to clean them up and document them after your feature lands, before I publish.

I'll have to dig into the code and play around with things to refresh my memory properly, but at first glance I notice that you're using the u2f to read a signed int, which seems wrong. In these super short function names, U means unsigned, S means signed, and F means float. This is consistent with the outputs you're seeing: u2f(0,8) == -1 and u2f(127,8) ~= 0. That also answers "Why subtract 1 in the u2f function?". We divide the unsigned int x in [0, 255] by the scaling factor, to map it to [0, 2], then subtract 1 to get [-1, 1].

The answer to "Why subtract 0.5 in the rScale function?" is more subtle. The R in rScale means read, and the W in wScale means write. So the real question is, why are the read and write scales different? Well basically my goal was to make it so that you could read a wav file, then write the same file, and get exactly the same bytes in the file. Conversely, you should be able to write a wav file, then read it back, and get the same floating point samples (aside from quantization error). You'll notice that a bunch of the tests are testing exactly that. But doing this in practice is slightly tricky. The int range doesn't map super cleanly onto the float range. Uint8 is [0, 255], int8 is [-128, 127], and float is [-1, 1]. Note that the int8 range isn't symmetric, and in either case you have to divide by 255, not 256, to hit the full range. So what int value should map to 0? Also, to make best use of the full int range, each quantization bucket should be the same size, but if you naively map and quantize [-1, 1] to [0, 255], the 0 and 255 buckets will be half the size of the others. So there's a few fencepost problems to think about. If you keep all that in mind, and play around in a spreadsheet for a while, you'll probably come up with the same mapping functions I did.

jamescrosswell commented 1 year ago

OK thanks @liamappelbe - that starts to make a bit more sense.

at first glance I notice that you're using the u2f to read a signed int, which seems wrong

It looks like this is a bug that has existed in the code since before my changes in that case. You can see in the last commit before merging in my last changes that the read method mapped for pcm8bit is readS8. I did a bit of a hunt around and apparently, according to IBM's original spec, the 8bit format is always supposed to be unsigned (it's unique in that respect). Not that anyone is likely using 8 bit wavs anymore!

Do you want me to address in the PR I'm putting together or as a separate PR/Issue?

liamappelbe commented 1 year ago

Do you want me to address in the PR I'm putting together or as a separate PR/Issue?

I'll investigate it, probably tomorrow. I thought I had 100% test coverage, so I'd be surprised if this bug went unnoticed. Maybe there's something else in the read utils that accounts for this. To progress with your PR, you can just ignore this for now.

jamescrosswell commented 1 year ago

Cool, will do.

I actually noticed something else really odd as well. Even when using the readUint8 and readUint16 functions, the audio data contains negative integers. This is peculiar since the underlying dart:typed_data.getUint8 and dart:typed_data.getUint16 functions return only positive integers.

I put together the following test, which fails with Expected: <65508> Actual: <-28>:

    test('store_uint16_in_int16list', () {
      int myInt = 65508;
      Int16List int16 = Int16List(1);
      int16[0] = myInt;
      expect(int16[0], 65508);
    });

So there seems to be an implicit/unintentional transformation taking place when the data is being read in... specifically, when it gets stored in the channel data.

I think I might wait until we've worked through the couple of issues above before progressing work on this, as the unit tests will need to change once the correct read function is being used and also once we've worked out what to do about the above (which appears to be a bug in dart:typed-data - Dart doesn't let you assign int to Int16 but it looks like when you use Int16List, it's possible to sneak one past the compiler).

The simplest fix would be to use List<int> for the channels instead of Int16List but that would obviously impact performance. There may be a more elegant solution.

liamappelbe commented 1 year ago

the 8bit format is always supposed to be unsigned

Oh right, I remember now. Yeah, for all the other formats it uses the _fold function to wrap the unsigned value to a signed value. Really the readS8 function is just misnamed, but there's already a readU8 function. I'll have to rename all this stuff when I clean up the API before publishing.

Returning to your original question, I guess the answer is that Audacity's "Raw Signed 8 bit PCM format" is just not supported at the moment. If you want to add support for it, then you'll need to add a new enum value to WavFormat.

The simplest fix would be to use List<int> for the channels instead of Int16List but that would obviously impact performance. There may be a more elegant solution.

The channels are stored as Float64Lists. Where are you running into this Int16 issue in package:wav? I don't think I'm using Int16List anywhere.

jamescrosswell commented 1 year ago

The channels are stored as Float64Lists. Where are you running into this Int16 issue in package:wav? I don't think I'm using Int16List anywhere.

I need a face palm emoji. I'm running into that in a fork/copy of the wav code. Ultimately I'm trying to hook all of this up to a pitch detection algorithm, which is expecting integer values. So I was using a variant of your wav code that didn't use the u2f functions and stored the data in an Int16List instead of a Float64List . When I started building all of this, that seemed to make sense. In hindsight, maybe ByteData would be sufficient for that problem.

jamescrosswell commented 1 year ago

Oh right, I remember now. Yeah, for all the other formats it uses the _fold function to wrap the unsigned value to a signed value.

Aha... that explains what _fold is for as well. Finally, all of the puzzle pieces make sense 😄. That probably would have been obvious to someone who's worked with byte level data a bit (which I obviously haven't). I have vague memories of stuff they taught us at university, but that was a while ago now.

liamappelbe commented 1 year ago

I think Audacity has an unsigned 8-bit PCM raw format you can use to create your test. Is that all the issues resolved then? Did you have any other questions?

jamescrosswell commented 1 year ago

OK, so @liamappelbe I finally carved out some time to advance this a bit.

I started with the read tests and I created the *.raw files by taking the equivalent *.wav file that you already had for the wav reader tests and exporting that as a raw file using Audacity (this time being careful o select Unsigned 8 bit when exporting the pcm8bit file 😜).

Interestingly, when doing that, to get the tests to pass I had to reduce the bitprecision used to calculate the epsilon to 26 bits for both the pcm32bit and float64 formats. Possibly that's due to differences between how Audactiy and Wav map from floating point amplitudes to the various target number spaces.

In any event, once I'd created the writer tests, I decided to overwrite the original raw files that I was using for the read tests... I could then bump the precision back up to match what we have in the wav reader tests and that also ensures we have symmetry between the reader and writer functions... which seems more important than trying to create something that matches what Audacity is doing.

Looking at the other tests you have for the wav reader/writer, I'm not sure it makes sense to replicate these for the Raw format.

liamappelbe commented 1 year ago

Ok, so is it ready to review? One of the tests is failing, but it's probably the inverse of the issue you were having running the tests locally: your machine and mine/github slightly disagree about how one of those values should be rounded. Maybe I can try regenerating the test data on my machine.

jamescrosswell commented 1 year ago

Ok, so is it ready to review? One of the tests is failing, but it's probably the inverse of the issue you were having running the tests locally: your machine and mine/github slightly disagree about how one of those values should be rounded. Maybe I can try regenerating the test data on my machine.

Yes, ready to review.

Ideally the tests would pass anywhere... the error it's giving is at location [200] is <169> instead of <168>. At location 200 I think i == 100 and it'd be the cosine value (from channel 2)... which should just be 1.0 and that in turn should be the same as the value at location [1] (which the test doesn't seem to have any problem with) 😕 . I can't reproduce this on my machine - the tests all pass here. Maybe you can reproduce on yours? It would save fluffing around with logging to see what's happening on the CI server.

liamappelbe commented 1 year ago

Ideally the tests would pass anywhere... the error it's giving is at location [200] is <169> instead of <168>. At location 200 I think i == 100 and it'd be the cosine value (from channel 2)... which should just be 1.0 and that in turn should be the same as the value at location [1] (which the test doesn't seem to have any problem with) 😕 . I can't reproduce this on my machine - the tests all pass here. Maybe you can reproduce on yours? It would save fluffing around with logging to see what's happening on the CI server.

I dug into this a bit. The failing test is the float64 test, so it's a difference in some byte of the encoded double. When I decode the bytes, they're super close. There were only 3 samples that differed:

From file, encoded by your mac Value according to my machine, before encoding After encoding/decoding
-0.8090169943749475 -0.8090169943749476 -0.8090169943749476
0.5877852522924722 0.5877852522924721 0.5877852522924721
0.8090169943749489 0.809016994374949 0.809016994374949

The disagreement between our machines is tiny. Also, the values generated by my machine don't change when they're encoded and decoded (and the same goes for your machine, since the test was passing for you), which means this is not a bug in your raw_file.dart implementation. It's just a tiny disagreement in floating point math between our intel and arm CPUs, or a difference in the implementation of sin/cos in our core libraries.

I think the fix is just to not use sin/cos. I'll file a bug and come up with something better later (maybe just a noise generator). For now you can just comment out writeTest('float64-stereo', WavFormat.float64);.

Can you add a raw version of bijective_test.dart though? That way the reading/writing of raw float64s will at least be covered by that test.

I'll review your code more tomorrow.

jamescrosswell commented 12 months ago

I addressed the review comments, added bijective tests and added a couple of other tests for code coverage.

liamappelbe commented 11 months ago

Whoops. I didn't get a notification when you marked this as ready. Sorry. 😅

liamappelbe commented 11 months ago

@jamescrosswell are there any other PRs you want to add before I publish this?

jamescrosswell commented 11 months ago

@jamescrosswell are there any other PRs you want to add before I publish this?

No all good, thanks for your help with this @liamappelbe (and for your patience answering my questions).

ty1135 commented 9 months ago

Stream support for dealing with large file

liamappelbe commented 9 months ago

@ty1135 is that a feature request? File a bug and explain your use case.