Closed mixxxbot closed 2 years ago
Commented by: daschuer Date: 2015-01-06T16:28:11Z
The range of libmad samples is -8.0 to +7.9999999962747097015380859375 So it is probably no bug?
Commented by: ywwg Date: 2015-01-06T16:36:47Z
the conversion function is broken: -32768 gets converted to -1.00003 instead of -1.0
Commented by: daschuer Date: 2015-01-06T16:45:42Z
No, it is properly clipped to 16 bit in SoundSourceMp3::madScale()
What a pity: a fix is In progress Bug #1156569
Commented by: rryan Date: 2015-01-06T17:04:04Z
Nice catch Owen!
Daniel, does fixing SampleUtil::convertS16ToFloat32 fix your clipping track as well?
Commented by: daschuer Date: 2015-01-06T17:15:03Z
Here is one:
void SampleUtil::convertS16ToFloat32(CSAMPLE* pDest, const SAMPLE* pSrc,
unsigned int iNumSamples) {
for (unsigned int i = 0; i < iNumSamples; ++i) {
pDest[i] = CSAMPLE(pSrc[i]) / CSAMPLE(SAMPLE_MAX);
}
}
0x8000 / 0x7FFF = 1.000030519
But I think we have an issue in madScale(), since it clips the value to unsymmetric positive and negative range. .. And I think we have the root issue that madScale() clips at all.
Conclusion, there are mp3s out there with samples > 1. So fixing this by adding a 1 in certain places will only hide the clipping from the user, which is not a nice idea.
@Uwe: In what condition is your floating point API fix? Can we be brave and commit it to 1.12.
.. or should we just postpone this bug to 2.1?
Commented by: ywwg Date: 2015-01-06T17:22:07Z
https://github.com/mixxxdj/mixxx/pull/453
In portaudio, pa_convert.c, they convert from short to float by dividing by 32768. They convert from float to short by multiplying by 32767.
Commented by: ywwg Date: 2015-01-06T17:22:46Z
(sorry daniel, I didn't see your note until I posted mine)
Commented by: daschuer Date: 2015-01-06T17:22:59Z
Daniel, does fixing SampleUtil::convertS16ToFloat32 fix your clipping track as well?
Yes, but this is only a band Aid. IMHO we should not fix it in convertS16ToFloat32, since a 0x8000 value is actually clipping, because the positive counterpart is missing.
We may fix the clamping in madScale() to +-0x7fff or apply Uwes patch.
Commented by: ywwg Date: 2015-01-06T17:30:10Z
Daniel I think the range really is -32768 to 32767, there is more db on the downswing than on the upswing. so there is an absolute value for which the downswing is not clipping, but the upswing is. Isn't that why the libmad range is -8.0 to +7.999? Or are you saying that unbalanced range is our code?
Commented by: ywwg Date: 2015-01-06T18:27:13Z
I would prefer not to apply Uwe's patch if we can solve this in a more focused way. I have done some reading on the mp3 spec and it's not totally clear if the range is asymmetric or not. Although the madlib code mentions an asymmetric range, the precision seems higher than short ints can support. So I'm starting to lean toward fixing the madscale function as you suggest.
RJ what do you think?
Commented by: rryan Date: 2015-01-06T18:33:26Z
MP3 and other formats whose internal representations aren't samples are probably more free to get around the issues of two's complement number representations but formats like WAV that are based on signed values internally have a discrete set of values they can represent and the SoundSource API as it stands is built around that with our signed 16-bit integer read methods.
@Daniel -- I don't think that it's hiding clipping. -32768 is a valid number in s16 PCM.
The conversion from S16 to [-1.0,1.0] float in Mixxx is a conversion of the representable numbers, not a signal conversion -- it should be agnostic of the underlying signal. It's the codec's job to produce a signed 16 number that represents the signal as best as the codec can. For WAV -- that's really easy since it matches the internal representation. For MP3 -- it's not a perfect match. That's why we'll get rid of this in a future version so that each codec can convert to float in the best way it can.
I'm also against Uwe's patch for 1.12.0 -- as much as I really really want it! It's too risky IMO.
If there is also an issue in madScale then let's fix that too. I think Owen's fix to convertS16ToFloat32 is warranted.
Commented by: rryan Date: 2015-01-06T18:41:23Z
If we were to do the representable number conversion "right" (so that -1 corresponds to the minimum representable value and 1 corresponds to the maximum representable value) we would divide negative numbers by SHRT_MIN and positive by SHRT_MAX but that will slow down the loop.
As Owen points out, PortAudio has similar assumptions built in around the range of signed 16-bit ints.
I assume this is a similar performance optimization to prevent clipping in either conversion without having to do comparisons in the loop. I'll write to portaudio-dev to ask Ross.
Commented by: rryan Date: 2015-01-06T19:10:19Z
Looking at how others manage converting libmad's fixed point numbers to s16 -- I think our madScale function is fine.
madlld: https://github.com/bbcrd/audiowaveform/blob/master/src/madlld-1.1p1/madlld.c#L165
madplay: https://github.com/Distrotech/madplay/blob/master/audio.c#L436
It seems our madScale method is right out of madplay: https://github.com/Distrotech/madplay/blob/master/audio.c#L213
Except our clipping is slightly different: https://github.com/Distrotech/madplay/blob/master/audio.c#L176
We don't do the "signal statistics" that madplay does (which conditionally applies the clipping?). Not sure I understand why that is -- -- perhaps there's a post-processing scaling based on the signal statistics?
Commented by: daschuer Date: 2015-01-06T19:36:25Z
IMHO Portaudio does it just right. For me -32768 is an invalid value, because the positive counterpart is just clipped. libmad uses Integer math and this would be just hard if we have a different scale on the positive and on the negative side of an integer only because of one bit. But since -32768 can actually happen, port audio does its job right wen using 32768 from 16 bit to float and 32767 to 16 bit. So no one can blame port audio for clipping.
Commented by: daschuer Date: 2015-01-06T20:24:39Z
In my test, the raw value range of libmad samples is: -321279529 .. 308258210 -1.1968595 .. 1,148351319
Which mean massive clipping
Commented by: daschuer Date: 2015-01-06T20:49:22Z
Testes a couple of tracks and now I am at -343734695 .. 351910901 -1.28 .. 1.31
Thins this bug is solved, we need a new one
Commented by: rryan Date: 2015-01-06T21:24:07Z
IMHO Portaudio does it just right. For me -32768 is an invalid value, because the positive counterpart is just clipped.
I don't think it's clipped, it's lower resolution because there are fewer possible positive discrete values.
Let's say we have a function generator that produces a 5V peak to peak sine wave.
Now capture a digital representation of this signal using an ADC (like [1]) that produces 2's complement 16-bit numbers (as a sound card / capture card would). 32767 represents 5V on the positive side and -32768 represents -5V on the negative side. 32768 is an invalid value since it would be above the reference voltage. 32767 represents the maximum positive voltage. No clipping has occurred.
The limitation of discretizing the signal into an even number of buckets requires you to choose between having no zero value (impractical for digital audio since you can't represent silence) or giving one side more resolution. It's the same problem with 7-bit MIDI knobs.
Looking at some DACs (e.g. [2]) -- they also speak two's complement. So no resolution issues arise if you feed the output of an ADC into a DAC since both have allocated one more discrete value of resolution to the negative side since they use two's complement.
With full knowledge that we are sitting on top of a skyscraper of abstractions between Mixxx and the hardware, if everyone passed through their signed 16-bit samples then it should "just work" and the asymmetric resolutions wouldn't hurt anyone. However, we use DSP algorithms to process the signal that don't account for the mismatch. This could lead to minor distortions -- which is why the right thing to do would be to adjust for the asymmetry when we convert to floats.
Since Mixxx does all of its processing using [-1, 1] floats, the value range [-1, 0) should capture the entire range of negative values linearly. Similarly the value range (0, 1] should capture the entire range of positive values linearly.
We are choosing to accept this inaccuracy as a performance optimization -- which seems to be a fine trade-off that others (e.g. PA) also do!
It's been a while since I used an ADC / DAC directly. Maybe they behave differently than I think?
[1] http://www.analog.com/static/imported-files/data_sheets/AD1877.pdf [2] http://www.analog.com/static/imported-files/data_sheets/AD1852.pdf
Commented by: rryan Date: 2015-01-06T21:30:32Z
Interesting, looks like PortAudio has discussed this at length and have plans to potentially change what they do: https://www.assembla.com/spaces/portaudio/tickets/100#/activity/ticket
Bjorn has written a couple of useful blog posts on the topic: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html
Based on this post, Bjorn recommends scaling the negative side and positive side asymmetrically: http://music.columbia.edu/pipermail/portaudio/2009-December/009709.html and confirms that harmonic distortions occur with other methods.
Ross wrote a response: http://music.columbia.edu/pipermail/portaudio/2009-December/009715.html
Commented by: daschuer Date: 2015-01-06T22:56:05Z
I have never worked with 2's complement 16-bit numbers ADCs/DACs, but it sounds scary that they should have a different scale for negative and positive side. I am afraid this question is practical not important because it will fall below the non linearity issue at the signal borders and the offset error.
http://www.ti.com/lit/an/sbaa042/sbaa042.pdf Explains the issue explicit. see BTC — BINARY TWO’S COMPLEMENT A -5 V.. +5 V 4 bit coverter has digital value representation from +5 .. - 4,3
Everything else would make the date useless for doing math, without a rescaling.
I cannot find this explicit written in the Analog Devices data shed, so maybe the think about different.
Commented by: daschuer Date: 2015-01-06T23:01:17Z
Analog devices does the same like TI for this device: http://www.analog.com/static/imported-files/data_sheets/AD7776_7777_7778.pdf Se Figure 5. one value at the positve end is skipped.
Commented by: rryan Date: 2015-01-07T03:47:40Z
Lots of good discussion from 2004: http://music.columbia.edu/pipermail/portaudio/2004-April/003516.html
Nice find RE: those spec sheets. It looks like the world of DAC/ADCs is a mixed, often undocumented bag in this regard (lots of discussion in that thread about various spec sheets).
Commented by: uklotzde Date: 2015-01-07T15:19:47Z
See my comment about decoding of MP3: https://bugs.launchpad.net/mixxx/+bug/1408100/comments/16
The PR for the floating-point API is huge although the modifications that are needed to integrate it into the Mixxx code base are just a few. I did a rewrite of most SoundSources, because the code I found was rather ugly, inefficient, and error prone. It's still far from perfect, but I'm pretty confident now ;) I'm frequently rebasing the NewSoundSourceAPI branch on the current master branch, so feel free to test it.
I'm already using the new API for my private version on Fedora without any issues so far (m4a, mp3, flac), but the Windows and OS X ports are still untested.
Commented by: uklotzde Date: 2015-01-08T08:19:19Z
The existing 16-bit integer SoundSource API does not provide any headroom for the decoded output. See me comments here: https://bugs.launchpad.net/mixxx/+bug/1408100/comments/19
The new floating-point API will fix this.
Issue closed with status Fix Released.
Reported by: daschuer Date: 2015-01-06T16:12:00Z Status: Fix Released Importance: Medium Launchpad Issue: lp1408011 Tags: soundsource
Some of my mp3 tracks are clipping if Mixxx is at unity. No EQs, No effects, Gain and master gain at 1 and ReplayGain disabled Default boost = 0 dB.
I cannot reproduce the issue with a sine wave test tone, normalized at 1.