Some tracks are clipping at unity

mixxxbot commented 2 years ago

Reported by: daschuer Date: 2015-01-06T16:12:00Z Status: Fix Released Importance: Medium Launchpad Issue: lp1408011 Tags: soundsource

Some of my mp3 tracks are clipping if Mixxx is at unity. No EQs, No effects, Gain and master gain at 1 and ReplayGain disabled Default boost = 0 dB.

I cannot reproduce the issue with a sine wave test tone, normalized at 1.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T16:25:55Z

Confirmed. Clipped values are showing up as 1.00003

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T16:26:18Z

(on a particular example file)

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T16:28:11Z

The range of libmad samples is -8.0 to +7.9999999962747097015380859375 So it is probably no bug?

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T16:36:47Z

the conversion function is broken: -32768 gets converted to -1.00003 instead of -1.0

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T16:37:54Z

This is because SHRT_MIN != -(SHRT_MAX)

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T16:45:42Z

No, it is properly clipped to 16 bit in SoundSourceMp3::madScale()

What a pity: a fix is In progress Bug #⁠1156569

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-06T17:04:04Z

Nice catch Owen!

Daniel, does fixing SampleUtil::convertS16ToFloat32 fix your clipping track as well?

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T17:15:03Z

Here is one:

https://github.com/mixxxdj/mixxx/blob/873835442c437059f0bb0984862c73fa9f937422/src/sampleutil.cpp#L179

void SampleUtil::convertS16ToFloat32(CSAMPLE* pDest, const SAMPLE* pSrc,
        unsigned int iNumSamples) {
    for (unsigned int i = 0; i < iNumSamples; ++i) {
        pDest[i] = CSAMPLE(pSrc[i]) / CSAMPLE(SAMPLE_MAX);
    }
}

0x8000 / 0x7FFF = 1.000030519

But I think we have an issue in madScale(), since it clips the value to unsymmetric positive and negative range. .. And I think we have the root issue that madScale() clips at all.

Conclusion, there are mp3s out there with samples > 1. So fixing this by adding a 1 in certain places will only hide the clipping from the user, which is not a nice idea.

@Uwe: In what condition is your floating point API fix? Can we be brave and commit it to 1.12.

.. or should we just postpone this bug to 2.1?

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T17:22:07Z

https://github.com/mixxxdj/mixxx/pull/453

In portaudio, pa_convert.c, they convert from short to float by dividing by 32768. They convert from float to short by multiplying by 32767.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T17:22:46Z

(sorry daniel, I didn't see your note until I posted mine)

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T17:22:59Z

Daniel, does fixing SampleUtil::convertS16ToFloat32 fix your clipping track as well?

Yes, but this is only a band Aid. IMHO we should not fix it in convertS16ToFloat32, since a 0x8000 value is actually clipping, because the positive counterpart is missing.

We may fix the clamping in madScale() to +-0x7fff or apply Uwes patch.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T17:30:10Z

Daniel I think the range really is -32768 to 32767, there is more db on the downswing than on the upswing. so there is an absolute value for which the downswing is not clipping, but the upswing is. Isn't that why the libmad range is -8.0 to +7.999? Or are you saying that unbalanced range is our code?

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T18:27:13Z

I would prefer not to apply Uwe's patch if we can solve this in a more focused way. I have done some reading on the mp3 spec and it's not totally clear if the range is asymmetric or not. Although the madlib code mentions an asymmetric range, the precision seems higher than short ints can support. So I'm starting to lean toward fixing the madscale function as you suggest.

RJ what do you think?

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-06T18:33:26Z

MP3 and other formats whose internal representations aren't samples are probably more free to get around the issues of two's complement number representations but formats like WAV that are based on signed values internally have a discrete set of values they can represent and the SoundSource API as it stands is built around that with our signed 16-bit integer read methods.

@Daniel -- I don't think that it's hiding clipping. -32768 is a valid number in s16 PCM.

The conversion from S16 to [-1.0,1.0] float in Mixxx is a conversion of the representable numbers, not a signal conversion -- it should be agnostic of the underlying signal. It's the codec's job to produce a signed 16 number that represents the signal as best as the codec can. For WAV -- that's really easy since it matches the internal representation. For MP3 -- it's not a perfect match. That's why we'll get rid of this in a future version so that each codec can convert to float in the best way it can.

I'm also against Uwe's patch for 1.12.0 -- as much as I really really want it! It's too risky IMO.

If there is also an issue in madScale then let's fix that too. I think Owen's fix to convertS16ToFloat32 is warranted.

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-06T18:41:23Z

If we were to do the representable number conversion "right" (so that -1 corresponds to the minimum representable value and 1 corresponds to the maximum representable value) we would divide negative numbers by SHRT_MIN and positive by SHRT_MAX but that will slow down the loop.

As Owen points out, PortAudio has similar assumptions built in around the range of signed 16-bit ints.

int16 -> float -- divides every number by 32768
float -> int16 -- multiplies every number by 32767

I assume this is a similar performance optimization to prevent clipping in either conversion without having to do comparisons in the loop. I'll write to portaudio-dev to ask Ross.

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-06T19:10:19Z

Looking at how others manage converting libmad's fixed point numbers to s16 -- I think our madScale function is fine.

madlld: https://github.com/bbcrd/audiowaveform/blob/master/src/madlld-1.1p1/madlld.c#L165

madplay: https://github.com/Distrotech/madplay/blob/master/audio.c#L436

It seems our madScale method is right out of madplay: https://github.com/Distrotech/madplay/blob/master/audio.c#L213

Except our clipping is slightly different: https://github.com/Distrotech/madplay/blob/master/audio.c#L176

We don't do the "signal statistics" that madplay does (which conditionally applies the clipping?). Not sure I understand why that is -- -- perhaps there's a post-processing scaling based on the signal statistics?

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T19:36:25Z

IMHO Portaudio does it just right. For me -32768 is an invalid value, because the positive counterpart is just clipped. libmad uses Integer math and this would be just hard if we have a different scale on the positive and on the negative side of an integer only because of one bit. But since -32768 can actually happen, port audio does its job right wen using 32768 from 16 bit to float and 32767 to 16 bit. So no one can blame port audio for clipping.

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T20:07:49Z

The same happens with m4a tracks as well.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T20:20:40Z

Fixed https://github.com/mixxxdj/mixxx/pull/453

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T20:24:39Z

In my test, the raw value range of libmad samples is: -321279529 .. 308258210 -1.1968595 .. 1,148351319

Which mean massive clipping

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T20:45:41Z

How did you test to get these values?

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T20:49:22Z

Testes a couple of tracks and now I am at -343734695 .. 351910901 -1.28 .. 1.31

Thins this bug is solved, we need a new one

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-01-06T20:59:36Z

please make those available too

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-06T21:24:07Z

IMHO Portaudio does it just right. For me -32768 is an invalid value, because the positive counterpart is just clipped.

I don't think it's clipped, it's lower resolution because there are fewer possible positive discrete values.

Let's say we have a function generator that produces a 5V peak to peak sine wave.

Now capture a digital representation of this signal using an ADC (like [1]) that produces 2's complement 16-bit numbers (as a sound card / capture card would). 32767 represents 5V on the positive side and -32768 represents -5V on the negative side. 32768 is an invalid value since it would be above the reference voltage. 32767 represents the maximum positive voltage. No clipping has occurred.

The limitation of discretizing the signal into an even number of buckets requires you to choose between having no zero value (impractical for digital audio since you can't represent silence) or giving one side more resolution. It's the same problem with 7-bit MIDI knobs.

Looking at some DACs (e.g. [2]) -- they also speak two's complement. So no resolution issues arise if you feed the output of an ADC into a DAC since both have allocated one more discrete value of resolution to the negative side since they use two's complement.

With full knowledge that we are sitting on top of a skyscraper of abstractions between Mixxx and the hardware, if everyone passed through their signed 16-bit samples then it should "just work" and the asymmetric resolutions wouldn't hurt anyone. However, we use DSP algorithms to process the signal that don't account for the mismatch. This could lead to minor distortions -- which is why the right thing to do would be to adjust for the asymmetry when we convert to floats.

Since Mixxx does all of its processing using [-1, 1] floats, the value range [-1, 0) should capture the entire range of negative values linearly. Similarly the value range (0, 1] should capture the entire range of positive values linearly.

We are choosing to accept this inaccuracy as a performance optimization -- which seems to be a fine trade-off that others (e.g. PA) also do!

It's been a while since I used an ADC / DAC directly. Maybe they behave differently than I think?

[1] http://www.analog.com/static/imported-files/data_sheets/AD1877.pdf [2] http://www.analog.com/static/imported-files/data_sheets/AD1852.pdf

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-06T21:30:32Z

Interesting, looks like PortAudio has discussed this at length and have plans to potentially change what they do: https://www.assembla.com/spaces/portaudio/tickets/100#/activity/ticket

Bjorn has written a couple of useful blog posts on the topic: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html

Based on this post, Bjorn recommends scaling the negative side and positive side asymmetrically: http://music.columbia.edu/pipermail/portaudio/2009-December/009709.html and confirms that harmonic distortions occur with other methods.

Ross wrote a response: http://music.columbia.edu/pipermail/portaudio/2009-December/009715.html

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T22:56:05Z

I have never worked with 2's complement 16-bit numbers ADCs/DACs, but it sounds scary that they should have a different scale for negative and positive side. I am afraid this question is practical not important because it will fall below the non linearity issue at the signal borders and the offset error.

http://www.ti.com/lit/an/sbaa042/sbaa042.pdf Explains the issue explicit. see BTC — BINARY TWO’S COMPLEMENT A -5 V.. +5 V 4 bit coverter has digital value representation from +5 .. - 4,3

Everything else would make the date useless for doing math, without a rescaling.

I cannot find this explicit written in the Analog Devices data shed, so maybe the think about different.

mixxxbot commented 2 years ago

Commented by: daschuer Date: 2015-01-06T23:01:17Z

Analog devices does the same like TI for this device: http://www.analog.com/static/imported-files/data_sheets/AD7776_7777_7778.pdf Se Figure 5. one value at the positve end is skipped.

mixxxbot commented 2 years ago

Commented by: rryan Date: 2015-01-07T03:47:40Z

Lots of good discussion from 2004: http://music.columbia.edu/pipermail/portaudio/2004-April/003516.html

Nice find RE: those spec sheets. It looks like the world of DAC/ADCs is a mixed, often undocumented bag in this regard (lots of discussion in that thread about various spec sheets).

mixxxbot commented 2 years ago

Commented by: uklotzde Date: 2015-01-07T15:19:47Z

See my comment about decoding of MP3: https://bugs.launchpad.net/mixxx/+bug/1408100/comments/16

The PR for the floating-point API is huge although the modifications that are needed to integrate it into the Mixxx code base are just a few. I did a rewrite of most SoundSources, because the code I found was rather ugly, inefficient, and error prone. It's still far from perfect, but I'm pretty confident now ;) I'm frequently rebasing the NewSoundSourceAPI branch on the current master branch, so feel free to test it.

I'm already using the new API for my private version on Fedora without any issues so far (m4a, mp3, flac), but the Windows and OS X ports are still untested.

mixxxbot commented 2 years ago

Commented by: uklotzde Date: 2015-01-08T08:19:19Z

The existing 16-bit integer SoundSource API does not provide any headroom for the decoded output. See me comments here: https://bugs.launchpad.net/mixxx/+bug/1408100/comments/19

The new floating-point API will fix this.

mixxxbot commented 2 years ago

Issue closed with status Fix Released.

mixxxdj / mixxx

Some tracks are clipping at unity #7797