Closed · ktmf01 closed this issue 2 years ago
Thanks for the report and the link to those test files! I have downloaded the test suite and will take a look at this as soon as I get a chance.
I think I've found the cause of this problem. The fix is rather simple but will hamper performance. I'll explain.
In 5 instances in dr_flac, a runtime choice is made between `drflac__calculate_prediction_32` and `drflac__calculate_prediction_64`. The criterion used is `bitsPerSample+shift > 32`. This is not nearly conservative enough, though. libFLAC uses the much stricter `bps + subframe->qlp_coeff_precision + FLAC__bitmath_ilog2(order) <= 32`, which is perhaps overly pessimistic, but at least very conservative.
There is a very thorough explanation of why this is the case here, if you would like to dive into this without having to work out all the details yourself.
I could propose a patch, but I'd rather leave the decision up to you. If I change the criterion to `bitsPerSample+shift > 31` (or `bitsPerSample+shift >= 30`) in all 5 cases, this specific test case is solved, but there could be others that will not pass. Using the libFLAC criterion (as explained in the linked text) would hamper performance (even for 16-bit files) but would be safe. I would personally opt for `bitsPerSample+lpcPrecision`. `lpcPrecision` is usually about 2 larger than `shift` but not available in those parts of the code, so it would require a bit of work to get it there.

edit: actually, in the case of the file triggering this problem, `lpcPrecision` is 4, and occasionally even 5, larger than `shift`.
Thanks for digging into this. That all makes sense. I think the most important thing in terms of performance is that the majority of 16-bit files are run through the 32-bit pipeline. It's OK on rare occasions if they run through the 64-bit path, but so long as it's more of an edge case thing I think it's fine.
I'll experiment with this and process my library of 16-bit FLAC files through a test program to see how often they hit the 64-bit path with and without the `ilog2()` part. If it's rare enough, we might as well just be safe and use `ilog2()` as well, I guess.
The libFLAC encoder has special provisions to produce encoded files that keep the libFLAC decoder in the 32-bit datapath, by restricting the maximum `lpcPrecision` used for subframes with a bits-per-sample of 17 or lower (to account for the extra bit in side frames). So, if your FLAC library was encoded with libFLAC, you are unlikely to trigger the 64-bit datapath with 16 bps files.

Other encoders, however, like ffmpeg, do not do this and default to the maximum `lpcPrecision` of 15, so adding the ilog part would trigger the 64-bit datapath often for 16 bps FLAC files encoded with ffmpeg.
That's good to know about ffmpeg. Wasn't aware of that. I'm not sure what to do now...
I understand this is a difficult problem. The choice is between a faster approach that might fail on a few very extreme samples, or one that is guaranteed to work but is slower. Perhaps a slightly more intelligent approach here wouldn't hurt. I just thought of something; I will explain.
The document I linked earlier explains why there is a runtime choice to be made between a 32-bit and a 64-bit datapath: we don't want the accumulator used in calculating the dot product of the past samples and the linear prediction coefficients to overflow. libFLAC makes this runtime decision by using a worst-case scenario: if we combine the worst possible vector of past samples with the worst possible vector of lp coefficients, how many bits do we need? That is calculated as `bps + subframe->qlp_coeff_precision + FLAC__bitmath_ilog2(order)`.
However, this approach treats both the past samples and the lp coefficients as unknown, which is not the case. Scanning all samples would take too much time, but analysing the lp coefficients isn't too much of a processing burden, so a more intelligent approach could do that.
Instead of assuming the lp coefficients are "maxed out", we can just assume all past samples are. Instead of `subframe->qlp_coeff_precision + FLAC__bitmath_ilog2(order)` we can take `ilog2` of the sum of the absolute values of the lp coefficients. That way the code doesn't have to be changed to propagate `lpcPrecision`. Suppose we add the following to `drflac__decode_samples_with_residual__rice__scalar`:
```c
drflac_uint32 sumAbsCoefficient = 0;
drflac_uint32 ilog2SumAbsCoefficient;
for (i = 0; i < order; i++) {
    sumAbsCoefficient += abs(coefficient[i]);
}
ilog2SumAbsCoefficient = (32 - drflac__clz(sumAbsCoefficient));
if (bitsPerSample + ilog2SumAbsCoefficient > 32) {
    /* use 64-bit accumulator */
} else {
    /* use 32-bit accumulator */
}
```
In real-world cases this will often save 1 bit, sometimes even 2 bits, compared to the libFLAC criterion on LPC subframes with order 12.
~Code in my last post was still off by one bit. This seems better:~

```c
drflac_uint32 ilog2SumAbsCoefficient = 0;
```

~and~

```c
if (sumAbsCoefficient > 0) {
    ilog2SumAbsCoefficient = (31 - drflac__clz(sumAbsCoefficient));
}
```
edit: nevermind, I was only thinking of powers of two. The code in the previous post is correct; however, the name `ilog2SumAbsCoefficient` isn't. `ilog2SumAbsCoefficientPlusOne` would be more correct.
Sorry for not responding sooner. As a start, let's get the bug fixed and use the `ilog2()` method from libFLAC. After that we can investigate your proposal and treat it as an optimization. I'm hoping to get to this over the weekend.
I've pushed a change to the dev branch to fix this. Are you able to give that a try?
Yes, 713ffe6 fixes it. In the meantime I created even more extreme files with an experimental FLAC encoder to trigger bugs like these in other decoders, which dr_flac now decodes fine.
https://github.com/ktmf01/flac-test-files/blob/main/subset/62%20-%20predictor%20overflow%20check%2C%2020-bit.flac https://github.com/ktmf01/flac-test-files/blob/main/subset/63%20-%20predictor%20overflow%20check%2C%2024-bit.flac
Thanks for confirming that fix for me. I've released this to the master branch. I'll leave this ticket open as a reminder to investigate your optimization proposal.
I was having a think about this, and to be honest I don't have much interest in optimizing this any further. At least not right now. I'm going to go ahead and close this ticket. If there's significant demand from the community in the future, which I highly doubt, I may revisit this.
Thanks for the report and taking the time to investigate all of this. Also, I'm now using your test suite for my automated testing tool, so thanks a lot for your work maintaining that as well.
And of course many thanks to you for maintaining dr_flac. I think it is a very valuable addition to 'the FLAC ecosystem'. There are obviously a lot of places where libFLAC or libavcodec are way too bulky to use.
FYI, I've added it to a list of FLAC implementations here.
Thanks! Quick correction - dr_flac does not support MP4 encapsulation. Just FLAC and Ogg.
Huh, I don't know what went wrong there.
First of all, thanks for your hard work on this. I really think you did an amazing job on dr_flac. I already bothered you with #140, but back then I didn't fully appreciate your work.
Recently, I made a FLAC decoder testbench to make life easier for FLAC decoder developers such as yourself. I ran it against the test program under tests/bin/dr_flac_decoding, and all but one of the tests completed successfully. As you can see in this wiki, that is quite rare in itself.
File 37, a FLAC file with a 20-bit sample size, gives me

```
PCM Frame @ 53498[0] does not match: pcmFrameCount=1113323
```
I looked at the analysis file made by `flac -a`, but I couldn't discover anything out of the ordinary. Perhaps you could look into this?