xiph / flac

Free Lossless Audio Codec
https://xiph.org/flac/
GNU Free Documentation License v1.3

Cases where -e makes compression worse; does -e really do everything it should? #728

Open H2Swine opened 2 months ago

H2Swine commented 2 months ago

Usual reservation: I do not know whether -e is even supposed to handle this, and even if it is, I am not saying it is worth the effort of "fixing" it. Rather, I am reporting it because it could be a symptom of a bug with bigger consequences. (At "worst", waking -e up from the dead ...)

But the situation is that I sometimes encounter signals where -e even makes for a bigger encode. I found a near-silent file where it happened, and got it down to the first 16384 samples, mono. That is most manageable, so I ran it through -l <0 to 32>, -b <1024, 2048, 4096, 8192, 16384>, -r <n,n for n=0 to 9>, and then with/without -p and -e; that is 33 × 5 × 10 × 4 = 6600 files. Or 3300 pairs "-e or not".
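For anyone who wants to reproduce the grid, here is a minimal Python sketch of the enumeration described above. It only builds the option lists and counts them; it does not invoke flac. The helper name `build_options` and the variable names are made up for illustration, and the input file name is left out on purpose.

```python
from itertools import product

# Parameter ranges as described in the comment above.
max_lpc_orders = range(0, 33)                     # -l 0 .. 32
block_sizes = [1024, 2048, 4096, 8192, 16384]     # -b
partition_orders = range(0, 10)                   # -r n,n for n = 0 .. 9
p_and_e = list(product([False, True], repeat=2))  # (-p?, -e?)


def build_options(l, b, r, use_p, use_e):
    """Return the flac options for one grid point (hypothetical helper)."""
    opts = [f"-b{b}", f"-l{l}", f"-r{r},{r}"]
    if use_p:
        opts.append("-p")
    if use_e:
        opts.append("-e")
    return opts


grid = [
    build_options(l, b, r, use_p, use_e)
    for l, b, r in product(max_lpc_orders, block_sizes, partition_orders)
    for use_p, use_e in p_and_e
]

# Each "-e or not" pair shares every other option.
pairs = [opts for opts in grid if "-e" not in opts]

print(len(grid), "encodes,", len(pairs), "pairs")
```

Forcing `-r n,n` rather than `-r n` is what keeps the number of duplicate outputs down, since the encoder cannot fall back to a smaller partition order.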

I think 256 of them did worse with -e. That's more than 7.5 percent. (For a further 196 pairs, they were the same size. I didn't check whether those were bit-identical.) And it seems that when -e does worse, the "no-e" uses prediction order 1 while -e uses more.

Surely there are a lot of duplicates among those 3300; fewer than you might think, because I used -r n,n to force the partition order, rather than -r n. The filenames indicate the settings used, with the duplicates after the equals sign. Examples:

16384samples-b2048-l02-r¨09.flac=-l03.flac means it was encoded with -b2048 -l2 -r9,9 and is bit-identical to the file encoded with -b2048 -l3 -r9,9.

16384samples-b1024-l04-r¨00-e.flac=-l_up_to_32.flac means it was encoded with -b1024 -l4 -r0,0 -e, and is bit-identical to all files with -l up to 32 and everything else equal.

where_-e_loses.zip

All done with a recent compile from git. Discovered it with 1.4.x.

ktmf01 commented 2 months ago

Without having done any tests, I assume this has to do with the inexact residual bits calculation.

The code has two methods of calculating the size of the residual; which one is used is a compile-time option, not a run-time option. When running with exhaustive model search (-e), the program chooses which order (i.e. which model) to use based on how many bits it thinks the residual and the model together are going to take up. As this estimate is inexact, it can choose a model that does not result in the smallest subframe.
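To illustrate how an inexact estimate can pick the wrong candidate, here is a small Python sketch. The exact count follows the usual Rice coding of zigzag-folded residuals (unary quotient, one stop bit, k remainder bits). The estimate is a deliberately simplified stand-in based only on the sum of absolute values; it is an assumption for illustration and not libFLAC's actual formula. The two residuals are made-up toy data.

```python
def zigzag(r):
    """Fold a signed residual into an unsigned value (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * r if r >= 0 else -2 * r - 1


def exact_rice_bits(residual, k):
    """Exact bit count: unary quotient + 1 stop bit + k remainder bits per sample."""
    return sum((zigzag(r) >> k) + 1 + k for r in residual)


def estimated_rice_bits(residual, k):
    """Simplified estimate from the sum of absolute values (illustrative only)."""
    n = len(residual)
    abs_sum = sum(abs(r) for r in residual)
    return n * (k + 1) + ((2 * abs_sum) >> k)


def best_bits(residual, cost, max_k=14):
    """Smallest cost over all Rice parameters 0 .. max_k."""
    return min(cost(residual, k) for k in range(max_k + 1))


def pick(candidates, cost):
    """Name of the candidate residual with the lowest cost."""
    return min(candidates, key=lambda name: best_bits(candidates[name], cost))


# Two hypothetical residuals from two competing predictors.
candidates = {
    "A": [-1] * 8,
    "B": [1, 1, 1, 0, 1, 1, 1, 0],
}

print("exact picks    ", pick(candidates, exact_rice_bits))
print("estimate picks ", pick(candidates, estimated_rice_bits))
```

With these toy inputs the two cost functions disagree about which residual is cheaper, which is the kind of mis-ranking described above. The model overhead (coefficients, precision) is left out of the sketch to keep it short.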

Maybe these signals can help to find a better fast way to estimate the amount of bits needed.

ktmf01 commented 2 months ago

Here are two Win64 compiles to experiment with, one standard, one with exact rice calculation turned on: ~flac-test-exact-rice.zip~

H2Swine commented 2 months ago

Trying those compiles, I got system errors that libwinpthread-1.dll was not found and then that libogg-0.dll was not found.

ktmf01 commented 2 months ago

Ah, yes, I didn't do the usual checks.

Attempt no. 2: flac-test-exact-rice.zip

H2Swine commented 2 months ago

Here are a couple of "really bad" examples in that -e increases size by fifty percent, both on the "exact" build and the "inexact" build: b1024-r9,9.zip ran with --no-padding -b1024 --lax -r9,9 with and without -e. -r9,9 means out of subset, but for -r8,8 the files were still like 15% bigger with -e than without. Of course you can argue that -r n,n is "merely a pathological setting that nobody uses"; -e improves size when -r8 is used.

For those parameters where -e lost in the previous rounds (highly biased selection!), I reran with these two builds. -e outcompresses --no-exhaustive-model-search more often in the "inexact" build. But with the "exact" build, -e loses for most of the settings where it lost in the previous round.

Edit: this sounds dramatic. Note again to other readers, this is only 16384 samples with values -1, 0, 1. Issue opened just in case it is a symptom of something.

H2Swine commented 2 months ago

The plot thickens, and I do suspect it is a bug. It is very much related to (dual) mono, and it is apparently new with flac 1.4.

I discovered it on the "near-mono" tracks of Miles Davis "The Complete Birth of the Cool"; the 1998 edition, but taking only the first eleven tracks that appeared on the original Birth of the Cool LP (which apparently served as a master!). For some strange reason the channels differ in the LSB only, looks like they were independently dithered. (Which was why I got curious enough to look into those; the live tracks are truly "mono as stereo", the side channel is zero.)

Here is what happens - the audio is available on request, but I cannot upload it here.

Sizes with --no-mid-side, 1.4.3, smallest to largest:

197050659 Miles1to11-143.-8p--no-mid-side.flac
197063494 Miles1to11-143.-8ep--no-mid-side.flac
197083289 Miles1to11-143.-7p--no-mid-side.flac
197089542 Miles1to11-143.-7ep--no-mid-side.flac
197133685 Miles1to11-143.-8--no-mid-side.flac
197148500 Miles1to11-143.-8e--no-mid-side.flac
197163269 Miles1to11-143.-7--no-mid-side.flac
197174153 Miles1to11-143.-7e--no-mid-side.flac
197317528 Miles1to11-143.-6p--no-mid-side.flac
197331275 Miles1to11-143.-6ep--no-mid-side.flac
197346712 Miles1to11-143.-5p--no-mid-side.flac
197347389 Miles1to11-143.-4p--no-mid-side.flac
197357560 Miles1to11-143.-5ep--no-mid-side.flac
197358339 Miles1to11-143.-4ep--no-mid-side.flac
197383143 Miles1to11-143.-6--no-mid-side.flac
197397815 Miles1to11-143.-6e--no-mid-side.flac
197413525 Miles1to11-143.-5--no-mid-side.flac
197414102 Miles1to11-143.-4--no-mid-side.flac
197425671 Miles1to11-143.-5e--no-mid-side.flac
197426321 Miles1to11-143.-4e--no-mid-side.flac
197920145 Miles1to11-143.-3p--no-mid-side.flac
197925004 Miles1to11-143.-3ep--no-mid-side.flac
197971049 Miles1to11-143.-3--no-mid-side.flac
197977252 Miles1to11-143.-3e--no-mid-side.flac

It should be noted that each of these settings improves over 1.3.4:

197103898 Miles1to11-134.-8ep--no-mid-side.flac
197108320 Miles1to11-134.-8p--no-mid-side.flac
197138650 Miles1to11-134.-7ep--no-mid-side.flac
197155596 Miles1to11-134.-7p--no-mid-side.flac
197192610 Miles1to11-134.-8e--no-mid-side.flac
197201804 Miles1to11-134.-8--no-mid-side.flac
197226583 Miles1to11-134.-7e--no-mid-side.flac
197240522 Miles1to11-134.-7--no-mid-side.flac
197360756 Miles1to11-134.-6p--no-mid-side.flac
197364644 Miles1to11-134.-6ep--no-mid-side.flac
197417984 Miles1to11-134.-5ep--no-mid-side.flac
197418760 Miles1to11-134.-4ep--no-mid-side.flac
197432027 Miles1to11-134.-6--no-mid-side.flac
197435197 Miles1to11-134.-6e--no-mid-side.flac
197477031 Miles1to11-134.-5p--no-mid-side.flac
197477713 Miles1to11-134.-4p--no-mid-side.flac
197486494 Miles1to11-134.-5e--no-mid-side.flac
197487190 Miles1to11-134.-4e--no-mid-side.flac
197542614 Miles1to11-134.-5--no-mid-side.flac
197543246 Miles1to11-134.-4--no-mid-side.flac
197958871 Miles1to11-134.-3ep--no-mid-side.flac
197977721 Miles1to11-134.-3p--no-mid-side.flac
198010968 Miles1to11-134.-3e--no-mid-side.flac
198028602 Miles1to11-134.-3--no-mid-side.flac

Allowing for stereo decorrelation, -e works as it should:

120050974 Miles1to11-143.-8ep.flac
120055033 Miles1to11-143.-8p.flac
120063818 Miles1to11-143.-7ep.flac
120071018 Miles1to11-143.-7p.flac
120091527 Miles1to11-143.-8e.flac
120095166 Miles1to11-143.-8.flac
120104345 Miles1to11-143.-7e.flac
120109975 Miles1to11-143.-7.flac
120185424 Miles1to11-143.-6ep.flac
120189074 Miles1to11-143.-6p.flac
120197902 Miles1to11-143.-5ep.flac
120203056 Miles1to11-143.-5p.flac
120204126 Miles1to11-143.-4ep.flac
120209398 Miles1to11-143.-4p.flac
120216340 Miles1to11-143.-6e.flac
120220068 Miles1to11-143.-6.flac
120230515 Miles1to11-143.-5e.flac
120235421 Miles1to11-143.-5.flac
120235756 Miles1to11-143.-4e.flac
120240837 Miles1to11-143.-4.flac

Curiously, for 1.2.1 (and 1.1.4), -p did misbehave (with or without --no-mid-side). -4 to -6:

120255935 Miles1to11-121.-6e.flac
120255961 Miles1to11-121.-5e.flac
120267836 Miles1to11-121.-6.flac
120267848 Miles1to11-121.-5.flac
120274039 Miles1to11-121.-4e.flac
120307066 Miles1to11-121.-4.flac
120312827 Miles1to11-121.-6ep.flac
120312864 Miles1to11-121.-5ep.flac
120335473 Miles1to11-121.-4ep.flac
120385579 Miles1to11-121.-5p.flac
120385582 Miles1to11-121.-6p.flac
120454126 Miles1to11-121.-4p.flac

The uncompressed CDDA .wav is 345320684 bytes.
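To put the differences in scale, here is a small Python sketch using four of the 1.4.3 sizes listed above together with the .wav size. The dictionary keys are just labels chosen for this example.

```python
WAV_BYTES = 345_320_684  # uncompressed CDDA .wav, from the line above

# Sizes in bytes copied from the 1.4.3 listings above.
sizes = {
    "-8p --no-mid-side": 197_050_659,
    "-8ep --no-mid-side": 197_063_494,
    "-8p": 120_055_033,
    "-8ep": 120_050_974,
}


def percent(part, whole):
    """part as a percentage of whole."""
    return 100.0 * part / whole


# Positive means -e made the file bigger.
delta_no_ms = sizes["-8ep --no-mid-side"] - sizes["-8p --no-mid-side"]
delta_ms = sizes["-8ep"] - sizes["-8p"]

print(f"--no-mid-side: -e changes size by {delta_no_ms:+d} bytes "
      f"({percent(delta_no_ms, sizes['-8p --no-mid-side']):+.4f} %)")
print(f"mid-side:      -e changes size by {delta_ms:+d} bytes "
      f"({percent(delta_ms, sizes['-8p']):+.4f} %)")
print(f"ratio to .wav: {percent(sizes['-8p --no-mid-side'], WAV_BYTES):.2f} % "
      f"vs {percent(sizes['-8p'], WAV_BYTES):.2f} %")
```

The -e effect is tiny either way; the interesting part is only its sign, which flips between the two cases.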

Extracting a channel to a mono file and compressing it works like --no-mid-side. Indeed, that is how I discovered it.

ktmf01 commented 2 months ago

> Here are a couple of "really bad" examples in that -e increases size by fifty percent, both on the "exact" build and the "inexact" build: b1024-r9,9.zip ran with --no-padding -b1024 --lax -r9,9 with and without -e. -r9,9 means out of subset, but for -r8,8 the files were still like 15% bigger with -e than without. Of course you can argue that -r n,n is "merely a pathological setting that nobody uses"; -e improves size when -r8 is used.

Maybe I misunderstood your naming scheme, but the files with -e at the end are smallest. Are these the files with e or specifically without e?

H2Swine commented 2 months ago

Embarrassing mistake, and you are right.

Did it over again with that particular short signal.

Tab-separated file: exact and inexact builds -e and -p impact.txt Includes lots of dumb option combinations too, and going -l that high was no use, but I left them in because it would be more work deleting. I did delete the -b8192 run that produced no "bad" situations.

H2Swine commented 1 month ago

A few more tests on the "Miles tracks 1 to 11" set confirm that the exact-rice build "fixes it": with that build, -e always improves, and the same goes for -p.

I got only one instance (or two) where -p worsens: -5r6 -l1 with/without -e.

Leaving -p aside and focusing on -e: For the "problem" (to the extent that 0.01 percent is a "problem") to manifest, I need to use --no-mid-side. Recall that the signal is near-mono: the difference channel ranges -3 to 3 or something. So -e makes for enough impact on those small numbers to offset the bad impact on the large numbers.
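As a reminder of why the difference channel is so small for a near-mono signal, here is a minimal Python sketch of FLAC-style mid/side decorrelation (mid is the sum shifted right by one, side is left minus right). The sample values are invented for illustration, not taken from the Miles tracks.

```python
def mid_side(left, right):
    """Mid/side transform: mid = (L + R) >> 1, side = L - R."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


# Hypothetical near-mono samples: channels differ only in the lowest bits.
left = [1000, -2000, 1500, 30]
right = [1001, -2003, 1498, 30]

mid, side = mid_side(left, right)

print("mid :", mid)
print("side:", side)
print("largest |side|:", max(abs(s) for s in side))
```

With --no-mid-side both channels carry the full-size signal, so there is no small-valued channel for -e to gain on.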

I ran

and varying the following parameters (yeah there was a --lax always):

With the inexact-rice build and --no-mid-side, -e makes it worse for all runs except "-l0" and "-l1" in the "varying -l" section.

Attaching some results (the cases with "undesired impact" are farthest down) even if the cause is kinda found. Miles1..11varying-l.txt Miles1..11varying-r.txt Miles1..11varying-7A_varying_subdivide_tukey.txt The leftmost column explains settings; the "f" and even "ff" at the end is nothing deeper than for getting filenames collated consistently. I FOR-looped a variable ranging (ef,ff,pe,pf).