erazortt opened this issue 5 years ago
Thanks.
For the first part: the thSAD parameter should be given as if it were for an 8x8 block size of an 8-bit video. That makes it easy to use: you don't have to give different thSADs for a similar desired effect, regardless of block size. How it works internally: right before the analysis, the thSAD values are recalculated (scaled) to the actual block size. Moreover, bit depth and even chroma subsampling enter the calculation when chroma=true is given. Thus a thSAD=400 becomes 400*(4*4)*256 for BlkSize=32 (32/8=4 for both X and Y) and 16-bit video (the 8-bit to 16-bit range expansion gives the factor of 256). For a 4:2:0 video it is multiplied by an additional 1.5 factor.
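In pseudo-code the scaling is roughly this (a sketch with illustrative names, not the actual mvtools source):

```cpp
// Hedged sketch of the thSAD scaling described above (illustrative names,
// not the actual mvtools code). The user-facing thSAD refers to an 8x8
// block of 8-bit video; internally it is rescaled to the real block size,
// bit depth and, when chroma is compared, the chroma subsampling.
#include <cstdint>

int64_t ScaleThSAD(int thSAD, int blkSizeX, int blkSizeY,
                   int bitsPerPixel, bool chroma, int xRatioUV, int yRatioUV)
{
  // block size: from the 8x8 reference to the actual blkSizeX x blkSizeY
  int64_t scaled = (int64_t)thSAD * blkSizeX * blkSizeY / 64;
  // bit depth: 8 bit -> bitsPerPixel, e.g. *256 for 16 bit
  scaled <<= (bitsPerPixel - 8);
  // chroma: the subsampled planes add to the compared area,
  // e.g. 4:2:0 (xRatioUV = yRatioUV = 2) gives luma + 2*(1/4) = a 1.5 factor
  if (chroma)
    scaled = scaled * (xRatioUV * yRatioUV + 2) / (xRatioUV * yRatioUV);
  return scaled;
}
// Example: ScaleThSAD(400, 32, 32, 16, true, 2, 2)
//          = 400 * 16 * 256 * 1.5 = 2,457,600
```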
As for dct: in modes 1..4 there is, for some reason, an nBlkSizeX/2 factor which is applied after DCTBytes2D has been used. Either this is an empirical constant put there by the original author, Fizick, which seemed to work long ago, when practically the only supported block size was 8.
Or: after the fftw_execute_r2r transformation the data really is scaled somehow with 1/(horizontal size) (yes, I know, why only horizontal? but who knows). I should check it with manipulated input block contents to see whether there is any difference in the magnitude of the resulting blocks. If it does not scale, the nBlkSizeX/2 should be replaced with 8/2, because block size 8 has been the default since the beginning. But if the experiment shows that the result somehow is proportional to 1/nBlkSizeX, then it should remain.
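If it helps, the suspicious part boils down to something like this (a sketch, not the literal PlaneOfBlocks::LumaSADx source):

```cpp
// Sketch of the questionable dct=1..4 path: both blocks are DCT-transformed
// (DCTBytes2D), the plain SAD of the transformed blocks is taken, and the
// result is multiplied by nBlkSizeX/2 -- the factor under discussion.
// With the historical default nBlkSizeX=8 this factor is 4.
#include <cstdint>
#include <cstdlib>

unsigned int DctSAD_sketch(const uint8_t* pSrcDct, const uint8_t* pRefDct,
                           int nBlkSizeX, int nBlkSizeY, int pitch)
{
  unsigned int sad = 0;
  for (int y = 0; y < nBlkSizeY; y++)
    for (int x = 0; x < nBlkSizeX; x++)
      sad += std::abs(pSrcDct[y * pitch + x] - pRefDct[y * pitch + x]);
  return sad * nBlkSizeX / 2;   // empirical(?) blocksize factor in question
}
```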
I ran into this issue by accident while trying out MDegrain with dct=1. To keep the degraining strength independent of nBlkSizeX, I needed to scale thSAD1, thSAD2 and thSCD1 with nBlkSizeX.
And thanks for the summary, I never caught that detail about subsampled inputs. I will probably have to take it into account when processing inputs subsampled differently from 4:2:0...
Interesting. Until I check the issue (whether the dct 1..4 internal multiplication is proper across very different X block sizes), you can try playing with different thSAD values, but I think that should only be a temporary workaround. As I wrote, you normally don't have to bother with the thSAD scaling, because it is purposely meant to behave similarly for the same thSAD parameter, whatever block size or subsampling you are using.
My tests have shown that to have the 3 distinct modes of dct working comparably (dct=0, dct=1, dct=5), I needed to multiply the 3 thresholds by the following factors:
- dct=0 --> factor = 1
- dct=1 --> factor = nBlkSizeX/2/7
- dct=5 --> factor = 1/0.7
Or, to say it the other way around: the return value of every call to PlaneOfBlocks::LumaSADx must be divided by these factors so that the thresholds can stay the same. And this is in fact the way I tested it.
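In code form, the normalization I applied in the tests looks roughly like this (my empirical factors; nothing like this exists in mvtools itself):

```cpp
// Sketch of the empirical normalization used in my tests: the raw LumaSADx
// result is divided by a per-mode factor so that the same thSAD/thSCD1
// values behave comparably for dct=0, dct=1 and dct=5.
double NormalizeMetric(double rawLumaSAD, int dctMode, int nBlkSizeX)
{
  double factor = 1.0;                                   // dct=0: plain SAD (reference)
  if (dctMode == 1)      factor = nBlkSizeX / 2.0 / 7.0; // SAD of DCT'd blocks
  else if (dctMode == 5) factor = 1.0 / 0.7;             // SATD
  return rawLumaSAD / factor;
}
```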
dct=5 is a separate algorithm; it uses SATD, see https://en.wikipedia.org/wiki/Sum_of_absolute_transformed_differences. The factor you found is not constant, SATD is just a different metric of block differences. When someone experiments with those exotic modes, thSAD should be re-established. SAD, SATD and DCT metrics are like apples and oranges. The only thing I see as controversial is the DCT'd-blocks factor (but that may come from my limited knowledge of the deeper transform mathematics used here). DCT is also a different metric; the only question is whether it purposely needs the *nBlkSizeX factor or not. I'm sure that Fizick did not put such a thing there accidentally, but of course I'll check it with direct test patterns.
Yes, I know all these are different metrics. And while your assessment is of course valid, and the metrics differ by more than just constant factors, it is still helpful to find out by how much they are typically apart in scenarios where one would not expect much visual difference. The frequency-domain metrics (DCT and SATD) were introduced because SAD has strong issues with recognizing similar blocks when luminosity changes. Thus in scenes where luminosity is constant, one would expect the visual result to be very similar with all three metrics. From my tests with hundreds of such frames I could see that these factors are indeed relatively constant when luminosity does not change. Tuning the factors on bright scenes with constant luminosity would then be useful, so that it is not left to the user to guess how the thresholds need to change just because he decided to change the metric. This is especially critical for the mixed modes (dct=2,3,4,6,7,8,9). As they currently stand, these modes mix metrics whose typical values are completely incompatible, which makes me wonder whether these modes have ever actually been used as they are now.
To sum up: as long as the different metrics agree that a block is unchanged, they tend to be apart by roughly the same relative factors. These factors diverge only once the metrics disagree on whether a block is unchanged. The point of the thresholds is to tell when blocks are unchanged, and only in that regime do we need to normalize the metrics. In a mathematical sense this can be called the definition of zero. The metrics must be normalizable there, since if they were not normalizable at the definition of zero, they would not be useful metrics. And indeed they are normalizable, using these factors. Once the metrics leave the region of zero (i.e. they regard a block as changed), they are of course no longer normalizable, but that is also unnecessary.
I wonder: if I graph normal-SAD vs. dct-SAD pairs for different clip types (normal home video, anime, etc.), will I see some kind of clear correlation and be able to establish a formula or formulas, at least around the "zero"? And does it depend on block size, and how? (It's clear that the dct SAD of an all-0 and an all-255 8x8 block is zero for all block sizes, not counting the 0th DC element of course.) I couldn't see a clear correlation across different block sizes when I created blocks for "synthetic" tests: block1 = 0,4,8,12,0,4,8,12,etc., block2 = 4,8,12,0,4,8,12,0,... Then I tried filling both blocks with random uniformly distributed noise and got totally different ratios; that's why I want to create a huge dataset to analyse. If you can define your rules I can make a version for you to play with, dct=10 and up, we have plenty of room :) even with parameters.
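For reference, the synthetic experiment is roughly this standalone toy (orthonormal DCT-II here, which is not the same normalization as the mvtools/fftw path, so only the ratios are interesting):

```cpp
// Toy version of the "synthetic" experiment: build the two shifted ramp
// blocks, compute the plain SAD and the SAD of their 2D DCT-II coefficients
// with the DC term skipped. Standalone code, not the mvtools/fftw code path.
#include <cmath>
#include <cstdio>
#include <vector>

static std::vector<double> dct2d(const std::vector<double>& f, int N)
{
  const double PI = 3.14159265358979323846;
  std::vector<double> F(N * N, 0.0);
  for (int u = 0; u < N; u++)
    for (int v = 0; v < N; v++) {
      double s = 0.0;
      for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
          s += f[x * N + y] * std::cos(PI * (2 * x + 1) * u / (2.0 * N))
                            * std::cos(PI * (2 * y + 1) * v / (2.0 * N));
      double cu = (u == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
      double cv = (v == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
      F[u * N + v] = cu * cv * s;
    }
  return F;
}

int main()
{
  const int N = 8;
  std::vector<double> b1(N * N), b2(N * N);
  for (int i = 0; i < N * N; i++) {
    b1[i] = (i % 4) * 4;           // 0,4,8,12,0,4,8,12,...
    b2[i] = ((i + 1) % 4) * 4;     // 4,8,12,0,4,8,12,0,...
  }
  std::vector<double> F1 = dct2d(b1, N), F2 = dct2d(b2, N);
  double sad = 0.0, dctSad = 0.0;
  for (int i = 0; i < N * N; i++) {
    sad += std::abs(b1[i] - b2[i]);
    if (i != 0) dctSad += std::abs(F1[i] - F2[i]);  // skip the 0th (DC) element
  }
  std::printf("plain SAD = %.1f, DCT SAD (no DC) = %.1f, ratio = %.3f\n",
              sad, dctSad, sad / dctSad);
  return 0;
}
```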
Let me sum up my understanding of the 3 metrics, just to be sure we're on the same page here:
1. dct=0: plain SAD of the pixel values,
2. dct=1: SAD computed on the DCT-transformed blocks,
3. dct=5: SATD, the sum of absolute transformed differences.
Now what I argue is: if one can say that thSCD1=400 (the default, which means something like every pixel changing its value by about 6 units) makes sense for metric 1, then it must be possible to have a setting for thSCD1 when using metric 2 or 3 such that the threshold is sensitive to a similar amount of visual difference. I found the factors for the typical relative levels of the 3 metrics on blocks which have not changed (by the thSCD1=400 definition) by dumping the outputs of PlaneOfBlocks::LumaSADx for all 3 metrics over hundreds of frames to text files and comparing them. Of course, it would be necessary to make sure that the source material does not have too big an influence on this. Which, I have to admit, could very well be the case; anime could turn out to behave very differently from film.
I am not sure whether synthetic measurements with white noise are the way to go, since by the very definition of white noise, every block has changes above any definable threshold of similarity. This is basically the regime where one would expect the methods to diverge a lot.
As I said in my previous comments two days ago, the factors can only be expected to be somewhat stable in the regime of "zero" changes. Thus if you want to go on with white noise, I would turn its luminosity down to a level where one would expect to see the usual real-life noise levels. Perhaps it is actually interesting to see how the factors vary as functions of the noise intensity.
Oh, thinking about it, I can put numbers on how dim the noise must be. The luminosity of the noise must be low enough relative to the thSCD1 setting that one would expect the "SAD" metric to remain under the threshold. Thus, using the default of thSCD1=400, the noise must be such that the average per-pixel difference stays below 400/64 = 6.25.
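Or, restating the arithmetic as a trivial helper (illustrative only):

```cpp
// The mean per-pixel difference implied by a thSCD1 value, which refers to
// an 8x8 block of 8-bit video; the noise level should stay below this.
double MaxNoisePerPixel(int thSCD1) { return thSCD1 / 64.0; }  // 400 -> 6.25
```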
Testing just with white noise is probably a good approximation for anime, which tends to have huge uniform color areas. For simulating film, it would probably be an idea to use the spectral coefficients of the DCT itself (link) as backgrounds for the white noise.
So here is the idea for a test: use white noise of tweakable average luminosity on top of backgrounds made from the DCT coefficient patterns (link), which should also be of tweakable luminosity. Then one can look at the results along these three dimensions: background pattern, background luminosity, and white-noise luminosity. This should be a mathematically complete description of non-changing blocks.
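A sketch of such a test-block generator could look like this (standalone toy code; the parameter names are made up):

```cpp
// Proposed test-block generator: a DCT basis pattern as background
// (selectable (u,v) "spectral coefficient"), scaled by a background
// luminosity, plus uniform noise of tweakable amplitude. Not part of mvtools.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

std::vector<uint8_t> MakeTestBlock(int N, int u, int v,
                                   double bgLuma, double noiseAmp,
                                   std::mt19937& rng)
{
  const double PI = 3.14159265358979323846;
  std::uniform_real_distribution<double> noise(-noiseAmp, noiseAmp);
  std::vector<uint8_t> blk(N * N);
  for (int x = 0; x < N; x++)
    for (int y = 0; y < N; y++) {
      // DCT-II basis function (u,v) as the background pattern
      double basis = std::cos(PI * (2 * x + 1) * u / (2.0 * N))
                   * std::cos(PI * (2 * y + 1) * v / (2.0 * N));
      double val = 128.0 + bgLuma * basis + noise(rng);
      blk[x * N + y] = (uint8_t)std::min(255.0, std::max(0.0, val));
    }
  return blk;
}
// Generate pairs of such blocks over a grid of (u,v), bgLuma and noiseAmp,
// then compare the SAD / DCT-SAD / SATD ratios in each cell of the grid.
```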
Hm, I am still stuck on the question whether the 0th component is actually used in the DCT comparison. The more I think about it, the more I come to the conclusion that it probably is used, since in a way it is a measure of similarity, and if it were discarded the matrix would no longer be square...
(I'm not lost, just busy.) White noise was just an experiment; I had to refresh my mind on the whole DCT topic and how it is handled in mvtools. For 8x8 there is a faster, integer DCT algorithm; other block sizes work in float and use the fftw library.
The float versions have an int-float-int conversion, which makes the result similar to the integer DCT version. Finally we have a signed 8-bit (10, 12, 14, 16-bit) result. The DC component is halved, the other components get a 1/sqrt(2) factor. Then the 0th (DC) component is converted to integer and shifted by (dctshift+2); the AC components are shifted by dctshift. dctshift is the log2 of blocksizeX*blocksizeY (I have just commented it because the non-mod8 block sizes suffer from that, have to check it): https://github.com/pinterf/mvtools/blob/mvtools-pfmod/Sources/DCTFFTW.cpp#L91
Then, after dctshift0 and dctshift, the zero level of the result is moved by adding 128 (for 8 bits): https://github.com/pinterf/mvtools/blob/mvtools-pfmod/Sources/DCTFFTW.cpp#L376 and https://github.com/pinterf/mvtools/blob/mvtools-pfmod/Sources/DCTFFTW.cpp#L394
So in brief: the final integer result (AC components) gets a (1/sqrt(2)) / (blkSizeX*blkSizeY) + 128 conversion. This still does not answer the question of why there is a *nBlkSizeX/2 factor after SAD-ing the blocks.
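Paraphrased as code, the conversion described above is roughly this (a sketch based on my description, not a quote of DCTFFTW.cpp; 8-bit case, power-of-2 block sizes, clamping omitted):

```cpp
// Normalization of the fftw DCT output as described above: DC halved and
// shifted by (dctshift+2), AC scaled by 1/sqrt(2) and shifted by dctshift,
// then everything re-centered at 128 for the 8-bit case.
#include <cmath>
#include <cstdint>

void NormalizeDctBlock(const float* fftwOut, uint8_t* dst,
                       int blkSizeX, int blkSizeY)
{
  int dctshift = (int)std::log2(blkSizeX * blkSizeY); // only exact for power-of-2 sizes
  int dc = (int)(fftwOut[0] * 0.5f);                  // DC component halved
  dst[0] = (uint8_t)((dc >> (dctshift + 2)) + 128);
  for (int i = 1; i < blkSizeX * blkSizeY; i++) {
    int ac = (int)(fftwOut[i] * 0.70710678f);         // 1/sqrt(2) factor
    dst[i] = (uint8_t)((ac >> dctshift) + 128);       // shift used as division
  }
}
```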
Now I can see that you made a serious result dump and did the comparison I wanted to do; I didn't know before how far you had gotten.
A shift right by dctshift is the same as a division by (blockSizeX*blockSizeY); for power-of-2 block sizes it works fine, but for a 12x12 block it doesn't. The reason for the normalization is to fit the result into the 8-bit range. Then, after SAD-ing the DCT'd blocks, the mul-by-blkSizeX-div-2 should be mul-by-sqrt(blockSizeX/2 * blockSizeY/2), which would also work for non-square blocks. BTW, these normalization factors are different when we use FFTW's DCT or e.g. Matlab's dct; it's a free choice depending on our practical needs.
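The two adjustments would look roughly like this (again a sketch, not committed code):

```cpp
// 1) Normalize by the real block area instead of a power-of-2 shift, so that
//    non-mod8 sizes like 12x12 work too.
// 2) After SAD-ing the DCT'd blocks, use sqrt(blkSizeX/2 * blkSizeY/2)
//    instead of blkSizeX/2, so non-square blocks are handled symmetrically.
#include <cmath>

inline int NormalizeCoeff(float coeff, int blkSizeX, int blkSizeY)
{
  return (int)(coeff / (blkSizeX * blkSizeY)) + 128;   // instead of >> dctshift
}

inline unsigned int ScaleDctSad(unsigned int sad, int blkSizeX, int blkSizeY)
{
  // instead of sad * (blkSizeX / 2)
  return (unsigned int)(sad * std::sqrt(blkSizeX / 2.0 * blkSizeY / 2.0));
}
```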
But all this keeps the question open of how to choose a proper metric which correctly describes the block difference, both for detecting the best block match and for detecting scene changes, and which gives a usable weight for e.g. MDegrain.
I think I will not have time before Christmas, but then I will try to get at this issue with some serious assessment.
Hey, I'm back! So let me try to pick up where we left off. Where I'm coming from is that thresholds like thSAD should mean the same thing independently of which dct is selected. So for an interval of a clip where there are no luminosity changes, SAD and SATD should behave exactly the same. To show that there is indeed a constant factor between them, I am using the following setup: I degrain a frame using MDegrainN with n=3, and the only things I change are the dct and thSAD settings. Please take a look at the shots: https://drive.google.com/file/d/1eI8Rldh9yrjYd5787oLXvnpaLMYx7POw/view?usp=sharing You will see that the SAD_x shots match the SATD_(x*1.7) shots quite nicely throughout the whole range that was relevant for this scene, x = 128 to 320, whereas the unscaled SATD_x shots all look completely different. This shows that the scaling factor of 1.7 (which is close to my previous estimate of 1/0.7) fits. This is why I am using these scaling factors in my script: http://avisynth.nl/index.php/TemporalDegrain2
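In code form, the per-metric threshold scaling I ended up with is roughly this (illustrative only; the actual implementation lives in the TemporalDegrain2 script):

```cpp
// The user picks one thSAD and it is multiplied per dct mode so that the
// visual degraining strength stays comparable across the metrics.
double ScaleThresholdForMetric(double thSAD, int dctMode, int nBlkSizeX)
{
  if (dctMode == 1) return thSAD * nBlkSizeX / 2.0 / 7.0; // SAD of DCT'd blocks
  if (dctMode == 5) return thSAD * 1.7;                   // SATD (roughly 1/0.7)
  return thSAD;                                           // dct=0: plain SAD, unchanged
}
```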
Sorry the archive was unreachable, should be fixed now.
Using MDegrain with dct>0 (set in MAnalyse), the thresholds thSAD1, thSAD2 and thSCD1 are block size dependent.
Expectation: following the documentation ("The provided thSAD value is scaled to a 8x8 blocksize."), the thresholds should not depend on the block size.
Solution: remove the explicit multiplication of the SAD value by the variable nBlkSizeX in the function PlaneOfBlocks::LumaSADx, in each case statement.
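In code, the suggestion amounts to something like this (a hedged sketch, not the actual LumaSADx source):

```cpp
// Return the SAD of the DCT'd blocks unscaled instead of multiplying it by
// nBlkSizeX/2, so that the normal thSAD/thSCD1 scaling stays independent of
// the block size.
unsigned int DctSAD_proposed(unsigned int sadOfDctBlocks, int /*nBlkSizeX*/)
{
  return sadOfDctBlocks;   // was: sadOfDctBlocks * nBlkSizeX / 2
}
```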