pinterf / Frfun7

Frfun7 is a spatial denoising plugin for Avisynth
GNU General Public License v2.0
17 stars 0 forks source link

Problems when p=1 and tp1>0 #3

Open dubhater opened 2 years ago

dubhater commented 2 years ago

I noticed a missing packuswb here: https://github.com/pinterf/Frfun7/blob/c2b478b2f8672b033fed388e88f9b7e6727e85db/Frfun7/Frfun7.cpp#L611-L626 mm3 contains words and mmA contains bytes, so you can't use psadbw like that. mm3 should be packed first.

But then I noticed that the SIMD code without this packuswb produces the same result as my scalar code. This led me to the other problem: https://github.com/pinterf/Frfun7/blob/c2b478b2f8672b033fed388e88f9b7e6727e85db/Frfun7/Frfun7.cpp#L960-L963 Here you are clipping the returned weight to 0..255 range, but the returned weight is the sum of 16 absolute differences. It's pretty much always bigger than 255. I guess the weight should be divided by 16 before clipping it.

dubhater commented 2 years ago

I think I was wrong about the missing packuswb. The intent seems to be to calculate the SAD between the original pixels and the newly calculated pixels. And later, if the SAD is smaller than P1_param it doesn't bother doing more calculations. If I'm right (this time), mm3 shouldn't be packed before the SAD, instead mm3 right after the unpacking should be used:

AVS_FORCEINLINE void simd_blend_diff4(uint8_t* esi, __m128i &mmA, __m128i mm2_multiplier, __m128i mm1_rounder, __m128i mm0_zero) 
 { 
   auto mm3 = _mm_unpacklo_epi8(_mm_load_si32(esi), mm0_zero); 
   auto mm3_unpacked = mm3;
   // tmp= ((esi << 6) * multiplier) >> 16  ( == [esi]/1024 * multiplier) 
   // mmA = (mmA + tmp + rounder_16) / 32 
   // ((((mm1 << 2) * multiplier) >> 16 ) + 1) >> 1 
   mm3 = _mm_slli_epi16(mm3, 6); 
   mm3 = _mm_mulhi_epi16(mm3, mm2_multiplier); // pmulhw, signed 
   mmA = _mm_adds_epu16(mmA, mm3); 
   mmA = _mm_adds_epu16(mmA, mm1_rounder); 
   mmA = _mm_srli_epi16(mmA, 5); 
   mmA = _mm_packus_epi16(mmA, mm0_zero); // 4 words to 4 bytes 
   *(uint32_t*)(esi) = _mm_cvtsi128_si32(mmA); 
   mmA = _mm_sad_epu8(mmA, mm3_unpacked); // this is the only difference from simd_blend_store4 
 } 
pinterf commented 2 years ago

Thanks for the followup, I see, in https://github.com/dubhater/vapoursynth-frfun7/commit/2f0b961d7b26244e477465c2cf3c4c93156d2b10 you finally kept mmA_unpacked and make the sad between the already unpacked mm3 and unpacked mmA them. (the code in your comment above does not mention mmA_unpacked yet.)

dubhater commented 2 years ago

Ah, sorry. The comment is wrong. https://github.com/dubhater/vapoursynth-frfun7/commit/2f0b961d7b26244e477465c2cf3c4c93156d2b10 is correct.

pinterf commented 2 years ago

Thanks. I'm giving it a try, v0.9 for Avisynth is released.