rahul-goel / fused-ssim

Lightning fast differentiable SSIM.
MIT License

Question regarding do_separable_conv_x #4

Closed: cmh1027 closed this issue 1 week ago

cmh1027 commented 2 weeks ago

What is the purpose of recomputing the convolution value when local_y < SY?

rahul-goel commented 2 weeks ago

The workload is launched with a block dimension of 32x32. Since the convolution in SSIM uses an 11x11 kernel, each output pixel needs 5 neighbouring pixels on every side, so the window loaded into shared memory is 42x42 (32 + 11 - 1 = 42).
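The 42x42 figure is just the halo arithmetic for a "valid" convolution. A minimal sketch of it follows; the function name and parameters are illustrative and do not come from the fused-ssim code.

```python
def shared_window_side(block: int, kernel_size: int) -> int:
    """Input side length needed so that `block` outputs of a `kernel_size`
    convolution can be computed entirely from shared memory."""
    assert kernel_size % 2 == 1, "expects an odd-sized (centred) kernel"
    halo = kernel_size // 2        # neighbours needed on each side of an output
    return block + 2 * halo        # same as block + kernel_size - 1

print(shared_window_side(32, 11))  # 42
```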

The separable convolution in the X dimension takes the 42x42 window as input and outputs a 42x32 window: convolving in the X direction shrinks only the X dimension (from 42 to 32), while the Y dimension stays 42. This means 42x32 = 1344 one-dimensional convolutions have to be computed, but the block only has 32x32 = 1024 threads. Hence, first all 32x32 threads each compute one convolution, and then only the first 10x32 threads compute the remaining 10x32 = 320. The check local_y < SY ensures that only the first 10 rows of threads do work in this second phase.
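The two-phase work split described above can be checked with a small CPU simulation. This is a sketch under stated assumptions, not the repository's kernel: it assumes a 32x32 block and a second phase in which thread (local_y, local_x) handles output row local_y + 32. The constant names are hypothetical, and the guard `row < CONV_ROWS` used here is equivalent to `local_y < 10`, which is what the `local_y < SY` check is described as enforcing; the real kernel's definition of SY may differ.

```python
BLOCK_X, BLOCK_Y = 32, 32       # thread block dimensions
HALO = 5                        # 11x11 kernel -> 5-pixel halo on each side
CONV_ROWS = BLOCK_Y + 2 * HALO  # 42 rows survive the X pass (only X shrinks)
CONV_COLS = BLOCK_X             # 32 columns remain after convolving along X

def coverage():
    """Return a CONV_ROWS x CONV_COLS grid counting writes per output cell."""
    counts = [[0] * CONV_COLS for _ in range(CONV_ROWS)]
    for local_y in range(BLOCK_Y):
        for local_x in range(BLOCK_X):
            # Phase 1: every thread computes one 1D convolution (rows 0..31).
            counts[local_y][local_x] += 1
            # Phase 2: only thread rows 0..9 have a row left (rows 32..41).
            row = local_y + BLOCK_Y
            if row < CONV_ROWS:
                counts[row][local_x] += 1
    return counts

grid = coverage()
total = sum(map(sum, grid))
print(total)                                 # 1344 == 42 * 32
print(all(c == 1 for r in grid for c in r))  # True: each output written exactly once
```

The second phase leaves 22 of the 32 thread rows idle, which is the price of covering 1344 outputs with 1024 threads without a more irregular index mapping.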

cmh1027 commented 2 weeks ago

@rahul-goel Thanks for the reply. One more question, please: why is flush_conv_scratch necessary? Aren't the values overwritten anyway, even without it?

rahul-goel commented 2 weeks ago

I didn't explain it correctly in my previous comment. I've updated it; please have a look.

Flushing was needed in an earlier implementation, I think, and it got carried over from there. I haven't checked whether removing it changes the results, or what performance benefit it would give.

rahul-goel commented 1 week ago

I checked: flushing isn't necessary, although removing it doesn't make a noticeable performance difference either.
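Why the flush is redundant can be illustrated on the CPU: if every scratch entry is written before it is read, the buffer's previous contents cannot reach the output. This sketch assumes that flush_conv_scratch zeroes the scratch buffer (its exact behaviour is not described in this thread), assumes the two-phase mapping from the earlier reply, and uses a toy 11-tap box sum in place of the Gaussian. All names are hypothetical.

```python
import random

BLOCK = 32
HALO = 5
ROWS, COLS = BLOCK + 2 * HALO, BLOCK  # 42 x 32 scratch for the X pass
TAPS = 2 * HALO + 1                   # 11-tap kernel

def x_pass(window, scratch):
    """Fill scratch[y][x] with an 11-tap horizontal box sum of window (toy kernel)."""
    for local_y in range(BLOCK):
        for local_x in range(BLOCK):
            for y in (local_y, local_y + BLOCK):  # phase 1, then phase 2
                if y < ROWS:                      # only thread rows 0..9 pass in phase 2
                    scratch[y][local_x] = sum(window[y][local_x + k] for k in range(TAPS))
    return scratch

random.seed(0)
window = [[random.random() for _ in range(ROWS)] for _ in range(ROWS)]  # 42x42 input
flushed = x_pass(window, [[0.0] * COLS for _ in range(ROWS)])            # zeroed first
garbage = x_pass(window, [[random.random() * 1e9 for _ in range(COLS)] for _ in range(ROWS)])
print(flushed == garbage)  # True: every entry is overwritten, so zeroing first changes nothing
```

On a GPU the same argument also relies on a barrier (such as __syncthreads()) between writing the scratch buffer and reading it in the Y pass; with that in place, an extra flush only adds memory traffic.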

Closing this for now. Please re-open if necessary.