nihui / waifu2x-ncnn-vulkan

waifu2x converter ncnn version, runs fast on intel / amd / nvidia / apple-silicon GPU with vulkan
MIT License

Removed slow and unnecessary branching from compute shaders #113

Closed TheJackiMonster closed 2 years ago

TheJackiMonster commented 4 years ago

I removed the branches from the preprocessing and postprocessing compute shaders. They existed only for convenient indexing, so I replaced each with a one-liner that calculates the same index for both cases without branching.
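The shader code itself is not quoted in this thread, so the following is only a sketch of the general technique, written in Python rather than GLSL. It assumes a hypothetical case where a 0/1 flag decides whether the channel order is reversed (for example RGB versus BGR). The names `index_branched`, `index_branchless`, and `reverse` are illustrative and not taken from the PR.

```python
def index_branched(x, y, c, width, channels, reverse):
    """Flat index of channel c at pixel (x, y), chosen with a branch."""
    if reverse:
        src_c = channels - 1 - c  # reversed channel order
    else:
        src_c = c                 # channel order unchanged
    return (y * width + x) * channels + src_c


def index_branchless(x, y, c, width, channels, reverse):
    """Same index as one arithmetic expression; reverse must be 0 or 1."""
    # reverse == 0 -> c, reverse == 1 -> c + (channels - 1 - 2*c) = channels - 1 - c
    src_c = c + reverse * (channels - 1 - 2 * c)
    return (y * width + x) * channels + src_c
```

Under these assumptions both functions return the same index for every input, which matches the observation further down that the output files stayed identical.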

This should improve performance by a few percent, because code without branches tends to run better in parallel.

I also changed the local_size from 32x32x3 to 32x32x1, because GPUs tend to favor a power of 2 as local_size and a z size of 1 should fit different channel counts better.
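To make the channel-count argument concrete, here is a small sketch of the dispatch arithmetic. It assumes the usual scheme where the number of workgroups per axis is the extent divided by the local size, rounded up. The helper names and the 64x64 image size are hypothetical and not taken from the PR.

```python
def group_counts(width, height, channels, local_size):
    """Workgroups needed per axis: ceiling division of extent by local size."""
    lx, ly, lz = local_size
    return (-(-width // lx), -(-height // ly), -(-channels // lz))


def idle_invocations(width, height, channels, local_size):
    """Invocations launched that fall outside the width x height x channels volume."""
    lx, ly, lz = local_size
    gx, gy, gz = group_counts(width, height, channels, local_size)
    launched = (gx * lx) * (gy * ly) * (gz * lz)
    return launched - width * height * channels


# 3 channels: both layouts cover the volume exactly.
print(idle_invocations(64, 64, 3, (32, 32, 3)))  # 0
print(idle_invocations(64, 64, 3, (32, 32, 1)))  # 0

# 4 channels: a z size of 3 rounds up to 6 slots, leaving 2 of them idle per pixel.
print(idle_invocations(64, 64, 4, (32, 32, 3)))  # 8192
print(idle_invocations(64, 64, 4, (32, 32, 1)))  # 0
```

With a z size of 1 the channel axis never needs padding, whatever the channel count is. Whether that translates into a measurable speedup depends on the GPU and driver.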

kattjevfel commented 4 years ago

For what it's worth, I tried this PR on my test image, running each version 3 times. On average the PR runs 0.13% slower, but that's within the margin of error anyway. At least the output files were identical (same hash).
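A comparison like this can be sketched as follows. The helper names are hypothetical and the timings in the usage lines are placeholder values, not the measurements reported above. The choice of SHA-256 is also an assumption, since the comment does not say which hash was used.

```python
import hashlib
from statistics import mean


def relative_change_percent(baseline_runs, candidate_runs):
    """Percent change of the mean candidate time relative to the mean baseline time."""
    base = mean(baseline_runs)
    return (mean(candidate_runs) - base) / base * 100.0


def same_output(a: bytes, b: bytes) -> bool:
    """True if two outputs have the same SHA-256 digest."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


# Placeholder timings in seconds, three runs per build.
change = relative_change_percent([10.0, 10.0, 10.0], [10.0, 10.5, 11.0])
print(f"{change:+.2f}%")  # positive means the candidate is slower
```

With only three runs per build, a difference this small cannot be separated from run-to-run noise, so more runs would be needed to call it either way.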

TheJackiMonster commented 4 years ago

It generally depends on the GPU and its drivers. For me it was a little faster on average, but in theory branches can slow down the GPU's scheduler, because they can cause the workers to get out of sync.
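The "out of sync" effect can be illustrated with a deliberately simplified cost model. It assumes a group of lanes that execute in lockstep, so that when lanes disagree on a branch the group runs both paths one after the other. The function name and the cost values are hypothetical, and real hardware and drivers are more complicated than this.

```python
def lockstep_cost(lane_takes_a, cost_a, cost_b):
    """Cost for one group of lanes executing in lockstep.

    lane_takes_a: one boolean per lane, True if that lane takes path A.
    Each path costs its full price if at least one lane takes it.
    """
    cost = 0
    if any(lane_takes_a):
        cost += cost_a
    if not all(lane_takes_a):
        cost += cost_b
    return cost


uniform = [True] * 32            # every lane takes path A
divergent = [True, False] * 16   # lanes disagree

print(lockstep_cost(uniform, 4, 6))    # 4
print(lockstep_cost(divergent, 4, 6))  # 10
```

In this model a branch only costs extra when lanes within the same group disagree. A condition that evaluates the same way for every lane adds no divergence penalty, which is one reason the measured effect can differ between workloads and GPUs.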