pygfx / wgpu-py

WebGPU for Python
https://wgpu-py.readthedocs.io
BSD 2-Clause "Simplified" License

Compute shaders produce wrong result when multiplying infinity with a variable #620

Closed by wpmed92 1 week ago

wpmed92 commented 1 week ago

Describe the bug

Compute shaders produce wrong result when multiplying infinity with a number, if that number is not known at compile time.

To Reproduce

Reproducible example: https://gist.github.com/wpmed92/c045e98fdb5916670c31383096706406

Observed behavior

The result of the repro compute shader is all zeroes, whereas it should be all infs (a positive number multiplied by positive infinity). If I modify the shader to just pass through pos_inf, the result is correct. If I modify the multiplication to multiply inf by a const, the result is also correct. The result is only incorrect if I multiply by a variable that is not known at compile time.
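For reference, the expected values can be computed on the CPU. The sketch below is not taken from the gist: the input values and the helper names `to_f32` and `reference_multiply` are assumptions made for illustration. It only shows what IEEE 754 float32 arithmetic gives for a positive number times positive infinity.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reference_multiply(values, factor=math.inf):
    """CPU reference: multiply each float32 input by `factor`."""
    return [to_f32(to_f32(v) * factor) for v in values]


# Hypothetical positive inputs; the real repro may use different data.
inputs = [1.0, 2.0, 3.0, 4.0]
expected = reference_multiply(inputs)
print(expected)  # [inf, inf, inf, inf]
```

Comparing the buffer read back from the GPU against `expected` makes the mismatch (all zeroes instead of all infs) easy to assert in a test.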

Your environment

Chip: Apple M1 Pro
OS: 14.4.1 (23E224)
Backend: Metal
Python version: Python 3.8.10
wgpu version: wgpu 0.18.1

Vipitis commented 1 week ago

inf is handled differently by different GPU vendors and backends. There might also be cases where compute shaders don't work at all with D3D12. Are you using Metal or Vulkan in your case?

wpmed92 commented 1 week ago

I'm using Metal. It happens in Chrome as well. I worked around it by passing in inf as a uniform and using that uniform inf in computations. It works fine, but I hoped for a non-workaround solution. It seems that any time the compiler can deduce that a computation produces inf, it flushes the result to zero. Context: we're using wgpu in tinygrad for our WebGPU backend. It was removed from core, but now we're bringing it back, which is how I bumped into this issue.
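The uniform-based workaround described above could look roughly like the sketch below. This is not the tinygrad code: the WGSL source, the `Params` struct, the binding layout, and the helper `pack_params` are all hypothetical, and whether the value survives compilation on every GPU is untested here. The bytes would be uploaded with whatever buffer-creation call the application already uses.

```python
import math
import struct

# Hypothetical WGSL: infinity is read from a uniform at run time instead of
# appearing as a value the shader compiler can see at compile time.
SHADER_SOURCE = """
struct Params {
    pos_inf: f32,
};

@group(0) @binding(0) var<storage, read_write> data: array<f32>;
@group(0) @binding(1) var<uniform> params: Params;

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    data[gid.x] = data[gid.x] * params.pos_inf;
}
"""


def pack_params(pos_inf: float = math.inf) -> bytes:
    """Pack the uniform payload as little-endian float32.

    The struct itself only needs 4 bytes; padding to 16 bytes is a
    conservative choice made for this sketch, not a requirement.
    """
    return struct.pack("<f12x", pos_inf)


uniform_bytes = pack_params()
print(uniform_bytes[:4].hex())  # 0000807f, i.e. 0x7F800000 (+inf) little-endian
```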

Vipitis commented 1 week ago

I gave this a try on Vulkan and D3D12, and the output is zero there too. Reading the spec, this is sort of expected behaviour, because any intermediate results that are inf might become an "indeterminate" value (ref). So it's not something that is meant to work, and hence it gets optimized away by the compiler. Your workaround using a uniform buffer might also not work across all GPUs... so I am not sure there will be a solution. Maybe you can look at the naga IR or GPU assembly to see where this gets optimized away and find a trick to avoid it? Perhaps there are other workarounds using push constants or compilation options?

Anyway, I doubt this is something to solve within wgpu-py, which just provides the Python mapping to the wgpu-native C lib.

wpmed92 commented 1 week ago

Thank you for looking into this! Regarding tinygrad, I think the uniform-based workaround is fine for now. Closing this, as I agree it's not something that can be solved within your lib.

almarklein commented 1 week ago

Hi @wpmed92!

We ran into similar issues earlier, related to detecting inf and nan. We were able to find some tricks to fool the compiler so it won't optimize inf/nan away, so that we can implement our own isinf() and isnan() to detect nonfinite values in incoming data in Pygfx. We added tests here in wgpu-py so we know when our tricks start failing for whatever reason: https://github.com/pygfx/wgpu-py/blob/main/tests/test_not_finite.py
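One way to detect nonfinite values without relying on float comparisons is to inspect the float32 bit pattern. The sketch below shows the idea on the CPU in Python. It is an illustration only and not necessarily the trick used in test_not_finite.py; the helper names are hypothetical. A shader version would presumably go through something like WGSL's `bitcast<u32>(v)`, and whether that survives compiler optimization on a given backend is an assumption that would need testing.

```python
import math
import struct

EXPONENT_MASK = 0x7F800000  # the 8 exponent bits of an IEEE 754 binary32
MANTISSA_MASK = 0x007FFFFF  # the 23 mantissa bits


def f32_bits(x: float) -> int:
    """Return the float32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_inf(x: float) -> bool:
    """Exponent all ones and mantissa zero means +inf or -inf."""
    bits = f32_bits(x)
    return (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & MANTISSA_MASK) == 0


def is_nan(x: float) -> bool:
    """Exponent all ones and a nonzero mantissa means NaN."""
    bits = f32_bits(x)
    return (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & MANTISSA_MASK) != 0


for value in (1.0, math.inf, -math.inf, math.nan):
    print(value, is_inf(value), is_nan(value))
```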

I'd think that similar tricks can be used to get inf values in your shader without using uniforms.

Another option can be the recently added overridable constants, which are a little lighter than uniform buffers.

Vipitis commented 1 week ago

An additional reference for playing with the compiler: https://shader-playground.timjones.io/. It does have naga, but it doesn't show the IR, I believe.