Closed archibate closed 3 years ago
IIRC, we had a discussion at the very beginning of the OpenGL backend work. The conclusion was that the limits of the fragment shader, some of which you've pointed out, probably render it impossible to implement Taichi with pre-4.3 OpenGL. (I do see that Halide supports OpenGL without compute shaders, but maybe its functional computation pattern doesn't rely so much on atomic ops or strong memory ordering. OTOH, Taichi is designed to handle mega kernels that have much richer semantics.)
Mac users who found Metal backend extremely slow;
This is indeed a problem. Fortunately, I think we've identified a poor usage of Metal's memory model, which is going to be fixed in #1415. With managed memory storage mode + fewer global float atomics, I'd hope Metal gets a sizable performance boost. E.g. the Zhihu example for calculating PI now runs at ~0.01s (excluding the first run, because it does the JIT).
Make it possible to run Taichi on WebGL.
IMO this is definitely an exciting path... I found that WebGL 2.0 claims to support compute shaders (e.g. https://github.com/9ballsyndrome/WebGL_Compute_shader), but I guess it's still at a very early stage and there could be lots of pitfalls if we tried this path now. Maybe the wise thing is to wait for it to become more mature...
what's the priority?
I think many people have already mentioned that running Taichi in a browser will be very awesome.. Just one random idea, C/LLVM -> WASM?
Thanks for proposing this. I don't think it will be easy as mentioned by @k-ye: fragment shaders have a very limited computational capability.
WebGL 2.0 is still premature, but I think LLVM->WASM/JS sounds like a reasonable solution to run Taichi in browsers.
Btw, what do we mean by "run Taichi in browsers"? Does it mean we can run compiled JavaScript in the browser, or the Taichi Python frontend in the browser?
Just run the compiled javascript. Basically it's a "player" of pre-compiled Taichi kernels (in JS/WASM).
I think it's still good & possible to have a FS backend even if atomics are not supported; we can have ti.extensions.atomic in that case.
Some tweaks could be applied to make mpm88 functional on non-atomic backends like OpenGL FS.
Case 1:
for i in x:
    x[i] += v[i] * dt
Since all atomic destinations are independent (no read, no overlap: each iteration writes only index i), we can actually demote this atomic operation in the Taichi middle-end.
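A minimal plain-Python sketch of why Case 1 can be demoted, not Taichi code: `advance` and its `order` parameter are hypothetical names used only for illustration. Because iteration i reads and writes only element i, running the iterations in any order (as parallel threads effectively would) gives the same result with an ordinary read-modify-write.

```python
def advance(x, v, dt, order=None):
    """Apply x[i] += v[i] * dt, visiting indices in the given order.

    `order` stands in for an arbitrary parallel schedule (hypothetical
    parameter, for illustration only).
    """
    out = list(x)
    if order is None:
        order = range(len(out))
    for i in order:
        # Plain, non-atomic update: no other iteration touches out[i].
        out[i] = out[i] + v[i] * dt
    return out


x = [0.0, 1.0, 2.0]
v = [1.0, 2.0, 4.0]
in_order = advance(x, v, 0.5)
shuffled = advance(x, v, 0.5, order=[2, 0, 1])
print(in_order == shuffled)
```

The same argument would not hold if two iterations could write the same index, which is exactly the situation in Case 2.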
Case 2:
for i in x:
    p = int(x[i] * inv_dx)
    grid_v[p] += v[i]
Although the destination location is non-trivial and may be overlapped n times, there is no read on grid_v during this offload.
So instead of accumulating x into grid_v, we collect grid_v from x:
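for p in ti.grouped(grid_v):
    r = 0.0
    for i in range(N):
        # same cell index as in the scatter version above;
        # not sure if there's a better method to quickly search in space
        if int(x[i] * inv_dx) == p:
            r += v[i]
    grid_v[p] = r  # write to the cell being gathered, not to index i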
Not sure if it's possible to let the middle-end do this transform, though.
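A small plain-Python sketch of the scatter-to-gather rewrite from Case 2, assuming a 1D grid and scalar velocities; `scatter`, `gather`, and `n_cells` are hypothetical names, and this is not how Taichi's middle-end is implemented. It only checks that the two forms produce the same grid on a toy input.

```python
def scatter(x, v, inv_dx, n_cells):
    """Particle-to-grid accumulation: parallel over particles i.

    Several particles may land in the same cell, so a parallel version
    would need an atomic add on grid_v[p].
    """
    grid_v = [0.0] * n_cells
    for i in range(len(x)):
        p = int(x[i] * inv_dx)
        grid_v[p] += v[i]
    return grid_v


def gather(x, v, inv_dx, n_cells):
    """Same result, parallel over cells p: each cell is written once,
    by its own iteration, so no atomics are needed."""
    grid_v = [0.0] * n_cells
    for p in range(n_cells):
        r = 0.0
        for i in range(len(x)):
            if int(x[i] * inv_dx) == p:
                r += v[i]
        grid_v[p] = r
    return grid_v


x = [0.1, 0.15, 0.6, 0.9]
v = [1.0, 2.0, 4.0, 8.0]
print(scatter(x, v, 4.0, 4) == gather(x, v, 4.0, 4))
```

The trade-off is cost: the gather form scans all N particles for every cell, so it does O(N * cells) work instead of O(N), unless some spatial lookup narrows the inner loop.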
Concisely describe the proposed feature
I would like to add an OpenGL fragment shader backend so that my poor laptop NV card could get utilized:
Currently we only support the OpenGL compute shader backend, which requires OpenGL 4.3+. But it would also be useful to add a fragment shader backend that helps:
Describe the solution you'd like (if any)
Cons: In fact, fragment shaders don't even support atomic operations... not sure if it's still possible to use them in a GPGPU way. And there's already an OpenGL compute shader backend, so I'm not sure it's still profitable to add a fragment shader backend with only poor support.
Additional comments
@yuanming-hu @k-ye Do you think this is profitable? If so, what's the priority? If not, feel free to close me without a reason.