chinmay0301ucsd closed this issue 8 months ago
We discussed this off-channel, but I'll paste the solution here: for loops whose iteration count is known at compile time, use [ForceUnroll] instead of [MaxIters(N)] for a much more performant kernel.
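For anyone landing here later, a minimal sketch of what that change looks like. The function name sumWeightedSquares and the constant N are made up for illustration and are not from the original kernel. The assumption is that the loop bound is a compile-time constant, so the compiler can fully unroll the loop instead of treating it as a dynamic loop whose intermediate values have to be kept for the backward pass.

```slang
// Hypothetical example; names and the loop body are illustrative only.
static const int N = 4;

[Differentiable]
float sumWeightedSquares(float x)
{
    float acc = 0.0;

    // Before: [MaxIters(N)] only tells the compiler an upper bound on the
    // iteration count; the loop itself stays a dynamic loop.
    // After: [ForceUnroll] asks the compiler to unroll the loop, which is
    // possible here because N is known at compile time.
    [ForceUnroll]
    for (int i = 0; i < N; i++)
    {
        acc += float(i + 1) * x * x;
    }
    return acc;
}
```

If the iteration count is only known at run time, [ForceUnroll] does not apply and [MaxIters(N)] is still the attribute to use.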
I just want to check that this resolves the extremely long run-time issue.
Yes, that resolves the issue. Closing it.
Hi, I wrote some code in Slang that seems to execute the forward pass correctly. But when I execute the backward pass, GPU utilization goes to 100% and the system hangs, especially when I try to print the derivative value. I am using slangpy as the Python interface to call the Slang kernel file (shown below).
In the Python code,
When I call the .bwd function, it takes extremely long and returns all 0s and NaNs, even though the forward pass works correctly. My guess is that something is wrong with how the derivatives are being chained through the MatrixG data structure.