Open bchetioui opened 1 year ago
A workaround is to store temporary values to the global memory and then reload from it.
Great, thanks! That unblocks me for the time being.
@bchetioui Can you show me final code?
Sorry @delibae, I saw your message but couldn't reply at the time, and then forgot about it. Unfortunately, I do not have code that I can share for this at the moment. Do you still need it?
> A workaround is to store temporary values to the global memory and then reload from it.

I tried to use tl.store with tl.load to bypass this issue. It compiles and runs, but the store doesn't seem to work properly: the stride info is not processed correctly, so some columns are missing.
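
For anyone debugging the symptom described above, here is a minimal pure-Python sketch of the offset arithmetic usually written as `rows[:, None] * stride_row + cols[None, :] * stride_col` when storing a 2D tile to a scratch buffer and reloading it. It is not Triton code and not the code from this issue: the helper names (`tile_offsets`, `store_tile`, `load_tile`) and the shapes are made up for illustration. It shows one way columns can go missing, namely when the store uses a column stride of 0 (as a broadcast dimension would have) while the load assumes a dense layout. Whether that is the cause here is only a guess.

```python
def tile_offsets(n_rows, n_cols, stride_row, stride_col):
    """Flat offsets of an (n_rows, n_cols) tile in a 1D buffer."""
    return [[r * stride_row + c * stride_col for c in range(n_cols)]
            for r in range(n_rows)]


def store_tile(buffer, tile, offsets):
    """Write each tile element to its flat offset (sequential, so the last write wins)."""
    for row_vals, row_offs in zip(tile, offsets):
        for value, off in zip(row_vals, row_offs):
            buffer[off] = value


def load_tile(buffer, offsets):
    """Read a tile back from the given flat offsets."""
    return [[buffer[off] for off in row] for row in offsets]


tile = [[1, 2, 3, 4],
        [5, 6, 7, 8]]

# Dense row-major layout: row stride 4, column stride 1.
dense = tile_offsets(2, 4, stride_row=4, stride_col=1)

# Matching strides on store and load: the tile round-trips.
scratch = [None] * 8
store_tile(scratch, tile, dense)
roundtrip = load_tile(scratch, dense)

# Store with column stride 0, load with the dense layout:
# every column of a row lands on the same slot, so the rest stay unwritten.
collapsed = tile_offsets(2, 4, stride_row=4, stride_col=0)
scratch_bad = [None] * 8
store_tile(scratch_bad, tile, collapsed)
missing_cols = load_tile(scratch_bad, dense)
```

Comparing the offsets used by the store against the ones used by the load, as `dense` and `collapsed` are compared here, is a cheap first check before suspecting the compiler.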
I am trying to reimplement Praxis's dot product attention with a lazy broadcast prefix.
My code is the following:
Unfortunately, I encounter the following error when attempting to compile on A100 with Triton at HEAD, apparently more or less independently of the input shapes:
I read from issue #1298 that this may happen when the optimizer doesn't do its job well. Is there a known workaround that might work in this case? If not, I'd be happy to help with debugging this if someone could point me in the right direction :)