nerfstudio-project / nerfacc

A General NeRF Acceleration Toolbox in PyTorch.
https://www.nerfacc.com/

Runtime error in cudaGraphExecUpdate() from tiny-cuda-nn #128

Open morsingher opened 1 year ago

morsingher commented 1 year ago

Hi, I always get a strange error after a few thousand iterations when running this example, or the examples from this other repository:

terminate called after throwing an instance of 'std::runtime_error'
  what(): /tmp/pip-req-build-z4954kz1/include/tiny-cuda-nn/cuda_graph.h:124 cudaGraphExecUpdate(m_graph_instance, m_graph, &error_node, &update_result) failed with error the graph update was not performed because it included changes which violated constraints specific to instantiated graph update
Aborted

After some debugging, I can say that it is not caused by tiny-cuda-nn on its own, since their training example runs without problems. Also, the error disappears if I replace the RGB output of your rendering function with random values. I'm using PyTorch 1.13 with CUDA 11.6 on V100 cards. Another odd thing is that the error does not show up on Titan Xp cards with the same PyTorch and CUDA versions.

Do you have any idea why this happens and how to solve it? Thank you in advance!

liruilong940607 commented 1 year ago

Seems like it's a hardware-related issue?

I have no idea off the top of my head what could be causing it. But I believe that if you replace the tiny-cuda-nn network with a normal MLP, you should not see this issue. If that's the case, it would still be somewhat related to tiny-cuda-nn.
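To try the suggestion above, a plain PyTorch MLP can stand in for the fused tiny-cuda-nn network. The sketch below is only an illustration: the class name `PlainMLP`, the layer sizes, and the input/output dimensions are assumptions, not values taken from the nerfacc examples, and it does not replace any input encoding the example may also use.

```python
import torch
import torch.nn as nn


class PlainMLP(nn.Module):
    """Pure-PyTorch stand-in for a fused MLP (hypothetical helper, not part of nerfacc)."""

    def __init__(self, in_dim: int, out_dim: int,
                 hidden_dim: int = 64, num_hidden_layers: int = 2):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(num_hidden_layers):
            # Each hidden layer is a linear map followed by ReLU.
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        # Final linear layer without activation.
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Assumed sizes for illustration: 3 input features, 4 outputs (e.g. RGB + density).
mlp = PlainMLP(in_dim=3, out_dim=4)
x = torch.rand(8, 3)
out = mlp(x)

# Run a backward pass, since the reported failure only appears during training.
out.sum().backward()
```

If training runs to completion with a network like this on the V100 cards, that would support the guess that the failure is tied to tiny-cuda-nn rather than to the rendering code alone.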

Help needed.