Open HatsuneMiku888 opened 1 year ago

Hi, I experienced
RuntimeError: an illegal memory access was encountered
when training 3D Gaussians on the T&T dataset. It seems to happen during backpropagation. Here is the input of the backward function. The error disappeared when I commented out https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/backward.cu#L503. I have no idea why this line would cause an illegal memory access.
Hi,
commenting that line as you did will significantly change the math of the gradient computation and should give you very bad results. We are currently at SIGGRAPH, but when we get back we will see what we can find from the .dump you shared.
Thanks for your reply! I know commenting out that line can't be a final solution; it was just to locate where things go wrong. What I mean is that backpropagation passes successfully on the same input without that line.
I have the same problem. There are 3 open issues about this invalid memory access now, and none of them has a working solution... could someone help? Thanks!
Hi @ray8828, if you have this issue, can you post your hardware setup and the .dump from when the crash occurred? Creating the dump file requires running with --debug.
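For readers unfamiliar with what --debug produces: roughly, the rasterizer's Python wrapper keeps a CPU copy of the inputs and saves them to disk when the CUDA call throws. A minimal sketch of that pattern, assuming the layout of the public diff-gaussian-rasterization package (treat the details as an approximation, not the exact implementation):

```python
import torch
from diff_gaussian_rasterization import _C  # compiled CUDA extension

def rasterize_with_debug_dump(args):
    # Keep a CPU copy of all inputs; if the CUDA call raises, save them so
    # the failing case can be shared and replayed offline.
    cpu_args = tuple(a.detach().cpu() if isinstance(a, torch.Tensor) else a
                     for a in args)
    try:
        return _C.rasterize_gaussians(*args)
    except Exception:
        torch.save(cpu_args, "snapshot_fw.dump")
        raise
```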
@HatsuneMiku888 I finally had the time to look at your output. It seems that you are using both Python-computed covariance matrices and colors (--convert_SHs_python and --convert_cov3D_python are active); is there any particular reason for this? We left those paths in for compatibility and experimentation, but they are not heavily tested.
@HatsuneMiku888 I found the line that causes the crash. Unfortunately, I have no explanation:
For some reason, a point ID that is way too high gets into the list of points to render. I don't know how I could debug this without extensive access to the machine it happens on. We could set this up, but it will take a while before I have time to do so. From the dump alone I have no idea how this could occur. Is it reproducible? Does it also happen when the two options I mentioned above are turned off?
Last but not least, also for @ray8828: another user has set up a Colab that seems to successfully run the code base on T&T. This could hopefully reduce issues with local project setups, so maybe this will work out for you: https://github.com/camenduru/gaussian-splatting-colab
I have met the same problem; after commenting out the line mentioned above (https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/backward.cu#L503), the code works well.
Hi,
please note that this is not a fix; it will completely break the math behind the approach. If you continue to have issues running it, please consider using the Colab linked on the main page.
The point ID 1073280485 is very close to 2^30; maybe there is some numeric overflow?
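A quick sanity check on that observation, in plain Python:

```python
bad_id = 1073280485            # the suspicious point ID from the crash
print(2 ** 30)                 # 1073741824
print(2 ** 30 - bad_id)        # 461339 -- just under 2^30
print(bad_id.bit_length())     # 30 -- the value needs exactly 30 bits
```

The value fits comfortably in a signed 32-bit integer, so a simple int32 wraparound would not by itself explain it; the proximity to 2^30 might instead hint at a corrupted bit pattern, though that is speculation.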
@HatsuneMiku888 how good is your Python? Could you force it to create the snapshot_fw.dump of the forward pass (even though it doesn't fail) for the frame where the backward fails, and forward it to us?
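One way to do that, assuming the wrapper structure sketched earlier, is to save the snapshot unconditionally for the frame of interest instead of only on failure. A sketch, where capture_at is a hypothetical parameter for the iteration just before the backward pass is known to fail:

```python
import torch
from diff_gaussian_rasterization import _C

def rasterize_and_force_dump(args, iteration, capture_at):
    # Save the forward inputs even though the forward pass succeeds, so the
    # snapshot that precedes the failing backward pass ends up on disk.
    if iteration == capture_at:
        cpu_args = tuple(a.detach().cpu() if isinstance(a, torch.Tensor) else a
                         for a in args)
        torch.save(cpu_args, "snapshot_fw.dump")
    return _C.rasterize_gaussians(*args)
```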
Thank you for your reply. I know that commenting out the line is not a fix; I am trying to locate the bug. The error occurs at different iterations when I use different data.
Sure, I will attempt to reproduce this error on the machine where it occurred.
By the way, I now have a new problem: I hit the same illegal memory access error during forward training on another dataset. But the error miraculously disappeared when I executed _C.rasterize_gaussians with snapshot_fw.dump as parameters in a separate script.
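For reference, a standalone replay along these lines (a sketch; it assumes the dump holds the CPU tuple of rasterizer inputs saved by the debug path):

```python
import torch
from diff_gaussian_rasterization import _C

args = torch.load("snapshot_fw.dump")  # CPU tuple saved by the debug path
args = tuple(a.cuda() if isinstance(a, torch.Tensor) else a for a in args)
out = _C.rasterize_gaussians(*args)
torch.cuda.synchronize()  # force any pending asynchronous CUDA error to surface
print("forward replay finished without error")
```

Note that a clean replay does not prove the kernel is correct: CUDA reports errors asynchronously, so an illegal access during training can be charged to a later, unrelated call, and a fresh process has a different memory layout, which can hide an out-of-bounds access that only faults when it crosses an allocation boundary.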
Hello, I have the same error. I would like to know how to debug the CUDA code in gaussian-splatting; I only know how to debug the Python files.
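As a generic starting point for CUDA-side debugging (not specific to this repo): make kernel errors synchronous so the stack trace points at the launch that actually faulted, and let autograd name the failing backward op.

```python
# Set before torch initializes CUDA so every kernel launch is synchronized
# and the error is raised at the launch that actually faulted.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
torch.autograd.set_detect_anomaly(True)  # reports which backward op failed

# Outside Python, NVIDIA's compute-sanitizer can pinpoint the faulting
# kernel and address, e.g.:  compute-sanitizer python train.py -s <data>
```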
Do you know the result? Thank you.