Closed. Yang-Hao-Lin closed this issue 2 years ago.
Hi, thank you for your interest in our work! I think this happens when the disparity decoder is not properly trained, and there can be multiple reasons for that, such as training instability, invalid inputs, etc. I would first test the pre-trained model with cudatoolkit 11.2 and check whether the scene flow accuracy matches the baseline's.
Running unit tests on the CUDA-dependent modules may also be necessary.
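As a rough illustration, such a unit test could compare the CUDA correlation kernel against the Python fallback on random inputs. This is only a sketch; the two callables are placeholders for whatever functions the repo actually exposes, not its real API:

```python
import torch

def compare_correlation_impls(corr_cuda_fn, corr_python_fn, atol=1e-4):
    # Both callables are assumed (hypothetically) to take two feature maps of
    # shape (B, C, H, W) and return a correlation volume; pass in the repo's
    # CUDA and Python implementations respectively.
    torch.manual_seed(0)
    feat1 = torch.randn(2, 64, 32, 64, device="cuda")
    feat2 = torch.randn(2, 64, 32, 64, device="cuda")

    out_cuda = corr_cuda_fn(feat1, feat2)
    out_ref = corr_python_fn(feat1, feat2)

    # A broken kernel typically shows up as NaN/Inf or a large numerical gap.
    assert torch.isfinite(out_cuda).all(), "CUDA correlation produced NaN/Inf"
    max_diff = (out_cuda - out_ref).abs().max().item()
    assert torch.allclose(out_cuda, out_ref, atol=atol), f"max abs diff {max_diff:.3e}"
    print(f"correlation implementations agree (max abs diff {max_diff:.3e})")
```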
Please try the Python implementation of the correlation layer (--correlation_cuda_enabled=False) and also check whether softsplat works correctly.
If it still doesn't work, please let me know!
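For the softsplat check mentioned above, a minimal sanity check could look like the sketch below. `softsplat_fn` is a placeholder for the repo's softsplat forward, and the call signature may need adapting (e.g. if it also expects a metric/weight tensor):

```python
import torch

def check_softsplat_finite(softsplat_fn):
    # `softsplat_fn` is a placeholder for the repo's softsplat forward function;
    # adapt the call if it also expects a metric tensor or a splatting mode.
    torch.manual_seed(0)
    feat = torch.randn(1, 3, 64, 128, device="cuda")
    flow = torch.randn(1, 2, 64, 128, device="cuda") * 4.0  # moderate displacements
    out = softsplat_fn(feat, flow)
    assert torch.isfinite(out).all(), "softsplat produced NaN/Inf"
    print("softsplat ok; output range:", out.min().item(), out.max().item())
```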
Hi Junhwa, thanks a lot for your attention! The server administrator of my lab upgraded my GPU from a Tesla P40 to an RTX A40 yesterday. I tested again with cudatoolkit=11.2, and now the output of pts_loss is within a reasonable range. So the situation is now:
- GPU Tesla P40, cudatoolkit=10.2: pts_loss is within a reasonable range.
- GPU Tesla P40, cudatoolkit=11.2: pts_loss is extremely large, regardless of whether correlation_cuda_enabled is True or False, even with a supervised pre-trained model.
- GPU RTX A40, cudatoolkit=11.2: pts_loss is within a reasonable range.
It is strange, but there seems to be a compatibility issue between the Tesla P40 and cudatoolkit=11.2.
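One hedged way to narrow down such a mismatch is to check which architectures the installed PyTorch build targets, since the Tesla P40 has compute capability 6.1 (sm_61). The snippet below uses only standard PyTorch calls:

```python
import torch

# Environment check: does the installed PyTorch/CUDA build target the GPU's
# compute capability? (Tesla P40 is (6, 1), i.e. sm_61.)
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("archs in this build:", torch.cuda.get_arch_list())  # look for 'sm_61'
```

For custom CUDA extensions, the TORCH_CUDA_ARCH_LIST environment variable at build time determines the targeted architectures, so rebuilding the extensions after a toolkit change is usually worth trying.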
Thank you for sharing your findings!
Hi Junhwa, thanks a lot for your awesome work in the field of scene flow :) When I tried to use your loss in my environment (cudatoolkit=11.2), the pts_loss (s_3) became extremely large, for example, larger than 100000. But when I set the cudatoolkit version to 10.2, the output of pts_loss became normal (usually less than 10). I cannot figure out the reason. Have you ever encountered this kind of situation? Again, thanks a lot.
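A small guard along these lines makes such a blow-up easy to spot during training; the names and the threshold are purely illustrative, not taken from the repo:

```python
import torch

def check_pts_loss(pts_loss, step, threshold=1e3):
    # `pts_loss` is the scalar loss tensor discussed above; the threshold is
    # arbitrary and only serves to flag the blow-up early instead of silently
    # training with a huge loss.
    value = pts_loss.item()
    if not torch.isfinite(pts_loss) or value > threshold:
        raise RuntimeError(
            f"step {step}: pts_loss = {value:.3e} exceeds {threshold:.1e}; "
            "check the CUDA correlation/softsplat modules and the cudatoolkit version"
        )
```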