stevenygd / PointFlow

PointFlow : 3D Point Cloud Generation with Continuous Normalizing Flows
https://www.guandaoyang.com/PointFlow/
MIT License
720 stars 101 forks

invalid interpolation, fails `t0 <= t <= t1`: 0.0, nan, 0.0 #4

Closed densechen closed 4 years ago

densechen commented 4 years ago

This is really interesting work, and thank you for providing such nice code! However, when I run the code I sometimes get this error, but I don't know what causes it. Could you please provide some ideas?

stevenygd commented 4 years ago

Hi, usually when I run into this problem, it's caused by an unrestricted gradient field (i.e. the ODEfunc you are integrating over). Try making the ODEfunc smoother (i.e. use a smoother nonlinearity).
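To illustrate the suggestion above, here is a minimal NumPy sketch (not the repository's actual ODEfunc; the weights and shapes are hypothetical) of a small MLP dynamics function that uses a smooth `tanh` activation instead of a non-smooth one like ReLU, so the vector field the adaptive solver integrates has bounded, continuous derivatives:

```python
import numpy as np

def odefunc_tanh(x, W1, b1, W2, b2):
    # Hypothetical dynamics f(x): an MLP whose hidden layer uses
    # tanh, a smooth activation, so the gradient field stays
    # Lipschitz-smooth and the adaptive step size is less likely
    # to underflow than with a kinked activation like ReLU.
    h = np.tanh(x @ W1 + b1)   # smooth hidden nonlinearity
    return h @ W2 + b2          # output layer: dx/dt

# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
x = rng.normal(size=(5, 3))     # 5 points in 3D
dx = odefunc_tanh(x, W1, b1, W2, b2)
print(dx.shape)                 # one velocity vector per point
```

The same idea carries over directly to a PyTorch `ODEfunc`: swap `nn.ReLU` for `nn.Tanh` (or another smooth activation) in the layers the solver integrates.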

Just to double-check, did you run into this problem without modifying any of the source code? If you did the modification, what kind of modification did you do to the code?

densechen commented 4 years ago

Hi, thanks for your reply. I have tried many types of nonlinear functions, such as tanh, relu, and sigmoid. ALL FAILED. I also found that during the backward step, the assertion fails in dopri5.py -> Dopri5Solver -> _adaptive_dopri5_step because of "underflow in dt nan". I have checked the code many times and made almost all the training settings the same as yours, except the training data. Could you please give me some help with this? Best, Chen

stevenygd commented 4 years ago

One thing that might be helpful is to add a constraining nonlinearity at the end of the ODEfunc, for example MLP->NL->MLP->Tanh.

densechen commented 4 years ago

Thanks for your reply! This really does help!

Fly-Pluche commented 8 months ago

Hello, I added PointFlow into my model. The loss was huge (around 1.5e+5), and the NaN still appeared after I fixed the code (I had forgotten to normalize the point cloud). I'm experiencing the same problem as @densechen. I tried adding an NL (ReLU), but it didn't work. Can you please provide some advice? Here are some screenshots that may help you understand the issue.
Looking at your code, I noticed that you already have an NL between the MLP layers. Therefore, I tried adding ReLU in other places, even though it seemed strange. (Screenshots of the modified code and of the error were attached.)
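Since the comment above mentions that forgetting point-cloud normalization produced a huge loss, here is a minimal sketch of a per-cloud normalization step (the function name and the max-abs scaling choice are assumptions for illustration, not the repository's exact preprocessing): centering each cloud and scaling its coordinates to O(1) keeps the initial loss and the ODE dynamics in a numerically tame range.

```python
import numpy as np

def normalize_cloud(pts):
    # Hypothetical per-cloud normalization: subtract the centroid,
    # then divide by the largest absolute coordinate so every
    # coordinate lands in [-1, 1].
    centered = pts - pts.mean(axis=0, keepdims=True)
    scale = np.abs(centered).max() + 1e-8   # avoid division by zero
    return centered / scale

# Toy usage on a small off-center cloud.
pts = np.array([[10.0, 0.0, 0.0],
                [12.0, 2.0, 0.0],
                [ 8.0, -2.0, 0.0]])
normed = normalize_cloud(pts)
```

Applying such a step once per cloud before training (and using the same statistics at evaluation time) is a common way to avoid the blown-up losses described above.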