Encountered NaN problem while training a new robot

jstmn / ikflow

Open-source implementation of the paper "IKFlow: Generating Diverse Inverse Kinematics Solutions"

https://sites.google.com/view/ikflow/home



Encountered NaN problem while training a new robot #13

ZhengmaoHe closed this issue 3 months ago

ZhengmaoHe commented 3 months ago

Hi, thank you for your excellent work!

I am trying to use IKFlow in my project, and my robot differs from a regular robotic arm: it has three additional floating joints, for a total of 9 degrees of freedom.

I used my own script to generate sample and pose datasets in the same format as IKFlow's, with shapes [N, 9] and [N, 7], and checked the results in simulation.
I replaced solution_pose_errors with a simple implementation of my own, but did not implement collision detection (it simply returns False).
I trained with the following hyperparameters: --dim_latent_space=15 --nb_nodes=9 --batch_size=256 --learning_rate=0.0005, with all other parameters left at their defaults.
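For reference, a minimal stand-in for solution_pose_errors could look like the sketch below. It assumes each 7-vector pose is laid out as [x, y, z, qw, qx, qy, qz] with a unit quaternion (an assumption; check the layout your own forward kinematics produces), and the names pose_errors and batch_pose_errors are hypothetical, not part of the ikflow or jrl API. It returns the Euclidean position error and the geodesic angle between the two orientations.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[float]  # assumed layout: [x, y, z, qw, qx, qy, qz]


def pose_errors(target: Pose, solution: Pose) -> Tuple[float, float]:
    """Hypothetical helper: (position error, angular error in radians)."""
    if len(target) != 7 or len(solution) != 7:
        raise ValueError("poses must have 7 entries: [x, y, z, qw, qx, qy, qz]")

    # Euclidean distance between the two end-effector positions
    pos_err = math.dist(target[:3], solution[:3])

    # Geodesic angle between unit quaternions: 2 * acos(|<q1, q2>|).
    # abs() handles the q / -q double cover; min() guards against round-off above 1.
    dot = sum(a * b for a, b in zip(target[3:], solution[3:]))
    ang_err = 2.0 * math.acos(min(1.0, abs(dot)))
    return pos_err, ang_err


def batch_pose_errors(targets: List[Pose], solutions: List[Pose]) -> List[Tuple[float, float]]:
    """Apply pose_errors row by row, mirroring [N, 7] pose arrays."""
    if len(targets) != len(solutions):
        raise ValueError("targets and solutions must have the same number of rows")
    return [pose_errors(t, s) for t, s in zip(targets, solutions)]
```

Using the absolute value of the quaternion dot product means q and -q (which encode the same rotation) report zero angular error.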

The loss now often becomes NaN around the 2nd to 5th epoch. I saw that you also emit a warning about a NaN loss in two places in the code. Do you have any suggestions?
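Independently of the hyperparameters, it can help to detect the first non-finite loss and stop (or skip the update) right there, so you know exactly when training diverged. The sketch below shows the pattern on a hypothetical toy problem, plain gradient descent on f(w) = a*w^2/2, not IKFlow's actual training loop: with a step size above 2/a each update multiplies w by a factor with magnitude greater than 1, so the loss eventually overflows to inf. In a PyTorch loop the equivalent check would use torch.isfinite(loss).

```python
import math
from typing import Optional


def first_nonfinite_step(lr: float, a: float = 1.0, w0: float = 1.0,
                         max_steps: int = 2000) -> Optional[int]:
    """Run gradient descent on the toy loss f(w) = 0.5 * a * w**2.

    Returns the index of the first step whose loss is inf/NaN, or None if
    the loss stayed finite for all max_steps steps.
    """
    w = w0
    for step in range(max_steps):
        loss = 0.5 * a * w * w  # float multiplication overflows to inf instead of raising
        if not math.isfinite(loss):
            # In a real training loop: log, skip optimizer.step(), or stop here.
            return step
        grad = a * w
        w = w - lr * grad  # each step multiplies w by (1 - lr * a)
    return None


if __name__ == "__main__":
    print(first_nonfinite_step(3.0))  # diverges: |1 - 3.0 * 1.0| = 2 > 1
    print(first_nonfinite_step(0.5))  # converges: stays finite
```

The toy only illustrates why a smaller step size can keep the loss finite; a normalizing flow can also produce NaNs for other reasons, so the guard is worth keeping either way.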

ZhengmaoHe commented 3 months ago

This is the loss curve:
[Image: training loss curve]

jstmn commented 3 months ago

Hi!

I'd first change nb_nodes to 12. This gives the model greater capacity, which improves performance and can also make training more stable (but it also takes longer, fyi).

Next, try reducing the learning rate. You can try 3.75e-4, 2.5e-4, 1.25e-4, or 1e-5.
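To try these values systematically, a small script can generate one set of training flags per learning rate. The sketch below reuses the flag names from the original question (--dim_latent_space, --nb_nodes, --batch_size, --learning_rate) with nb_nodes raised to 12; the helper name sweep_flag_lists is hypothetical, and how you pass these arguments to the training script is left to you.

```python
from typing import Dict, List

# Flags from the original run, with nb_nodes raised to 12 as suggested.
BASE_FLAGS: Dict[str, str] = {
    "dim_latent_space": "15",
    "nb_nodes": "12",
    "batch_size": "256",
}

# Learning rates suggested in the thread, largest first.
LEARNING_RATES: List[float] = [3.75e-4, 2.5e-4, 1.25e-4, 1e-5]


def sweep_flag_lists(base: Dict[str, str], lrs: List[float]) -> List[List[str]]:
    """Return one list of '--key=value' arguments per learning rate."""
    runs = []
    for lr in lrs:
        flags = {**base, "learning_rate": repr(lr)}  # repr keeps full float precision
        runs.append([f"--{key}={value}" for key, value in flags.items()])
    return runs


if __name__ == "__main__":
    for args in sweep_flag_lists(BASE_FLAGS, LEARNING_RATES):
        print(" ".join(args))
```

Starting from the largest value and stepping down lets you stop at the first learning rate that trains without NaNs.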

Let me know how that goes!

jeremy

ZhengmaoHe commented 3 months ago

Thank you, Jeremy! My training results are now very good; you have been a big help! BTW, your code comments are very cute and make coding less tedious :)

jstmn commented 3 months ago

Glad to hear it! What are you using IKFlow for if you don't mind me asking?

ZhengmaoHe commented 3 months ago

I'm using it for a loco-manipulation project; once it's further along, I will share more details with you!

jstmn commented 3 months ago

Nice! Sounds exciting.

If I could ask one more follow-up: for training, did you create a new Robot subclass in the jrl package, or did you change the code so that it just uses a pre-saved dataset of configuration/EE pose pairs?

If it's the former (a new Robot subclass), would you consider adding it to the jrl package?

ZhengmaoHe commented 3 months ago

Sorry for my delayed reply. For my project, I had previously implemented a class based on pytorch_kinematics that computes the forward and inverse kinematics of floating-base robots; it simply applies a transformation to the solution of the corresponding fixed-base robot. Then, to adapt it to IKFlow, I also implemented methods such as save_dataset_to_disk and solution_pose_errors, and used my custom class in place of Robot in the code.
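Treating a floating-base robot as a transformation applied on top of a fixed-base solution amounts to composing the base pose with the arm's end-effector pose expressed in the base frame. The sketch below assumes, purely for illustration, that the three floating joints form a planar base (x, y, yaw) and that the fixed-base arm FK already returns a position and a unit quaternion (qw, qx, qy, qz) in the base frame; the actual joint layout in this project is not stated, and the helper names are hypothetical (pytorch_kinematics is not used here).

```python
import math
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (qw, qx, qy, qz)


def quat_mul(q1: Sequence[float], q2: Sequence[float]) -> Quat:
    """Hamilton product q1 * q2 (rotation q2 is applied first, then q1)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def floating_base_pose(base_xyyaw: Sequence[float],
                       arm_pos: Sequence[float],
                       arm_quat: Sequence[float]) -> List[float]:
    """World-frame EE pose [x, y, z, qw, qx, qy, qz] for a hypothetical
    planar floating base (x, y, yaw) carrying a fixed-base arm.

    arm_pos / arm_quat: EE pose in the base frame, from the fixed-base FK.
    """
    bx, by, yaw = base_xyyaw
    c, s = math.cos(yaw), math.sin(yaw)

    # Rotate the arm's EE position by the base yaw, then translate by (bx, by).
    px, py, pz = arm_pos
    world_pos = [bx + c * px - s * py, by + s * px + c * py, pz]

    # Base orientation as a quaternion about +z, composed with the arm's orientation.
    q_base = (math.cos(yaw / 2), 0.0, 0.0, math.sin(yaw / 2))
    world_quat = quat_mul(q_base, arm_quat)
    return world_pos + list(world_quat)
```

For example, a base at (1, 2) with yaw pi/2 and an arm EE at (1, 0, 0.5) in the base frame gives a world position of roughly (1, 3, 0.5).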

So, sorry, I haven't made any valuable contribution to the jrl package. It is a great tool that implements more features than pytorch_kinematics. Thank you for your efforts!

jstmn commented 3 months ago

Got it, makes sense. Thanks, this helps me understand how others are using the code.