transic-robot / transic


Difficulties when Training RL Policies #5

Open PeideHuang opened 3 months ago

PeideHuang commented 3 months ago

I tried to run the RL training scripts for multiple tasks, such as Stabilize, Reach and Grasp, and Insert, with

```bash
python3 main/rl/train.py task=<task_name> sim_device=cuda:<gpu_id> rl_device=cuda:<gpu_id> graphics_device_id=<gpu_id>
```

However, none of the RL agents successfully learn to complete the tasks even after a long time (an example for ReachAndGraspSingle is shown below). I used the `num_envs` value from the default task config file. Are there any hyperparameters I need to tune?

(Screenshot: ReachAndGraspSingle learning curve, 2024-07-22)

yunfanjiang commented 3 months ago

For most tasks we also added task-specific curricula (such as warming up over object geometries and the action space); the reward functions provided in the code correspond to those of the final curriculum stage. For the Stabilize task, however, there should be some learning signal even without the curriculum. Could you provide the learning curves for that task so I can take a look?
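For readers landing on this thread: below is a minimal sketch of what such a warm-up curriculum might look like, assuming a simple linear schedule. Every name here (`LinearCurriculum`, `coeff`, the 0.2 to 1.0 action-scale range) is a hypothetical illustration of the idea described above, not the transic implementation.

```python
# Hypothetical sketch: linearly warm up a difficulty coefficient
# (e.g., an action-space scale) from `start` to `end` over `horizon`
# environment steps, then hold it at `end`. The released reward
# functions would correspond to the fully warmed-up (final) stage.


class LinearCurriculum:
    """Interpolate a difficulty coefficient from `start` to `end` over `horizon` steps."""

    def __init__(self, start: float, end: float, horizon: int):
        self.start, self.end, self.horizon = start, end, horizon

    def coeff(self, step: int) -> float:
        # Fraction of the warm-up completed, clamped to [0, 1].
        frac = min(step / self.horizon, 1.0)
        return self.start + frac * (self.end - self.start)


# Example usage: widen the allowed action scale from 20% to 100%
# over the first 1M environment steps (numbers are illustrative).
action_scale = LinearCurriculum(start=0.2, end=1.0, horizon=1_000_000)
for step in (0, 500_000, 2_000_000):
    print(step, action_scale.coeff(step))  # prints 0.2, 0.6, 1.0
```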

user432 commented 1 month ago

Hello @yunfanjiang, thanks for open-sourcing your code. I would like to ask the same question: should we tune any parameters before running the code? I ran the InsertSingle task as-is, and here are the plots:

(Image: InsertSingle training curves)