tomasvr / turtlebot3_drlnav

A ROS2-based framework for TurtleBot3 DRL autonomous navigation

Training with waffle_pi #5

Closed: sudo-homebrew closed this issue 5 months ago

sudo-homebrew commented 11 months ago

I wonder if you have ever tried training with the waffle pi model instead of the burger. I am trying to train with the waffle pi, and even though I changed all of the relevant specifications in the setting.py and util.py files, training runs but the agent does not seem to be learning (i.e. it rotates in one place and keeps crashing into walls and dynamic obstacles). I also modified the reward function, referring to several other reward strategies, but nothing changed. So if you have any experience with the waffle pi model, or any idea how to solve this problem, I would appreciate it if you could share it.
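
For reference, the kind of robot-specific constants I changed looks roughly like the sketch below. The parameter names here are placeholders (not necessarily the exact names in setting.py), and the values are only my reading of the TurtleBot3 spec sheet plus a rough estimate for the larger footprint:

```python
# setting.py (sketch, placeholder names) -- constants adjusted for waffle pi.

# Burger defaults, for comparison:
# LINEAR_VELOCITY_MAX  = 0.22   # m/s
# ANGULAR_VELOCITY_MAX = 2.84   # rad/s
# THRESHOLD_COLLISION  = 0.13   # m, approx. burger radius plus margin

LINEAR_VELOCITY_MAX  = 0.26    # m/s, waffle pi hardware limit (spec sheet)
ANGULAR_VELOCITY_MAX = 1.82    # rad/s, waffle pi hardware limit (spec sheet)
THRESHOLD_COLLISION  = 0.22    # m, rough estimate for the larger footprint
THRESHOLD_GOAL       = 0.35    # m, relaxed slightly for the bigger robot
```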

tomasvr commented 10 months ago

I have never trained a full model using the waffle pi, but in theory it should not make a big difference. What type of model are you training (dqn/ddpg/td3)? I would suggest starting with dqn, as it is the simplest algorithm, and using stage 1 with a simple reward function. Tweaking the most important hyperparameters, such as the learning rate and batch size, could also make a difference. Unfortunately, this sort of issue is really hard to debug. Sometimes it simply takes a long time before the robot learns to stop rotating.
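
As a rough illustration, a simple reward function of the kind I mean could look something like this (the names and constants below are just for illustration, not the exact code in this repository):

```python
def simple_reward(goal_distance, prev_goal_distance, collided, reached_goal):
    """Minimal distance-progress reward: sparse terminal rewards plus a
    dense term for moving closer to the goal. Illustrative sketch only."""
    if reached_goal:
        return 100.0          # large positive terminal reward
    if collided:
        return -100.0         # large negative terminal reward
    # Dense shaping: positive when the robot got closer to the goal this step
    return 5.0 * (prev_goal_distance - goal_distance)
```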

sudo-homebrew commented 10 months ago

It doesn't seem like the robot is going to stop rotating, even after training for more than 20k episodes at stage 1. It rotates in one place from the very beginning through more than 20k episodes, even though I only changed the model to the waffle pi and some model parameters such as the maximum linear and angular speed and the goal and collision thresholds. I also tried the ddpg and td3 algorithms to train the waffle pi with the same reward function you provided (for the burger model), but that didn't work either. I wonder what strategy you used to arrive at the current reward function and hyperparameters for the burger model. If you have any useful experience or strategies, please share them with us.
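
For example, would adding a small angular-velocity penalty like the following be a reasonable way to discourage the spinning in place? This is just a sketch I am considering, with placeholder names, not the repository's actual reward code:

```python
def reward_with_spin_penalty(goal_distance, prev_goal_distance,
                             angular_velocity, collided, reached_goal):
    """Distance-progress reward with an extra penalty on angular velocity,
    intended to discourage the agent from rotating in place. Sketch only."""
    if reached_goal:
        return 100.0
    if collided:
        return -100.0
    progress = 5.0 * (prev_goal_distance - goal_distance)
    spin_penalty = 0.1 * abs(angular_velocity)   # small cost for turning
    return progress - spin_penalty
```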

tomasvr commented 7 months ago

The examples I have included here show all of the hyperparameters and other configurations (stage, reward function) that I used to train successful models. You can find most of the information in the _hyperparams_20220808-030634.txt file and in the file names themselves; the ddpg example, for instance, was trained on stage 9.