purdue-biorobotics / flappy

An open source dynamic simulation for flapping wing robots and animals
MIT License

Colab Notebook #1

Open · araffin opened this issue 5 years ago

araffin commented 5 years ago

Hello,

I set up a colab notebook, so you can train your agents online on flappy envs ;) : https://colab.research.google.com/drive/13mJ1bU2tKVurG9chNhM0U7ivgVKlzPu7

Also, I have some questions about the training:

It seems that your maneuver env does not follow the gym interface: the reward must be a float, but it is currently a numpy array (I had to use a reward wrapper to work around the error).
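For reference, a minimal wrapper along these lines works around it (the env id below is just a placeholder, not necessarily the one registered by flappy):

```python
import gym
import numpy as np


class FloatRewardWrapper(gym.RewardWrapper):
    """Cast the numpy-array reward returned by the env to a plain Python float."""

    def reward(self, reward):
        # The env currently returns a one-element np.ndarray; collapse it to a scalar.
        return float(np.asarray(reward).item())


# env = FloatRewardWrapper(gym.make("fwmav_maneuver-v0"))  # placeholder env id
```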

I would also normalize the reward using the opposite of the cost instead of its inverse (otherwise the reward magnitude is really huge), and maybe add a "life bonus" (+1 for each timestep) for the hover env, see here for an example ;)
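Roughly what I mean, as a sketch (`cost` stands for whatever positive step cost the env already computes internally):

```python
def shaped_reward(cost: float, alive: bool) -> float:
    # The opposite of the cost keeps the magnitude bounded (unlike 1 / cost,
    # which blows up near the target), plus a +1 "life bonus" for every
    # timestep the robot is still hovering.
    return -cost + (1.0 if alive else 0.0)
```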

ffnc1020 commented 5 years ago

Hi, thank you for your interest and sorry about the delayed reply. The notebook is a great idea! You can make a pull request and add it to the README markdown, or however you see fit.

The hovering control of the flapping wing robot is still an open problem, so I just have a feedback controller for the demo, which is already not easy to achieve. The system is extremely unstable so it is very difficult to control.

The maneuvering policy is trained for 5 million steps, using default hyperparameters with a reward scaling of 0.05. Yes, the inverse creates a very large reward at the target position and pose, which helps attract the robot to the target and converge better.
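(For illustration only, not the actual training code: the 0.05 factor mentioned above can be applied with a thin wrapper like the one below; the algorithm and remaining hyperparameters are whatever the defaults are in the notebook.)

```python
import gym


class ScaledRewardWrapper(gym.RewardWrapper):
    """Scale the reward by a constant factor, e.g. 0.05 as used for the maneuver training."""

    def __init__(self, env, scale=0.05):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return self.scale * float(reward)
```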

I'll fix the reward to be a float.

I'll post the training performance and some demo clips in the next update.

juanmed commented 5 years ago

@araffin Hi, thanks for setting up the colab notebook. As I explained in #2, pydart2 is deprecated and does not work with the latest dartsim version (v6.9); to run successfully, one needs to install dartsim <= 6.8.2 from source, as indicated in my last comment in #2. Would it be possible to update the notebook to reflect this change so that it runs successfully? Thank you.

araffin commented 5 years ago

> Would it be possible to update the notebook to reflect this change so that it runs successfully?

Well, you can copy and update the notebook yourself (and post the link here afterward ;) ). I don't have the time to do that now.

SaltedfishLZX commented 4 years ago

It seems that the reward type error still exists; the type is np.ndarray most of the time. BTW, the maneuver env seems to train the model to correct the output of the ARC controller. I'm wondering whether there is a successful example of training without a feedback controller, or whether such control is simply too difficult?