mbaske / angry-ai

Battle Robots Demo made with Unity Machine Learning Agents
MIT License

Help with other robot models #4

Open · dexfrost89 opened this issue 3 years ago

dexfrost89 commented 3 years ago

Hi! I’ve found your angry-ai repository and it’s amazing! But I ran into several problems when I tried to apply the training to my own robot models. Could you help me fix them? My contact: dexfrost89@gmail.com

mbaske commented 3 years ago

Hi - are you getting any errors using my project files?

dexfrost89 commented 3 years ago

If I understood it correctly, the pipeline is:

1. Record a demo file with an oscillator.
2. Behavior cloning for 10M-15M steps, with extrinsic reward strength 0.1, GAIL strength 1.0 and behavioral cloning strength 0.5.
3. Plain training for 100M steps, with extrinsic strength 1.0 and all other signals at 0.0.

I managed to set up the oscillator for my model and changed the height parameter of the body. But after the behavior cloning phase the model doesn't learn to move like the oscillator, and it doesn't get better after the learning phase: it shifts from foot to foot and hardly moves in the right direction. It looks like I missed some part of your training pipeline, but I have no idea what's wrong. I'm now trying to reproduce your results with your models and data.
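
For reference, the reward strengths in step 2 map onto something like the following ML-Agents trainer config. This is a minimal sketch assuming the Release 1+ YAML schema; the behavior name, demo path and gamma values are placeholders, and the repo's actual hyperparameters may differ:

```yaml
# Imitation phase sketch (step 2 above) - the strengths mirror the comment,
# everything else is illustrative.
behaviors:
  Walker:                      # placeholder behavior name
    trainer_type: ppo
    max_steps: 1.5e7           # ~10M-15M steps
    reward_signals:
      extrinsic:
        strength: 0.1
        gamma: 0.99
      gail:
        strength: 1.0
        gamma: 0.99
        demo_path: Demos/OscillatorWalk.demo   # placeholder demo path
    behavioral_cloning:
      strength: 0.5
      demo_path: Demos/OscillatorWalk.demo
```

The later RL-only phase (step 3) would then drop the `gail` and `behavioral_cloning` entries and raise the extrinsic strength to 1.0.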

dexfrost89 commented 3 years ago

I got this behavior after 100M steps on your data https://drive.google.com/file/d/1VmbZDDP_3KzUhMaQe0Sa6N4kzfjFPcp7/view?usp=sharing

mbaske commented 3 years ago

I saw behavior similar to your video when I was training my models. The issue seems to be that the initial imitation phase has to run long enough for the agent to mimic the demo. However, if it lasts too long, the model can overfit and struggle to learn direction changes later on. I can't give you an optimal training duration unfortunately - you can try saving a couple of intermediate models during imitation, then run the second phase for each of them and compare the results. Floor friction is also a bit tricky: for recording the demos, I had to set a medium friction value, otherwise the oscillator-driven motion wouldn't work and the bot fell over. For training though, a higher value seemed to work better, giving the bot more grip and preventing sliding. So I increased the friction slightly between training phases.
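
To make the checkpoint comparison concrete, here is a hedged sketch using standard ML-Agents options: raising `keep_checkpoints` / `checkpoint_interval` in the imitation config saves several intermediate models, and `init_path` (or `--initialize-from` on the command line) lets a second-phase run start from one of them. The behavior name, path and values below are illustrative, the exact checkpoint path format depends on the ML-Agents version, and the floor friction change is made in the Unity scene rather than in the config:

```yaml
# Phase 2 sketch: extrinsic reward only, initialized from an imitation checkpoint.
# (In the phase 1 / imitation config, keep_checkpoints and checkpoint_interval
# control how many intermediate models get saved for comparison.)
behaviors:
  Walker:                          # placeholder behavior name
    trainer_type: ppo
    max_steps: 1.0e8               # ~100M steps of plain RL
    init_path: results/imitation/Walker/Walker-10000000.onnx   # placeholder checkpoint
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
```

Launching one such run per saved imitation checkpoint and comparing the resulting gaits is one way to find a good imitation duration without guessing it up front.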

dexfrost89 commented 3 years ago

I tried several lengths for behavior cloning. Unfortunately, this behavior is the best I got after 40M steps of training: https://drive.google.com/file/d/1unkTrmxDzP9MTFeIVh6fltZ0FL_HDYlh/view?usp=sharing Is there any way to contact you outside of GitHub for faster communication?