mbaske / angry-ai

Battle Robots Demo made with Unity Machine Learning Agents
MIT License

Angry AI - Video

This is a little robot battle simulation, made with Unity Machine Learning Agents.

Each bot is controlled by two reinforcement learning agents which were trained consecutively with PPO.

For the lower-tier agent ("walker"), I first created demonstration files, recording heuristic actions generated by an oscillator. The agent was then trained to imitate those actions, using a GAIL reward signal with its strength set to 1.0 and the use_actions option enabled. Behavioural cloning was added with a strength of 0.5. The extrinsic reward signal's strength was set to 0.1, which proved sufficient for learning how to recover from random start rotations (not included in the demonstrations). This first training phase should run for between 10 and 15 million steps: long enough for the agent to mimic the oscillator motion, but not so long that the policy overfits to it. During the second training phase, I randomized the target speeds. The GAIL and behavioural cloning signals were removed and the extrinsic reward's strength was set to 1.0. In the third and final phase, the walk and look directions were randomized as well, in order to generalize the policy. I also increased the ground's friction a little between training phases.
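The reward settings described above can be summarised as a small sketch. It builds the reward-related part of an ML-Agents style trainer configuration for each phase as a plain Python dictionary. The strengths and the use_actions flag come from the text; the behavior name `Walker`, the demo path and the function name are hypothetical placeholders, and all other hyperparameters are left out because the text does not state them.

```python
import json

# Hypothetical placeholders: neither name is taken from the project.
BEHAVIOR_NAME = "Walker"
DEMO_PATH = "demos/walker_oscillator.demo"


def walker_reward_config(phase: int) -> dict:
    """Return the reward-related part of a PPO trainer config for phase 1, 2 or 3."""
    if phase not in (1, 2, 3):
        raise ValueError("phase must be 1, 2 or 3")

    behavior = {"trainer_type": "ppo"}
    if phase == 1:
        # Imitation phase: strong GAIL signal, behavioural cloning,
        # and a weak extrinsic reward.
        behavior["reward_signals"] = {
            "extrinsic": {"strength": 0.1},
            "gail": {
                "strength": 1.0,
                "demo_path": DEMO_PATH,
                "use_actions": True,
            },
        }
        behavior["behavioral_cloning"] = {
            "strength": 0.5,
            "demo_path": DEMO_PATH,
        }
    else:
        # Phases 2 and 3: imitation signals removed, extrinsic reward only.
        behavior["reward_signals"] = {"extrinsic": {"strength": 1.0}}

    return {"behaviors": {BEHAVIOR_NAME: behavior}}


if __name__ == "__main__":
    for phase in (1, 2, 3):
        print(f"# phase {phase}")
        print(json.dumps(walker_reward_config(phase), indent=2))
```

Phases 2 and 3 produce the same reward settings here, because they differ only in what the environment randomizes (target speeds, then walk and look directions), not in the reward signals.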

The upper-tier agent ("fighter") generates the target speeds and walk/look directions for the walker. It observes the bot's vicinity using a grid sensor. In an initial round of training, the fighter's output actions were fed to a dummy agent, standing in for the walker and roughly emulating its behaviour. I did this to cut down training time, since the dummies don't require a neural net. Once the fighter policy showed enough training progress, I replaced the dummies with walker agents running in inference mode. I then continued training the fighter model, fine-tuning it under more realistic conditions.
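To illustrate the idea of a dummy standing in for the walker, here is a minimal sketch. It is not the project's implementation: the class and field names, the first-order lag model and the `response` parameter are all assumptions, chosen only to show how fighter commands (target speed, walk direction, look direction) could be followed without a neural net.

```python
import math
from dataclasses import dataclass


@dataclass
class WalkerCommand:
    """Command the fighter sends to the walker (field names are hypothetical)."""
    target_speed: float    # desired forward speed
    walk_direction: float  # desired heading, in radians
    look_direction: float  # desired look angle, in radians


def approach_angle(current: float, target: float, fraction: float) -> float:
    """Move `current` a fraction of the way toward `target` along the shortest arc."""
    diff = math.atan2(math.sin(target - current), math.cos(target - current))
    return current + fraction * diff


class DummyWalker:
    """Assumed stand-in for the walker: tracks commands with a simple lag."""

    def __init__(self, response: float = 0.2):
        self.response = response  # fraction of the remaining gap closed per step
        self.speed = 0.0
        self.heading = 0.0
        self.look = 0.0
        self.x = 0.0
        self.y = 0.0

    def step(self, cmd: WalkerCommand, dt: float = 0.1) -> None:
        # Speed and angles drift toward the commanded values instead of
        # jumping to them, roughly imitating a bot that needs time to react.
        self.speed += self.response * (cmd.target_speed - self.speed)
        self.heading = approach_angle(self.heading, cmd.walk_direction, self.response)
        self.look = approach_angle(self.look, cmd.look_direction, self.response)
        # Advance the position along the current heading.
        self.x += self.speed * math.cos(self.heading) * dt
        self.y += self.speed * math.sin(self.heading) * dt


if __name__ == "__main__":
    dummy = DummyWalker()
    command = WalkerCommand(target_speed=1.0, walk_direction=math.pi / 2, look_direction=0.0)
    for _ in range(50):
        dummy.step(command)
    print(f"speed={dummy.speed:.3f} heading={dummy.heading:.3f} pos=({dummy.x:.2f}, {dummy.y:.2f})")
```

Because the dummy exposes the same command interface as the walker would, swapping it for a walker agent running in inference mode would not change what the fighter outputs, which is what makes the later fine-tuning step possible.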

The project contains a few freely available assets: