inoryy / reaver

Reaver: Modular Deep Reinforcement Learning Framework. Focused on StarCraft II. Supports Gym, Atari, and MuJoCo.
MIT License

Implementation of transformer into model #31

Open kimbring2 opened 5 years ago

kimbring2 commented 5 years ago

Hello, thank you for sharing this great code.

I am trying to solve the DefeatRoaches minigame using a Relational Network.

I found an example of Transformer code for MNIST classification and modified the fully_conv.py file accordingly. Unlike the original code, I only use the screen feature, without the minimap feature. However, the result is still not good.
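For context, the kind of modification described above amounts to running self-attention over the flattened screen feature map. Here is a minimal NumPy sketch of that idea; the weight matrices, shapes, and the `self_attention` helper are illustrative, not taken from fully_conv.py:

```python
import numpy as np

def self_attention(x, d_k):
    """Single-head scaled dot-product self-attention over a set of tokens.

    x: (n_tokens, d_model) -- e.g. an HxW screen feature map flattened
       into H*W tokens, each with d_model channels.
    """
    rng = np.random.default_rng(0)
    # Hypothetical projection weights; in a real model these are learned.
    w_q = rng.standard_normal((x.shape[1], d_k)) / np.sqrt(x.shape[1])
    w_k = rng.standard_normal((x.shape[1], d_k)) / np.sqrt(x.shape[1])
    w_v = rng.standard_normal((x.shape[1], d_k)) / np.sqrt(x.shape[1])

    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d_k)                   # (n, n) pairwise relations
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (n, d_k)

# A 16x16 screen feature map with 8 channels, flattened to 256 tokens.
screen = np.random.default_rng(1).standard_normal((16 * 16, 8))
out = self_attention(screen, d_k=8)
print(out.shape)  # (256, 8)
```

Note that attention over all H*W cells is quadratic in the number of pixels, which is one reason this gets expensive quickly on larger screens.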

Could you give me a recommendation on how to modify it to reach DeepMind's performance?

Thank you. From Dohyeong

inoryy commented 5 years ago

Hello Dohyeong,

I assume you're trying to replicate the AlphaStar architecture?

First, note that spatial information is still processed by a normal conv model, and the transformer body is applied only to a flattened list of unit information (see the architecture figure below).
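To illustrate that split, here is a rough NumPy sketch of the two parallel paths: a plain convolution over the screen versus a transformer-style attention over a flat unit list. Everything here (feature counts, identity Q/K/V projections, the toy `conv2d`) is a simplified stand-in, not the AlphaStar architecture itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Spatial path: plain conv over the screen (unchanged by the transformer) ---
def conv2d(x, kernel):
    """Valid 3x3 convolution, single channel, no padding -- a toy stand-in."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel)
    return out

screen = rng.standard_normal((8, 8))
spatial_out = conv2d(screen, rng.standard_normal((3, 3)))   # (6, 6)

# --- Entity path: transformer body over a flat list of units ---
units = rng.standard_normal((5, 16))   # 5 units, 16 features each (hypothetical)
d_k = 16
scores = units @ units.T / np.sqrt(d_k)   # identity Q/K/V projections for brevity
w = np.exp(scores - scores.max(-1, keepdims=True))
w /= w.sum(-1, keepdims=True)             # row-wise softmax
entity_out = w @ units                    # (5, 16) relational unit embeddings

print(spatial_out.shape, entity_out.shape)  # (6, 6) (5, 16)
```

The key point is that attention runs over a handful of unit tokens, not over every screen pixel, which is far cheaper and matches how the entity encoder is described.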

Second, the transformer would require a full view of the game map, as opposed to only the camera viewport we currently get from PySC2. There is a WIP patch to add support for it, but no ETA on when it will be released.

Third, from your results figures I see that the mean reward is stuck around 14, which, as far as I remember, is about the score you would get from random actions. I'm afraid this implies your model hasn't learned much beyond the initial jump, so improving from there might be quite difficult.

What I would recommend is to simplify the model as much as possible: reduce the state/action spaces to the bare minimum and remove the non-spatial and minimap info blocks. After verifying that this works, try adding a transformer body as a separate layer only on the player_relative feature and merge it with the final spatial state block before the final policy/value layers.
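The merge step suggested above could look something like this; a minimal NumPy sketch assuming the player_relative layer has been embedded to a few channels per cell, with identity Q/K/V projections and made-up dimensions throughout:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8

# player_relative screen layer, embedded to d channels per cell (hypothetical),
# flattened to H*W tokens for the attention step.
d = 4
player_relative = rng.standard_normal((H * W, d))

# Simple self-attention over the grid cells (identity Q/K/V for brevity).
scores = player_relative @ player_relative.T / np.sqrt(d)
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)              # row-wise softmax
relational = (attn @ player_relative).reshape(H, W, d)

# Final spatial state block from the existing conv stack (stand-in values).
spatial_state = rng.standard_normal((H, W, 16))

# Merge by channel-wise concatenation before the policy/value heads.
merged = np.concatenate([spatial_state, relational], axis=-1)
print(merged.shape)  # (8, 8, 20)
```

Concatenation along the channel axis is only one possible merge; summation or a learned projection would also work, but concatenation keeps the two information streams separable for debugging.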