polaris-sc2 / polaris-training

For code etc. relating to the network training process.
Apache License 2.0

Define Neural Network I/O for 4.7.1 replay data. #9

Open · Error323 opened 5 years ago

Error323 commented 5 years ago

This issue is focused on the 4.7.1 version of SC2 and its corresponding replay data. Our first objective is to train a CNN encoder -> LSTM -> CNN decoder network from that data which is able to defeat the default AI and play on all maps/races.

In order to achieve this we need to formalize the input and output of the network so that we can start implementing the various components (replay parser, training pipeline, agent).
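As a rough illustration only (the framework, layer sizes, and feature shapes below are placeholders, not the I/O definition this issue is meant to produce), the encoder -> LSTM -> decoder layout could look like this in PyTorch:

```python
import torch
import torch.nn as nn

class ReplayNet(nn.Module):
    """Placeholder CNN encoder -> LSTM -> CNN decoder over replay frames."""

    def __init__(self, in_channels=32, hidden=256, out_channels=8, map_size=64):
        super().__init__()
        # CNN encoder: spatial feature planes -> flat per-frame latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        enc_dim = 64 * (map_size // 4) ** 2
        # LSTM carries temporal context across frames.
        self.lstm = nn.LSTM(enc_dim, hidden, batch_first=True)
        # CNN decoder: latent -> spatial output planes (e.g. a screen-action heatmap).
        self.fc = nn.Linear(hidden, enc_dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )
        self.map_size = map_size

    def forward(self, frames, state=None):
        # frames: (batch, time, channels, H, W) with H == W == map_size.
        b, t, c, h, w = frames.shape
        z = self.encoder(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        z, state = self.lstm(z, state)
        z = self.fc(z).reshape(b * t, 64, self.map_size // 4, self.map_size // 4)
        out = self.decoder(z).reshape(b, t, -1, h, w)
        return out, state
```

Whether the decoder outputs action heatmaps, values, or something else entirely is exactly what the input/output definition in this issue should settle.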

Error323 commented 5 years ago

From https://openreview.net/pdf?id=HkxaFoC9KQ (p7, p16)

The input:

The output:

Please provide feedback!

Matuiss2 commented 5 years ago

I think effects should be in the inputs (e.g. storms, colossus and lurker attacks).

inoryy commented 5 years ago

I have a general idea of how to do it as an autoregressive policy, but I'm still quite fuzzy on the embedded approach.

Error323 commented 5 years ago

Thanks @inoryy, that's useful. I found this article on embeddings, https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526, that seems to explain it well. It's a way to reduce the dimensionality of categories by mapping them into a smaller continuous space. Reminds me of PCA.
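For concreteness: a learned embedding is just a trainable lookup table that maps each category id to a small dense vector, trained jointly with the rest of the network (unlike PCA, which is fit separately). A hypothetical PyTorch example with made-up sizes:

```python
import torch
import torch.nn as nn

# Trainable lookup table: each of 1000 category ids maps to a 16-dim vector.
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=16)

ids = torch.tensor([3, 42, 7])   # three example category ids
vectors = embedding(ids)         # shape: (3, 16), continuous and trainable
```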

inoryy commented 5 years ago

There's a bit of a terminology clash here: the embedded policy vector is unrelated to embeddings. Understanding those is still useful, though, because they're used extensively to process the inputs (there are a bunch of categorical spatial features).
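To make that concrete, a categorical spatial feature plane (e.g. a per-pixel unit-type id on the screen/minimap) can be embedded per pixel into a few continuous channels before the CNN sees it. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

num_categories, emb_dim = 256, 8
embed = nn.Embedding(num_categories, emb_dim)

# A 64x64 plane of category ids, as in the categorical spatial features.
plane = torch.randint(0, num_categories, (1, 64, 64))   # (batch, H, W)
channels = embed(plane).permute(0, 3, 1, 2)             # (batch, emb_dim, H, W)
# `channels` can now be concatenated with other planes and fed to a CNN.
```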

inoryy commented 5 years ago

Duplicating my thoughts from Discord.

The relevant part of the article is this: [screenshot, 2019-02-07]

Now that I think about it, maybe they do mean categorical embeddings. So the pipeline would be: sample an action id -> embed it from ~1700 levels (the number of unique action ids) down to 16 dims -> sample args from those 16 dims. But if that's the case, I've never seen it done this way before. We'd also have to be careful propagating gradients with this setup; we might need the Gumbel-Softmax trick since action id sampling is part of the computation.
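A minimal sketch of that pipeline, assuming PyTorch and made-up sizes and head names (`arg_head` etc. are hypothetical); the straight-through Gumbel-Softmax is shown as one option for letting gradients flow through the sampled action id:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS, EMB_DIM, STATE_DIM = 1700, 16, 256

action_head = nn.Linear(STATE_DIM, NUM_ACTIONS)     # action id logits
action_embed = nn.Embedding(NUM_ACTIONS, EMB_DIM)   # ~1700 ids -> 16 dims
arg_head = nn.Linear(STATE_DIM + EMB_DIM, 10)       # hypothetical argument head

state = torch.randn(1, STATE_DIM)                   # e.g. LSTM output for one step
logits = action_head(state)

# Straight-through Gumbel-Softmax: hard one-hot on the forward pass,
# soft (differentiable) probabilities on the backward pass.
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)   # (1, NUM_ACTIONS)
action_emb = one_hot @ action_embed.weight               # (1, EMB_DIM), differentiable

# Argument sampling is conditioned on the sampled action id's embedding.
arg_logits = arg_head(torch.cat([state, action_emb], dim=-1))
action_id = one_hot.argmax(dim=-1)                       # the sampled action id
```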