Error323 opened this issue 5 years ago
From https://openreview.net/pdf?id=HkxaFoC9KQ (p7, p16)
The input:
The output:
Please provide feedback!
I think effects should be part of the inputs (e.g. storms, colossus and lurker attacks).
- `available_actions` input, though I've read somewhere this is actually ignored during the imitation learning stage because humans spam wrong actions all the time
- spatial inputs (`screen`, `screen2`, `minimap`)

I have a general idea of how to do it as an autoregressive policy, but I'm still quite fuzzy on the embedded approach.
Thanks @inoryy, that's useful. I found this article on embeddings, https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526, which seems to explain it well. It's a way to reduce the dimensionality of categories into a smaller continuous space. Reminds me of PCA.
There's a bit of a terminology clash here: the embedded policy vector is unrelated to embeddings. Though understanding those is also useful, because they're used extensively to process the inputs (there's a bunch of categorical spatial features).
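As a minimal sketch of what such a categorical embedding looks like (the sizes here are illustrative, not from the paper): it's just a learned lookup table that maps each integer id in a feature plane to a small dense vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ~1700 distinct ids, embedded down to 16 dims.
NUM_IDS, EMB_DIM = 1700, 16

# The embedding is just a trainable (NUM_IDS, EMB_DIM) weight matrix.
embedding_table = rng.normal(0, 0.02, size=(NUM_IDS, EMB_DIM))

def embed(ids: np.ndarray) -> np.ndarray:
    """Row lookup: each integer id selects one 16-dim vector."""
    return embedding_table[ids]

# A toy 4x4 categorical "screen" plane becomes a 4x4x16 tensor,
# ready to be stacked with the other spatial features.
plane = rng.integers(0, NUM_IDS, size=(4, 4))
embedded = embed(plane)
print(embedded.shape)  # (4, 4, 16)
```

In a real model the table would be a trainable layer (e.g. `tf.keras.layers.Embedding` or `torch.nn.Embedding`) rather than a fixed matrix; the lookup itself is the same.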
Duplicating my thoughts from Discord.
The relevant part of the article is this:
Now that I think about it, maybe they do mean categorical embeddings. The pipeline would then be: sample an action id -> embed it from ~1700 levels (the number of unique action ids) down to 16 dims -> sample the args conditioned on those 16 dims. But if that's the case, I've never seen it done this way before. You'd also have to be careful propagating gradients with this setup; it might need the Gumbel-Softmax trick since action id sampling is part of the computation graph.
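A minimal numpy sketch of that pipeline (all sizes illustrative, not from the paper): relax the action id sample with Gumbel-Softmax so it stays differentiable, then use the resulting soft one-hot to look up a 16-dim embedding that conditions the argument heads.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS, EMB_DIM = 1700, 16  # illustrative sizes from the discussion

def gumbel_softmax(logits: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Differentiable approximation of sampling a one-hot action id."""
    y = (logits + rng.gumbel(size=logits.shape)) / tau
    y = np.exp(y - y.max())  # numerically stable softmax
    return y / y.sum()

# Policy head produces logits over action ids.
logits = rng.normal(size=NUM_ACTIONS)

# Soft one-hot sample; unlike a hard argmax, gradients can flow through it.
soft_onehot = gumbel_softmax(logits, tau=0.5)

# Embedding lookup as a matrix product so the whole path is differentiable:
# soft_onehot (1700,) @ table (1700, 16) -> conditioning vector (16,).
action_embedding_table = rng.normal(0, 0.02, size=(NUM_ACTIONS, EMB_DIM))
cond = soft_onehot @ action_embedding_table

print(cond.shape)  # (16,)
```

In an actual framework you'd typically use the straight-through variant (hard one-hot on the forward pass, soft gradient on the backward pass) so the environment still sees a discrete action id.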
This issue is focused on the 4.7.1 version of SC2 and its corresponding replay data. Our first objective is to train a CNN encoder -> LSTM -> CNN decoder network on that data which is able to defeat the default AI and play on all maps/races.
In order to achieve this we need to formalize the input and output of the network, so that we can start implementing the various components (replay parser, training pipeline, agent).
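One way to start pinning down that interface (all names, shapes, and dtypes below are placeholders for discussion, not a decided spec) is to write it as typed containers that the parser, training pipeline, and agent all share:

```python
from dataclasses import dataclass
from typing import Dict
import numpy as np

@dataclass
class Observation:
    # Spatial feature planes as (channels, height, width); sizes are placeholders.
    screen: np.ndarray             # e.g. (17, 64, 64)
    minimap: np.ndarray            # e.g. (7, 64, 64)
    available_actions: np.ndarray  # binary mask over action ids

@dataclass
class Action:
    action_id: int                 # which function to call
    args: Dict[str, np.ndarray]    # per-argument values (e.g. screen coords)

# Example instance with dummy data, just to fix shapes and dtypes early.
obs = Observation(
    screen=np.zeros((17, 64, 64), dtype=np.float32),
    minimap=np.zeros((7, 64, 64), dtype=np.float32),
    available_actions=np.zeros(1700, dtype=np.int8),
)
act = Action(action_id=0, args={"screen": np.array([32, 32])})
print(obs.screen.shape, act.action_id)
```

Having one shared definition like this makes mismatches between the replay parser's output and the agent's expected input show up immediately rather than deep inside training.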