openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

ACKTR MultiDiscrete #673

Open lhorus opened 6 years ago

lhorus commented 6 years ago

Is ACKTR compatible with MultiDiscrete type spaces?

pzhokhov commented 6 years ago

In principle, from what I understand, ACKTR should be compatible. The current implementation will likely not work, though I don't think it should be that hard to fix. If you are willing to help, for instance by providing a unit test that trains on an env with a MultiDiscrete action space, we should be able to make it work relatively fast.
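As a rough illustration of the kind of environment such a test could train on, here is a minimal sketch. It is a pure-Python stand-in that only mimics the Gym `reset`/`step` interface; the class name `ToyMultiDiscreteEnv`, the sizes `(3, 4)`, and the reward scheme are all hypothetical and not taken from Baselines. A real test would presumably subclass `gym.Env`, declare `gym.spaces.MultiDiscrete([3, 4])` as the action space, train ACKTR on it, and assert that the average reward exceeds some threshold.

```python
import numpy as np


class ToyMultiDiscreteEnv:
    """Hypothetical stand-in for a Gym env with a MultiDiscrete([3, 4]) action space.

    The observation is a target vector; the reward is the fraction of
    action components that match that target.
    """

    def __init__(self, nvec=(3, 4), episode_len=10, seed=0):
        self.nvec = np.array(nvec)          # number of choices per sub-action
        self.episode_len = episode_len
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self.target = None

    def _new_target(self):
        # One integer per sub-action, each drawn from [0, nvec[i])
        return self.rng.integers(0, self.nvec)

    def reset(self):
        self.t = 0
        self.target = self._new_target()
        return self.target.astype(np.float32)

    def step(self, action):
        action = np.asarray(action)
        assert action.shape == self.nvec.shape
        assert np.all((action >= 0) & (action < self.nvec))
        # Reward is computed against the target that was shown as the observation
        reward = float(np.mean(action == self.target))
        self.t += 1
        done = self.t >= self.episode_len
        self.target = self._new_target()
        return self.target.astype(np.float32), reward, done, {}


# An oracle policy that copies the observation earns the maximum reward.
env = ToyMultiDiscreteEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, _ = env.step(obs.astype(int))
    total += reward
```

The oracle loop only checks that the environment itself is consistent; it says nothing about whether ACKTR can learn the task.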

lhorus commented 6 years ago

Will definitely try, though I am running the code via Colab because it doesn't work properly on Windows (SubprocVecEnv doesn't work there), so it might take a while.

On a side note, couldn't MultiDiscrete be represented by a Box(low=[min1, min2], high=[max1, max2], dtype=np.uint8), thus achieving the same result?

pzhokhov commented 6 years ago

Should be implemented in this PR: https://github.com/openai/baselines/pull/677

Substituting MultiDiscrete with Box: no, it won't give you quite the same result, because Box action spaces (even with dtype=np.uint8) assume a certain continuity of actions (roughly speaking, actions [0,0] and [0,1] are close, while [0,0] and [10,10] are far apart). In practice, that would mean a Gaussian distribution over actions is used instead of a multi-categorical one.
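To make the distinction concrete, here is a minimal NumPy-only sketch of the two sampling schemes. It does not use the Baselines distribution classes; the function names, the sizes `[3, 4]`, and the means and standard deviations are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action space: two sub-actions with 3 and 4 choices respectively
nvec = [3, 4]


def sample_multicategorical(logits_per_dim, rng):
    """Draw one integer per sub-action from independent softmax distributions."""
    action = []
    for logits in logits_per_dim:
        z = logits - np.max(logits)              # shift for numerical stability
        probs = np.exp(z) / np.sum(np.exp(z))
        action.append(int(rng.choice(len(probs), p=probs)))
    return np.array(action)


def sample_gaussian(mean, log_std, rng):
    """Draw a real-valued action from a diagonal Gaussian."""
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)


# Multi-categorical: every component is a valid integer choice, and no notion
# of distance between choices is built in.
logits = [np.zeros(n) for n in nvec]
discrete_action = sample_multicategorical(logits, rng)

# Gaussian: samples are real numbers centred on the mean, so nearby values are
# treated as similar, and the result still has to be rounded and clipped
# before it could be used as a discrete choice.
continuous_action = sample_gaussian(np.array([1.0, 1.5]), np.zeros(2), rng)
```

With the multi-categorical policy the probability of each choice is learned separately, whereas the Gaussian policy can only move and widen a single bump along each axis.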