Open lhorus opened 6 years ago
In principle, from what I understand, ACKTR should be compatible. The current implementation will likely not work, but I don't think it should be that hard to fix. If you are willing to help, for instance by providing a unit test that trains on an env with a MultiDiscrete action space, we should be able to make it work relatively fast.
Will definitely try, though it might take a while: the code doesn't work properly on Windows (SubProcEnv doesn't work there), so I am running it via Colab.
On a side note, could MultiDiscrete not be represented by a Box(low=[min1, min2], high=[max1, max2], dtype=np.uint8), thus achieving the same result?
Should be implemented in this PR: https://github.com/openai/baselines/pull/677
As for substituting MultiDiscrete with Box: no, it won't give you quite the same result, because Box action spaces (even with dtype=np.uint8) assume a certain continuity of actions (roughly speaking, actions [0,0] and [0,1] are close, while [0,0] and [10,10] are far apart). In practice, that would mean that a Gaussian distribution over actions is used instead of a multi-categorical one.
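To make the Gaussian vs. multi-categorical difference concrete, here is a minimal standard-library sketch. It is not the baselines implementation; the function names, the logits, and the space sizes (a MultiDiscrete with 3 and 11 choices, and a matching Box) are made up for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_categorical(logits_per_dim, rng):
    """Sample one integer per sub-action, each from its own categorical.

    logits_per_dim[i] holds one logit per allowed value of sub-action i,
    so no ordering or distance between values is assumed.
    """
    action = []
    for logits in logits_per_dim:
        probs = softmax(logits)
        action.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return action


def sample_gaussian(means, stds, lows, highs, rng):
    """Sample a continuous value per dimension and clip it to the Box bounds.

    Nearby values get similar probability density, which is the
    continuity assumption a Box space carries.
    """
    action = []
    for mean, std, low, high in zip(means, stds, lows, highs):
        x = rng.gauss(mean, std)
        action.append(min(max(x, low), high))
    return action


rng = random.Random(0)

# Hypothetical MultiDiscrete([3, 11]) policy output. The second sub-action
# strongly prefers the two extremes (0 and 10) and nothing in between.
logits = [[0.0, 2.0, 0.0], [5.0] + [0.0] * 9 + [5.0]]
discrete_action = sample_multi_categorical(logits, rng)

# Hypothetical Box(low=[0, 0], high=[2, 10]) version of the same action.
# A single Gaussian per dimension has one mode, so it cannot express
# "either 0 or 10, but not 5".
box_action = sample_gaussian([1.0, 5.0], [0.5, 2.0], [0.0, 0.0], [2.0, 10.0], rng)
```

The categorical version can put most of its mass on 0 and 10 at the same time, while the Gaussian version has to pick one mean and treats neighbouring values as similar, which is why the two spaces do not give the same result.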
Is ACKTR compatible with MultiDiscrete type spaces?