Closed: sheetalsh456 closed this issue 5 years ago
Docs say ACKTR does not support multi-discrete action spaces, so this could be a result of something inside ACKTR not playing nice with such action spaces.
I assume the Rank mismatch error happened in distributions.py, around line 299, in softmax_cross_entropy_with_logits_v2. Perhaps your actions were something the code did not expect? I would try caveman-debugging that part by printing whatever values go into TensorFlow around those parts.
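As a minimal sketch of that kind of caveman-debugging (the shapes below are hypothetical, not taken from this issue), printing the static shapes of the labels and logits tensors right before the cross-entropy call usually makes the mismatch visible:

```python
import tensorflow as tf

# Hypothetical shapes for illustration only: with MultiDiscrete([100]*30) the
# actions placeholder is typically (batch, 30), while the logits may end up as
# (batch, 3000) or (batch, 30, 100) depending on how the policy is built.
actions_ph = tf.placeholder(tf.int32, shape=[None, 30], name="actions")
logits = tf.placeholder(tf.float32, shape=[None, 30, 100], name="logits")

# Print the static shapes right before the suspected call to spot the mismatch.
print("labels rank:", actions_ph.shape.ndims, "shape:", actions_ph.shape)
print("logits rank:", logits.shape.ndims, "shape:", logits.shape)

# Note: sparse_softmax_cross_entropy_with_logits requires
# rank(labels) == rank(logits) - 1 (integer labels), while
# softmax_cross_entropy_with_logits_v2 requires labels with the same rank
# as the logits (one-hot labels).
neglogp = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=actions_ph,
                                                         logits=logits)
```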
Quick question: is the bug only related to ACKTR? Did you try with PPO2, for instance? (I would expect so, looking at the title of the issue.)
Hi,
So yes, I just checked the docs, and it seems ACKTR doesn't support it, but A2C, PPO1 and PPO2 do support multi-discrete action spaces. I tried the same code above with those three algorithms (A2C, PPO1 and PPO2), and they gave the same error.
Also, I'm not conceptually sure what the ideal shapes of pi_latent, vf_latent and value_fn should be when I use a multi-discrete action space of MultiDiscrete([100]*30). I'd really like to understand their ideal shapes in this case.
Could you give us the complete traceback?
Okay, it turns out the same code works for A2C and PPO. It's probably just not supported by ACKTR at the moment. Thanks a lot! :)
Hi,
I was trying to run ACKTR with a custom policy and an environment that uses a Box observation space and a MultiDiscrete action space.
Here is my environment's __init__():
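(The exact snippet isn't reproduced in this thread; a minimal sketch of an `__init__` with the spaces described here, a Box observation space and MultiDiscrete([100]*30) actions, could look like the following. The observation shape and bounds are placeholders.)

```python
import numpy as np
import gym
from gym import spaces


class CustomEnv(gym.Env):
    """Sketch of an environment with a Box observation space and a
    MultiDiscrete action space, matching the description in this issue."""

    def __init__(self):
        super(CustomEnv, self).__init__()
        # Placeholder observation space: adjust shape and bounds to your problem.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(64,), dtype=np.float32)
        # 30 independent sub-actions, each with 100 possible values.
        self.action_space = spaces.MultiDiscrete([100] * 30)
```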
And my custom policy is:
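(The exact policy code isn't reproduced here either. As a rough sketch, not the reporter's code, a stable-baselines ActorCriticPolicy producing latents with the shapes described below, following the custom-policy example in the stable-baselines docs and assuming a recent 2.x release, might look like this; the flatten-plus-dense feature extractor and the layer names are assumptions.)

```python
import tensorflow as tf
from stable_baselines.common.policies import ActorCriticPolicy


class CustomPolicy(ActorCriticPolicy):
    """Sketch of a custom actor-critic policy. Layer sizes follow the shapes
    described in this issue (pi_latent: 128, vf_latent: 32, value_fn: 1)."""

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch,
                 reuse=False, **kwargs):
        super(CustomPolicy, self).__init__(sess, ob_space, ac_space, n_env,
                                           n_steps, n_batch, reuse=reuse,
                                           scale=False)

        with tf.variable_scope("model", reuse=reuse):
            activ = tf.nn.relu
            flat_obs = tf.layers.flatten(self.processed_obs)

            # Separate latent vectors for the policy and the value function.
            pi_latent = activ(tf.layers.dense(flat_obs, 128, name="pi_fc0"))
            vf_latent = activ(tf.layers.dense(flat_obs, 32, name="vf_fc0"))
            value_fn = tf.layers.dense(vf_latent, 1, name="vf")

            # pdtype picks the distribution matching the action space
            # (MultiCategorical for MultiDiscrete).
            self._proba_distribution, self._policy, self.q_value = \
                self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent,
                                                           init_scale=0.01)

        self._value_fn = value_fn
        self._setup_init()

    def step(self, obs, state=None, mask=None, deterministic=False):
        if deterministic:
            action, value, neglogp = self.sess.run(
                [self.deterministic_action, self.value_flat, self.neglogp],
                {self.obs_ph: obs})
        else:
            action, value, neglogp = self.sess.run(
                [self.action, self.value_flat, self.neglogp],
                {self.obs_ph: obs})
        return action, value, self.states, neglogp

    def proba_step(self, obs, state=None, mask=None):
        return self.sess.run(self.policy_proba, {self.obs_ph: obs})

    def value(self, obs, state=None, mask=None):
        return self.sess.run(self.value_flat, {self.obs_ph: obs})
```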
The shape of pi_latent is (batch_size, 128), the shape of vf_latent is (batch_size, 32), and the shape of value_fn is (batch_size, 1), and I get the following error:
ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2).
I've tried making vf_latent and pi_latent 3D tensors, but then I get this error:
ValueError: Shape must be rank 2 but is rank 3 for 'model/pi/MatMul' (op: 'MatMul') with input shapes: [?,30,128], [30,3000].
I also tried making value_fn a (batch_size, 30) tensor, but that gives me the rank mismatch error again.
The same code above works for a discrete action space, but I'm not sure what changes to make in my custom policy for a multi-discrete action space. Can someone please help me out?
Any help/suggestions will be greatly appreciated!
Thanks!