SoMuchSerenity closed this issue 2 years ago
```python
actor = ActorProb(net, env.action_space.shape, device=device).to(device)
```
See the example in test/continuous/test_ppo.py.
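To illustrate why `output_dim` stays at `action_shape` rather than `2 * action_shape` for a Gaussian policy: the mean and the standard deviation are typically produced by separate heads, each of size `action_shape`, rather than by one doubled output. Here is a minimal NumPy sketch of that idea (the class name `GaussianHead` and all details are hypothetical, not Tianshou's actual implementation):

```python
import numpy as np

class GaussianHead:
    """Sketch: two separate heads map features to (mu, sigma), each of shape (action_dim,)."""
    def __init__(self, feature_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_mu = rng.normal(0.0, 0.1, (feature_dim, action_dim))  # mean head
        self.log_sigma = np.zeros(action_dim)  # state-independent log-std parameter

    def __call__(self, features):
        mu = features @ self.w_mu
        sigma = np.broadcast_to(np.exp(self.log_sigma), mu.shape)
        return mu, sigma

head = GaussianHead(feature_dim=128, action_dim=4)
mu, sigma = head(np.zeros((2, 128)))
print(mu.shape, sigma.shape)  # (2, 4) (2, 4): each head is action_dim, not 2*action_dim
```

So a single output layer of size `2 * action_shape` is not needed; the two distribution parameters are simply returned as a pair.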
Thanks Weng. I closed the issue because I have realised how flexible Tianshou is; I was using another RL library and it was very rigid. I will look into the documentation and source code further, and will come back to you if I have more questions. Thanks very much for your response! I have also tried envpool, as recommended in the documentation, but it is not supported on Windows at the moment. Great job on these libraries!
Tianshou 0.4.9, Gym 0.25.0, PyTorch 1.12.0, NumPy 1.22.3, Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)], win32
Hi,
I am working on an environment which returns a dictionary observation space, consisting of an image and some scalar variables. I have looked through all the issues and documentation, yet couldn't find a related question. PPO is the algorithm I intend to work with.
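For context, a dict-observation preprocessing network of the kind described here might be sketched as follows. This is purely illustrative: the class name, the dict keys `"image"` and `"scalars"`, and the plain linear layers (standing in for the CNN and MLP) are all assumptions, written in NumPy only to show the shape bookkeeping:

```python
import numpy as np

class DictPreprocessNet:
    """Sketch: fuse an image branch and a scalar branch into one feature vector."""
    def __init__(self, image_size, n_scalars, feat_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w_img = rng.normal(0.0, 0.01, (image_size, feat_dim))  # stand-in for the CNN
        self.w_sca = rng.normal(0.0, 0.01, (n_scalars, feat_dim))   # stand-in for the MLP
        self.output_dim = 2 * feat_dim  # the input_dim the actor/critic heads would consume

    def __call__(self, obs):
        batch = obs["image"].shape[0]
        img_feat = np.maximum(obs["image"].reshape(batch, -1) @ self.w_img, 0.0)  # ReLU
        sca_feat = np.maximum(obs["scalars"] @ self.w_sca, 0.0)
        return np.concatenate([img_feat, sca_feat], axis=1)

net = DictPreprocessNet(image_size=3 * 8 * 8, n_scalars=2)
obs = {"image": np.zeros((4, 3, 8, 8)), "scalars": np.zeros((4, 2))}
print(net(obs).shape)  # (4, 128)
```

The key design point is that the fused feature vector (and its `output_dim`) is what the downstream actor and critic see, regardless of how the dict observation is split internally.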
Generally, my pre-processing network would look like the above, where a CNN handles the image input and an MLP handles the two scalar inputs. After this, the combined network will be used to create the actor and the critic. I also have one question regarding the Actor definition. I have checked the source code of Actor(); the output dimension is defined as:
I don't quite understand this definition as I would consider the output dimension to be 2*action_shape if using Gaussian policy.
Thanks in advance for the help!