Closed jeferal closed 3 years ago
You do not need a full custom policy class; you can pass the architecture to the algorithm directly. See the first example here.
PS: TRPO is not in stable-baselines3, but we highly recommend moving to stable-baselines3, as it is more actively maintained.
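The suggestion above boils down to passing a `net_arch` specification through `policy_kwargs` instead of writing a policy class. A minimal sketch, assuming the stable-baselines (v2, TensorFlow) `TRPO` the question refers to; the layer sizes are illustrative, not prescribed:

```python
# net_arch for stable-baselines v2 feed-forward policies:
# leading integers are layers shared by actor and critic; the trailing
# dict splits into separate policy ("pi") and value ("vf") branches.
policy_kwargs = dict(net_arch=[64, dict(pi=[128, 64], vf=[128, 64])])

# Hypothetical usage (requires stable-baselines and TensorFlow 1.x):
# from stable_baselines import TRPO
# model = TRPO("MlpPolicy", env, policy_kwargs=policy_kwargs)
# model.learn(total_timesteps=100_000)

print(policy_kwargs["net_arch"])
```

Because the shared trunk and the `pi`/`vf` branches are built by the library itself, every layer ends up wired into the graph, which avoids the disconnected-layer symptom a hand-rolled policy can produce.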
Hello,
I am training a TRPO agent with MlpPolicy on a custom environment, and now I would like to implement a custom policy. However, I cannot find any examples of this algorithm with a custom policy. So far I have tried this policy:
The problem is that when I inspect the model graph in TensorBoard, some of the layers appear to be disconnected. Can anyone provide an example of how to do this, or tell me what I am doing wrong?
My intention is to create a policy similar to MlpPolicy, but with a different number of layers and neurons.
I truly appreciate your help and time. I also apologize in case I have not followed the documentation well.