hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] How to implement custom policy for TRPO #1119

Closed. jeferal closed this issue 3 years ago.

jeferal commented 3 years ago

Hello,

I am training a TRPO agent with MlpPolicy on a custom environment, and now I would like to implement a custom policy. However, I cannot find any examples of using this algorithm with a custom policy. So far I have tried this policy:

from stable_baselines.common.policies import FeedForwardPolicy

class CustomPolicyPR(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicyPR, self).__init__(*args, **kwargs,
                                             net_arch=[128, dict(pi=[256, 128, 64],
                                                                 vf=[256, 128, 64])],
                                             feature_extraction="mlp")
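
For reference, this is roughly how I then pass the class to the algorithm (a sketch, not my exact code; env stands in for my custom Gym environment instance):

    from stable_baselines import TRPO

    # env is assumed to be an already-created custom Gym environment instance
    model = TRPO(CustomPolicyPR, env, verbose=1)
    model.learn(total_timesteps=100000)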

The problem is that when I look at the model graph in TensorBoard, some of the layers appear to be disconnected. Can anyone provide an example of how to do this, or tell me what I am doing wrong?

My intention is to create a policy similar to MlpPolicy, but with a different number of layers and neurons.

I truly appreciate your help and time. I also apologize if I have not followed the documentation correctly.

Miffyli commented 3 years ago

You do not need to write a full custom policy class; you can pass the architecture to the algorithm through the policy_kwargs argument. See the first example here.
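
For example, a minimal sketch of that approach with TRPO, assuming a standard Gym environment id such as CartPole-v1 in place of your custom env:

    from stable_baselines import TRPO

    # Same architecture as your CustomPolicyPR, but passed through policy_kwargs
    # instead of subclassing FeedForwardPolicy.
    policy_kwargs = dict(net_arch=[128, dict(pi=[256, 128, 64],
                                             vf=[256, 128, 64])])

    model = TRPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
    model.learn(total_timesteps=10000)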

PS: TRPO is not included in stable-baselines3, but we highly recommend moving to stable-baselines3 anyway, as it is more actively supported.