Closed by nathanhjay 5 years ago
Hello,
As mentioned in the documentation, you should be using td3.policies and a continuous-action environment like Pendulum-v0, not CartPole, because TD3 only supports continuous actions.
The following code works:
import gym

from stable_baselines.td3.policies import FeedForwardPolicy
from stable_baselines import TD3

class MyMlpPolicy(FeedForwardPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env=1, n_steps=1, n_batch=None, reuse=False, **_kwargs):
        super(MyMlpPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, feature_extraction="mlp", **_kwargs)

env = gym.make('Pendulum-v0')
model = TD3(MyMlpPolicy, env)
Works now, thanks for the help.
Agents could check whether the action/observation spaces are one of the supported types and throw a more informative exception. A quick PR for a later time :)
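For illustration, a check like the one suggested here might look roughly as follows. This is only a sketch, not the library's actual code: Box and Discrete are plain stand-in classes (loosely modeled on gym.spaces.Box and gym.spaces.Discrete) so the snippet runs without gym installed, and check_action_space is a hypothetical helper name.

```python
# Stand-in space classes (assumption: they only mimic the names of
# gym.spaces.Box / gym.spaces.Discrete so this sketch runs without gym).
class Box:
    def __init__(self, low, high, shape):
        self.low, self.high, self.shape = low, high, shape


class Discrete:
    def __init__(self, n):
        self.n = n


def check_action_space(algo_name, action_space, supported_types):
    """Hypothetical helper: raise a descriptive error for unsupported spaces."""
    if not isinstance(action_space, tuple(supported_types)):
        supported = ", ".join(t.__name__ for t in supported_types)
        raise ValueError(
            "{} does not support action spaces of type {}; supported types: {}".format(
                algo_name, type(action_space).__name__, supported
            )
        )


# A continuous (Box) space passes silently.
check_action_space("TD3", Box(-2.0, 2.0, (1,)), [Box])

# A discrete space, as in CartPole, is rejected with a readable message.
try:
    check_action_space("TD3", Discrete(2), [Box])
except ValueError as err:
    message = str(err)
    print(message)
```

With real environments the same idea would be an isinstance test of env.action_space against gym.spaces.Box.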
There is already a check for that ;) (at least for TD3)
Ah my bad, I did not notice the issue was with using the wrong policies. Perhaps a check for that would help, but it is already well documented with proper highlights ^^
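A policy check along those lines could be sketched like this. Everything here is an assumption for illustration: the classes are empty stand-ins named after the kind of base classes an algorithm might expect, and check_policy_class is a hypothetical helper, not something taken from stable_baselines.

```python
# Empty stand-in base classes (assumption: they only represent the idea of
# "policies for TD3" versus "policies for other algorithms").
class TD3Policy:
    pass


class ActorCriticPolicy:
    pass


class MyMlpPolicy(TD3Policy):
    pass


class WrongPolicy(ActorCriticPolicy):
    pass


def check_policy_class(algo_name, policy_cls, expected_base):
    """Hypothetical helper: fail early if the policy has the wrong base class."""
    if not (isinstance(policy_cls, type) and issubclass(policy_cls, expected_base)):
        raise TypeError(
            "{} expects a policy derived from {}, got {!r}".format(
                algo_name, expected_base.__name__, policy_cls
            )
        )


# A policy built on the expected base class is accepted.
check_policy_class("TD3", MyMlpPolicy, TD3Policy)

# A policy built on a different base class is rejected up front.
try:
    check_policy_class("TD3", WrongPolicy, TD3Policy)
    rejected = False
except TypeError as err:
    rejected = True
    print(err)
```

Failing at construction time like this would point at the policy import directly, rather than surfacing later as an error inside the model's init.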
Description
When I try to instantiate a TD3 model, I get an error in the init function on line 136:
It's possible I misunderstand the variable naming going on, but won't the following lines from td3.py init() always cause a naming conflict because of the reuse=False in the scoping?
Code Example
You can replicate the issue with the following code:
System Info