I am trying to use the debugger to figure out how the A2C method works in the tf2 branch, so I added the following code at the end of a2c.py:

    if __name__ == '__main__':
        import gym
        learn('mlp', gym.vector.make('PongNoFrameskip-v4', 4))
When I run it in PyCharm, I get the following exception:
Traceback (most recent call last):
  File "G:/Deep RL/baselines-tf2/baselines/a2c/a2c.py", line 205, in <module>
    learn('mlp', gym.vector.make('PongNoFrameskip-v4', 4))
  File "G:/Deep RL/baselines-tf2/baselines/a2c/a2c.py", line 156, in learn
    max_grad_norm=max_grad_norm, lr=lr, alpha=alpha, epsilon=epsilon, total_timesteps=total_timesteps)
  File "G:/Deep RL/baselines-tf2/baselines/a2c/a2c.py", line 35, in __init__
    self.train_model = PolicyWithValue(ac_space, policy_network, value_network=None, estimate_q=False)
  File "G:\Deep RL\baselines-tf2\baselines\common\policies.py", line 33, in __init__
    self.pdtype = make_pdtype(policy_network.output_shape, ac_space, init_scale=0.01)
  File "G:\Deep RL\baselines-tf2\baselines\common\distributions.py", line 180, in make_pdtype
    raise ValueError('No implementation for {}'.format(ac_space))
ValueError: No implementation for Tuple(Discrete(6), Discrete(6), Discrete(6), Discrete(6))
In fact, I do not understand why the action space is reported as a Tuple here, and when I try other games I get the same exception. Can anyone explain the purpose of this distribution-based implementation (distributions.py) and how I can solve my problem?
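As far as I can tell, `make_pdtype` dispatches on the type of the action space, and `gym.vector.make(..., 4)` reports a Tuple of the four per-env spaces instead of a single Discrete space. A minimal sketch of what I think is happening, using hypothetical stand-in classes rather than the real gym/baselines code:

```python
# Stand-ins for gym's Discrete and Tuple spaces (hypothetical, for illustration only).
class Discrete:
    def __init__(self, n):
        self.n = n
    def __repr__(self):
        return "Discrete({})".format(self.n)

class Tuple:
    def __init__(self, spaces):
        self.spaces = tuple(spaces)
    def __repr__(self):
        return "Tuple({})".format(", ".join(map(repr, self.spaces)))

def make_pdtype(ac_space):
    # Simplified version of the dispatch in baselines' distributions.py:
    # only a plain Discrete space is handled in this sketch.
    if isinstance(ac_space, Discrete):
        return "CategoricalPdType(n={})".format(ac_space.n)
    raise ValueError('No implementation for {}'.format(ac_space))

single = Discrete(6)                # action space of one Pong env
batched = Tuple([Discrete(6)] * 4)  # what gym.vector.make(..., 4) reports

print(make_pdtype(single))          # the single-env space dispatches fine

msg = None
try:
    make_pdtype(batched)            # the batched Tuple space has no handler
except ValueError as e:
    msg = str(e)
print(msg)                          # reproduces the reported error message
```

If this reading is right, the fix would be to pass `learn` a vectorized env built the way baselines expects (so the env still exposes a single Discrete action space) rather than `gym.vector.make`, but I am not sure which helper the tf2 branch provides for that.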