I am trying to use the debugger to figure out how the A2C method works in the tf2 branch, so I added the following code at the end of a2c.py:

    if __name__ == '__main__':
        import gym
        learn('mlp', gym.vector.make('PongNoFrameskip-v4', 4))
When I run it in PyCharm, I get the following exception:
Traceback (most recent call last):
  File "G:/Deep RL/baselines-tf2/baselines/a2c/a2c.py", line 205, in <module>
    learn('mlp', gym.vector.make('PongNoFrameskip-v4', 4))
  File "G:/Deep RL/baselines-tf2/baselines/a2c/a2c.py", line 156, in learn
    max_grad_norm=max_grad_norm, lr=lr, alpha=alpha, epsilon=epsilon, total_timesteps=total_timesteps)
  File "G:/Deep RL/baselines-tf2/baselines/a2c/a2c.py", line 35, in __init__
    self.train_model = PolicyWithValue(ac_space, policy_network, value_network=None, estimate_q=False)
  File "G:\Deep RL\baselines-tf2\baselines\common\policies.py", line 33, in __init__
    self.pdtype = make_pdtype(policy_network.output_shape, ac_space, init_scale=0.01)
  File "G:\Deep RL\baselines-tf2\baselines\common\distributions.py", line 180, in make_pdtype
    raise ValueError('No implementation for {}'.format(ac_space))
ValueError: No implementation for Tuple(Discrete(6), Discrete(6), Discrete(6), Discrete(6))
In fact, I do not understand why the action space is reported as a Tuple here, and when I try other games I get the same exception. Can anyone explain the purpose of this distribution-based implementation (distributions.py) and how I can solve my problem?
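As far as I can tell, `make_pdtype` dispatches on the type of the action space, and `gym.vector.make(..., 4)` reports a Tuple of the four per-env spaces instead of a single Discrete space. A minimal sketch of what I think is happening, using hypothetical stand-in classes rather than the real gym/baselines code:

```python
# Stand-ins for gym's Discrete and Tuple spaces (hypothetical, for illustration only).
class Discrete:
    def __init__(self, n):
        self.n = n
    def __repr__(self):
        return "Discrete({})".format(self.n)

class Tuple:
    def __init__(self, spaces):
        self.spaces = tuple(spaces)
    def __repr__(self):
        return "Tuple({})".format(", ".join(map(repr, self.spaces)))

def make_pdtype(ac_space):
    # Simplified version of the dispatch in baselines' distributions.py:
    # only a plain Discrete space is handled in this sketch.
    if isinstance(ac_space, Discrete):
        return "CategoricalPdType(n={})".format(ac_space.n)
    raise ValueError('No implementation for {}'.format(ac_space))

single = Discrete(6)                # action space of one Pong env
batched = Tuple([Discrete(6)] * 4)  # what gym.vector.make(..., 4) reports

print(make_pdtype(single))          # the single-env space dispatches fine

msg = None
try:
    make_pdtype(batched)            # the batched Tuple space has no handler
except ValueError as e:
    msg = str(e)
print(msg)                          # reproduces the reported error message
```

If this reading is right, the fix would be to pass `learn` a vectorized env built the way baselines expects (so the env still exposes a single Discrete action space) rather than `gym.vector.make`, but I am not sure which helper the tf2 branch provides for that.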