Open SZH1230456 opened 4 years ago
I think the agent training code at https://github.com/keon/policy-gradient/blob/b83f050b70fd0af3358a0f2748743d30c0e7462f/pg.py#L56 has an error: the calculation does not produce the correct result. You can check it against this implementation for reference: https://github.com/gabrielgarza/openai-gym-policy-gradient/blob/master/policy_gradient_layers.py#
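The report above does not say which step of the calculation is wrong. As a point of comparison only, here is a minimal sketch of the discounted-return and standardization step commonly used in REINFORCE-style training. The function names, the `gamma` default, and the `eps` term are assumptions made for illustration. This is not code taken from either linked repository.

```python
from statistics import mean, pstdev


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def standardize(values, eps=1e-8):
    """Subtract the mean, then divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    # eps (an assumed small constant) guards against division by zero.
    return [(v - mu) / (sigma + eps) for v in values]


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    returns = discounted_returns(episode_rewards, gamma=0.5)
    print(returns)
    print(standardize(returns))
```

If the training step in question normalizes returns, one thing worth checking is that the mean is subtracted from the values themselves before dividing by the standard deviation, as `standardize` does here.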