IDayday closed this issue 3 years ago.
> I want to change the reward_threshold, but I can't find the parameter, so I just use
You can just avoid passing stop_fn into the trainer, or set stop_fn=None.
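For instance, a minimal sketch based on the DQN tutorial's trainer call (the hyperparameter values are illustrative, and policy, train_collector, and test_collector are assumed to be set up as in the tutorial):

import tianshou as ts

# policy, train_collector, test_collector: set up as in the DQN tutorial
result = ts.trainer.offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    update_per_step=0.1, episode_per_test=100, batch_size=64,
    stop_fn=None,  # omit or pass None: training runs the full max_epoch epochs
)

# alternatively, keep early stopping but with your own threshold
# (400 is a hypothetical value, not read from env.spec.reward_threshold)
result = ts.trainer.offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    update_per_step=0.1, episode_per_test=100, batch_size=64,
    stop_fn=lambda mean_rewards: mean_rewards >= 400,
)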
> The reward can't be more than 200.
This is true, but if one of the resulting sequences is [0, 200, 200, ..., 200] with length 100, the mean and std are:
In [1]: import numpy as np
In [2]: a=np.array([0] + [200] * 99)
In [3]: a.mean(), a.std()
Out[3]: (198.0, 19.8997487421324)
BTW, the maximum reward of CartPole-v0 is 200 and CartPole-v1's is 500. This is because of gym's TimeLimit wrapper (https://github.com/openai/gym/blob/334491803859eaa5a845f5f5def5b14c108fd3a9/gym/envs/__init__.py#L56), and the reward is always 1 at each step.
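As a quick check (assuming a classic gym version where both ids are registered), the caps are visible on the env spec:

In [4]: import gym
In [5]: gym.make('CartPole-v0').spec.max_episode_steps
Out[5]: 200
In [6]: gym.make('CartPole-v1').spec.max_episode_steps
Out[6]: 500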
For the Tensorboard issue, please try running this code and see if tensorboard results are logged: https://github.com/thu-ml/tianshou/blob/master/test/discrete/test_dqn.py
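A ~40-byte event file typically contains only the file header, which suggests the writer was created but no scalars were ever written, e.g. because the logger was never passed to the trainer. A rough sketch of the wiring that script uses (the class name depends on your tianshou version: BasicLogger in 0.4.x, TensorboardLogger later):

from torch.utils.tensorboard import SummaryWriter
from tianshou.utils import BasicLogger  # TensorboardLogger in newer versions

writer = SummaryWriter('log/dqn')
logger = BasicLogger(writer)
# pass logger=logger into the trainer call, then view the results with:
#   tensorboard --logdir log/dqn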
For the "reward can't be more than 200" limit, see https://github.com/openai/gym/issues/463.
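A common workaround discussed there is to strip the TimeLimit wrapper or register a variant with a larger cap; a sketch, where the id CartPoleLong-v0 and the step cap are hypothetical:

import gym
from gym.envs.registration import register

# option 1: drop the TimeLimit wrapper entirely (episodes never time out)
env = gym.make('CartPole-v0').unwrapped

# option 2: register a variant with a larger step cap
register(
    id='CartPoleLong-v0',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=10000,
)
env = gym.make('CartPoleLong-v0')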
I followed the DQN tutorial in the Docs. The code runs well, but I can't open the logs: I checked the folder "./log/dqn" and found an event file named "events.out.tfevents.1627916187.DESKTOP-NLRRDA0.49756.0" that is only 40 bytes. When I opened tensorboard to view the logs, it said "There are not any runs in the log folder."
BTW, I want to know how to change the stop_fn. In the Docs,
I want to change the reward_threshold, but I can't find the parameter, so I just use
It seems to work (it runs more epochs), but the feedback seems to be faulty. One of the messages says:
> The reward can't be more than 200.