thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

No runs found (tensorboard) #405

Closed: IDayday closed this issue 3 years ago

IDayday commented 3 years ago

BTW, I want to know how to change the stop_fn. In the docs:

stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold

I want to change the reward_threshold, but I can't find the parameter, so I just use

stop_fn=lambda mean_rewards: mean_rewards >= 500

It seems to work (it runs for more epochs), but the feedback seems faulty. One of the log lines says

Epoch #10: 10001it [00:06, 1518.47it/s, env_step=100000, len=200, loss=0.204, n/ep=0, n/st=16, rew=200.00]
Epoch #10: test_reward: 199.020000 ± 3.078896, best_reward: 200.000000 ± 0.000000 in #8

The reward can't be more than 200.

Trinkle23897 commented 3 years ago

I want to change the reward_threshold, but I can't find the parameter, so I just use

You can just avoid passing stop_fn into the trainer, or set stop_fn=None.
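
For example, a minimal sketch of a configurable threshold (make_stop_fn is a hypothetical helper, not part of tianshou; the offpolicy_trainer call is assumed to follow the usual tianshou example scripts):

def make_stop_fn(threshold=None):
    """Return a stop_fn with a custom threshold, or None to disable early stopping."""
    if threshold is None:
        return None  # the trainer then always runs for the full max_epoch
    return lambda mean_rewards: mean_rewards >= threshold

# then pass it to the trainer, e.g.:
# result = offpolicy_trainer(policy, train_collector, test_collector, ...,
#                            stop_fn=make_stop_fn(475))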

The reward can't be more than 200.

This is true, but if one of the resulting test-reward sequences is [0, 200, 200, ..., 200] with length 100, the mean and std are:

In [1]: import numpy as np

In [2]: a=np.array([0] + [200] * 99)

In [3]: a.mean(), a.std()
Out[3]: (198.0, 19.8997487421324)

BTW, the maximum reward of CartPole-v0 is 200 and CartPole-v1's is 500. This is because of gym's TimeLimit wrapper (see https://github.com/openai/gym/blob/334491803859eaa5a845f5f5def5b14c108fd3a9/gym/envs/__init__.py#L56): the episode is truncated at max_episode_steps, and each step's reward is always 1, so the undiscounted return is capped at the step limit.
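
You can double-check those limits yourself (a quick sketch, assuming a classic gym install where env.spec carries these fields):

import gym

env_v0 = gym.make("CartPole-v0")
print(env_v0.spec.max_episode_steps, env_v0.spec.reward_threshold)  # 200 195.0

env_v1 = gym.make("CartPole-v1")
print(env_v1.spec.max_episode_steps, env_v1.spec.reward_threshold)  # 500 475.0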

Syzygianinfern0 commented 3 years ago