thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

No runs found (tensorboard) #405

Closed: IDayday closed this issue 3 years ago

IDayday commented 3 years ago

BTW, I want to know how to change the stop_fn. In the docs:

stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold

I want to change the reward_threshold, but I can't find the parameter, so I just use

stop_fn=lambda mean_rewards: mean_rewards >= 500

It seems to work (it runs for more epochs), but the feedback seems faulty. One of the log lines says

Epoch #10: 10001it [00:06, 1518.47it/s, env_step=100000, len=200, loss=0.204, n/ep=0, n/st=16, rew=200.00]
Epoch #10: test_reward: 199.020000 ± 3.078896, best_reward: 200.000000 ± 0.000000 in #8

The reward can't be more than 200.

Trinkle23897 commented 3 years ago

I want to change the reward_threshold, but I can't find the parameter, so I just use

You can just avoid passing stop_fn into the trainer, or set stop_fn=None.
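
For example, a minimal sketch of a configurable threshold (make_stop_fn is a hypothetical helper, not part of tianshou; the offpolicy_trainer call is assumed to follow the usual tianshou example scripts):

def make_stop_fn(threshold=None):
    """Return a stop_fn with a custom threshold, or None to disable early stopping."""
    if threshold is None:
        return None  # the trainer then always runs for the full max_epoch
    return lambda mean_rewards: mean_rewards >= threshold

# then pass it to the trainer, e.g.:
# result = offpolicy_trainer(policy, train_collector, test_collector, ...,
#                            stop_fn=make_stop_fn(475))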

The reward can't be more than 200.

This is true, but if one of the resulting test-reward sequences is [0, 200, 200, ..., 200] with length 100, the mean and std are:

In [1]: import numpy as np

In [2]: a=np.array([0] + [200] * 99)

In [3]: a.mean(), a.std()
Out[3]: (198.0, 19.8997487421324)

BTW, the maximum reward of CartPole-v0 is 200 and CartPole-v1's is 500. This is because of gym's TimeLimit wrapper (see https://github.com/openai/gym/blob/334491803859eaa5a845f5f5def5b14c108fd3a9/gym/envs/__init__.py#L56): the episode is truncated at max_episode_steps, and each step's reward is always 1, so the undiscounted return is capped at the step limit.
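
You can double-check those limits yourself (a quick sketch, assuming a classic gym install where env.spec carries these fields):

import gym

env_v0 = gym.make("CartPole-v0")
print(env_v0.spec.max_episode_steps, env_v0.spec.reward_threshold)  # 200 195.0

env_v1 = gym.make("CartPole-v1")
print(env_v1.spec.max_episode_steps, env_v1.spec.reward_threshold)  # 500 475.0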

Syzygianinfern0 commented 3 years ago