mg64ve closed this issue 3 years ago.
Hello, I think you are just printing the error message even though there is no error... If there were one, it would have been printed only once and the run would have stopped after raising the AssertionError.
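To make the distinction concrete, here is a minimal pure-Python sketch (the helper names `check_by_print`, `check_by_assert` and `run` are hypothetical, and no gym is needed). The message format is the one in the log lines in this thread, which matches the assertion in gym's classic CartPole `step()`; whether the custom env prints instead of asserting is an assumption, since its code is not shown here.

```python
def check_by_print(action, valid_actions):
    """Report an illegal action with print() and carry on (training continues)."""
    if action not in valid_actions:
        print("%r (%s) invalid" % (action, type(action)))
        return False
    return True


def check_by_assert(action, valid_actions):
    """CartPole-style check: an illegal action raises AssertionError and stops the run."""
    assert action in valid_actions, "%r (%s) invalid" % (action, type(action))
    return True


def run(check, actions, valid_actions):
    """Apply `check` to each action; return how many steps completed before anything raised."""
    steps = 0
    try:
        for action in actions:
            check(action, valid_actions)
            steps += 1
    except AssertionError as err:
        print("stopped:", err)
    return steps


actions = [[1, 3], [0, 2], [1, 1]]  # shaped like the logged actions
print(run(check_by_print, actions, [0, 1]))   # three "invalid" lines, then 3
print(run(check_by_assert, actions, [0, 1]))  # one "stopped: ..." line, then 0
```

With `print`, every illegal action produces one line and training carries on, which is the pattern in the log; with `assert`, the first illegal action ends the run.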
Thanks @araffin, you are right, it keeps going. However, it does not print any ep_reward_mean data. Is it learning? The following is part of the log:
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
--------------------------------------
| approxkl | 0.00026964353 |
| clipfrac | 0.0 |
| explained_variance | 0.019 |
| fps | 215 |
| n_updates | 14 |
| policy_entropy | 2.0163999 |
| policy_loss | -0.0038015784 |
| serial_timesteps | 1792 |
| time_elapsed | 8.49 |
| total_timesteps | 1792 |
| value_loss | 39.603714 |
--------------------------------------
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
-------------------------------------
| approxkl | 0.0018063349 |
| clipfrac | 0.015625 |
| explained_variance | 0.000335 |
| fps | 211 |
| n_updates | 15 |
| policy_entropy | 2.0012653 |
| policy_loss | -0.008501572 |
| serial_timesteps | 1920 |
| time_elapsed | 9.08 |
| total_timesteps | 1920 |
| value_loss | 42.672264 |
-------------------------------------
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 2]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 2]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([1, 0]) (<class 'numpy.ndarray'>) invalid
array([0, 3]) (<class 'numpy.ndarray'>) invalid
array([0, 1]) (<class 'numpy.ndarray'>) invalid
array([1, 1]) (<class 'numpy.ndarray'>) invalid
--------------------------------------
| approxkl | 0.00018022978 |
| clipfrac | 0.0 |
| explained_variance | 0.0575 |
| fps | 218 |
| n_updates | 16 |
| policy_entropy | 1.9997047 |
| policy_loss | -0.0007104698 |
| serial_timesteps | 2048 |
| time_elapsed | 9.69 |
| total_timesteps | 2048 |
| value_loss | 64.8821 |
--------------------------------------
And this is what the original script prints:
--------------------------------------
| approxkl | 0.00018330614 |
| clipfrac | 0.0 |
| ep_len_mean | 21.4 |
| ep_reward_mean | 21.4 |
| explained_variance | 0.00317 |
| fps | 439 |
| n_updates | 1 |
| policy_entropy | 0.69296503 |
| policy_loss | -0.0029364298 |
| serial_timesteps | 128 |
| time_elapsed | 2e-05 |
| total_timesteps | 512 |
| value_loss | 40.458572 |
--------------------------------------
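A side note on the two tables: the jump in `policy_entropy` (about 0.69 in the original script, about 2.0 with the custom env) is expected. Assuming the custom action space is MultiDiscrete([2, 4]), which is what the logged actions suggest (first component in {0, 1}, second in {0, ..., 3}), and assuming the reported entropy is the sum over sub-actions, the largest possible entropy is ln 2 + ln 4 = ln 8 ≈ 2.079 nats, versus ln 2 ≈ 0.693 for CartPole's Discrete(2). Both runs are therefore close to a uniform policy at this early stage. The small helper below (hypothetical name) just does that arithmetic; it says nothing about whether learning will succeed.

```python
import math


def max_entropy(nvec):
    """Largest possible entropy (in nats) of a factorized categorical policy.

    A sub-action with n choices has entropy at most ln(n), reached by the
    uniform distribution; for independent sub-actions the entropies add up.
    """
    return sum(math.log(n) for n in nvec)


print(round(max_entropy([2]), 4))     # Discrete(2), original CartPole: 0.6931
print(round(max_entropy([2, 4]), 4))  # assumed MultiDiscrete([2, 4]): 2.0794
```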
"However it does not print any ep_reward_mean data. Is it learning?"
This is a duplicate of https://github.com/hill-a/stable-baselines/issues/24, and it is also covered in the documentation: you need a Monitor wrapper for that. But yes, it is learning (the losses are reported).
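The real wrapper is `stable_baselines.bench.Monitor` (in SB3: `stable_baselines3.common.monitor.Monitor`), typically applied as something like `env = Monitor(env, log_dir)` before creating the model. The class below is not that implementation but a simplified, hypothetical stand-in showing the idea: it tracks each episode's total reward and length and, when an episode ends, attaches them to `info["episode"]`. As far as I know, that per-episode info is what the training loop averages into ep_reward_mean and ep_len_mean. `ToyEnv` is a made-up environment used only for the demo.

```python
class EpisodeStatsWrapper:
    """Simplified, hypothetical stand-in for a Monitor wrapper (not the library class).

    Forwards reset()/step() to the wrapped env and, whenever an episode ends,
    records the episode's total reward and length.
    """

    def __init__(self, env):
        self.env = env
        self.episode_rewards = []
        self.episode_lengths = []
        self._reward = 0.0
        self._length = 0

    def reset(self):
        self._reward = 0.0
        self._length = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._reward += reward
        self._length += 1
        if done:
            self.episode_rewards.append(self._reward)
            self.episode_lengths.append(self._length)
            # Expose the finished episode's stats to whoever reads `info`.
            info = dict(info, episode={"r": self._reward, "l": self._length})
        return obs, reward, done, info


class ToyEnv:
    """Hypothetical env: reward 1.0 per step, each episode lasts 3 steps."""

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


env = EpisodeStatsWrapper(ToyEnv())
for _ in range(2):
    env.reset()
    done = False
    while not done:
        _, _, done, info = env.step(0)
print(env.episode_rewards, env.episode_lengths)  # [3.0, 3.0] [3, 3]
```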
Anyway, I would recommend using Stable-Baselines3: https://github.com/DLR-RM/stable-baselines3 (there, the env is automatically wrapped in a Monitor wrapper when possible).
Closing, as the original issue was solved.
Thanks @araffin. With SB3 I get almost the same error, but it does print ep_reward_mean:
Again, as araffin pointed out, that "error message" comes from your environment (the line where you print), not from stable-baselines. We do not offer custom tech support for fixing custom environments.
There is no error, just a print.
OK, thanks @Miffyli and @araffin. I did not understand that the print came from my environment.
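Why would the env flag actions like array([1, 3]) at all? The env code isn't shown, so this is only one possible explanation: the logged actions look like samples from a MultiDiscrete([2, 4]) space, and if the custom `step()` still validated them against CartPole's original Discrete(2) space, every one of them would be reported as invalid even though it is legal for the new space. The checks below are simplified, hypothetical pure-Python stand-ins (gym's real `contains()` methods also handle numpy arrays and dtypes) that show the mismatch.

```python
def discrete_contains(action, n):
    """Simplified Discrete(n)-style check: a single integer in [0, n)."""
    return isinstance(action, int) and 0 <= action < n


def multidiscrete_contains(action, nvec):
    """Simplified MultiDiscrete(nvec)-style check: one integer per dimension, 0 <= a_i < n_i."""
    return (
        len(action) == len(nvec)
        and all(isinstance(a, int) for a in action)
        and all(0 <= a < n for a, n in zip(action, nvec))
    )


action = [1, 3]  # same shape as the logged array([1, 3])
print(discrete_contains(action, 2))            # False: not a single integer
print(multidiscrete_contains(action, [2, 4]))  # True: legal for MultiDiscrete([2, 4])
```

If that is the cause, the check in `step()` should use the new 2D space rather than the original one.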
Hi, I am running stable-baselines in Docker with CUDA 10. As expected, the following example from the documentation works:
I want to test the same with a MultiDiscrete environment, so I wrote a CustomCartpole environment that uses a 2D action space (it is only a simple example):
I am getting the following error:
I checked the documentation and did not find anything about the action space for MlpPolicy. I assume it should check the dimensions and adjust the policy accordingly. What is wrong?
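On the MlpPolicy question: as far as I know, stable-baselines picks the policy's output distribution from the action space, and for MultiDiscrete it uses a factorized ("multi-categorical") distribution: the network outputs sum(nvec) logits, which are split into one categorical distribution per dimension. The 2D actions in the log above are consistent with the policy having adapted. The pure-Python sketch below illustrates the technique with hypothetical function names and made-up logits; it is not the library's implementation, and the ordering of the flat logits (dimension 0 first) is an assumption.

```python
import math
import random


def split_logits(logits, nvec):
    """Split a flat logit vector of length sum(nvec) into one chunk per sub-action."""
    assert len(logits) == sum(nvec), "policy head must output sum(nvec) logits"
    chunks, start = [], 0
    for n in nvec:
        chunks.append(logits[start:start + n])
        start += n
    return chunks


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multidiscrete(logits, nvec, rng=random):
    """Draw one index per dimension, each from its own categorical distribution."""
    action = []
    for chunk in split_logits(logits, nvec):
        probs = softmax(chunk)
        action.append(rng.choices(range(len(chunk)), weights=probs)[0])
    return action


def greedy_multidiscrete(logits, nvec):
    """Deterministic variant: take the most likely index in each dimension."""
    return [max(range(len(c)), key=c.__getitem__) for c in split_logits(logits, nvec)]


nvec = [2, 4]                             # assumed MultiDiscrete([2, 4])
logits = [0.1, 0.9, 0.0, 0.2, 1.5, 0.3]  # hypothetical policy output, length 2 + 4
print(greedy_multidiscrete(logits, nvec))  # [1, 2]
print(sample_multidiscrete(logits, nvec, random.Random(0)))
```

Because the dimensions are sampled independently, the joint entropy is the sum of the per-dimension entropies, which is why the head only needs sum(nvec) outputs rather than one per joint action (product of nvec).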