kosmylo closed this issue 4 years ago
This is called a moving mean: it uses a moving window to compute the mean, which reduces the noise in the episodic reward. https://stable-baselines.readthedocs.io/en/master/misc/results_plotter.html#stable_baselines.results_plotter.rolling_window
Anyway, as mentioned in the docs, I would recommend using the RL Zoo and EvalCallback to monitor the true performance rather than the performance during training.
I understand that this is the moving mean, but why does it not compute the mean for the first few timesteps? If you check the image, the mean (blue line) is not plotted from the beginning.
> why does it not compute the mean for the first few timesteps?
How do you compute the mean of 2 elements using a moving window of size 10? You could say that it would be the mean of those 2 elements, but this is not really satisfying (and we also have an implementation that uses a 1D conv, https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/results_plotter.py#L30, and thus does not work like that). The mean is only defined once the number of data points reaches the window size, so the curve does not start at t=0.
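To make the point concrete, here is a minimal pure-Python sketch of a moving mean that only uses full windows. It is not the stable-baselines implementation; `moving_mean`, `rewards`, and `x_smoothed` are hypothetical names chosen for illustration. It shows why the smoothed curve has fewer points than the raw data and starts later on the x-axis.

```python
def moving_mean(values, window):
    """Return the mean of every *full* window of `window` consecutive values.

    Hypothetical helper for illustration; not the library's code.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n_full = len(values) - window + 1  # number of complete windows
    if n_full <= 0:
        return []  # fewer points than the window size: nothing to plot
    return [sum(values[i:i + window]) / window for i in range(n_full)]


rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up episode rewards
window = 3

smoothed = moving_mean(rewards, window)
# The first mean is only available once `window` points exist,
# so the matching x values start at index window - 1.
x_smoothed = list(range(len(rewards)))[window - 1:]

print(smoothed)    # [2.0, 3.0, 4.0, 5.0]
print(x_smoothed)  # [2, 3, 4, 5]
```

With 6 points and a window of 3 you get 4 smoothed values, and the first one sits at index 2. With a larger window, the gap at the start of the plot grows accordingly.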
I use results_plotter to plot the episode reward at the end of the training. Recently for no reason, the plot does not show the mean reward from the beginning of the episode, but after a certain timestep as follows:
I use the following script for training:
Any idea of what is happening?