rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License
1.84k stars 309 forks source link

Inconsistent Usage between the Definition and Call of the Function _sample_params in cem.py #2204

Open ghost opened 3 years ago

ghost commented 3 years ago

In the source code of cem.py, there is an inconsistent setting between the Definition and Call of the function _sample_params.

The detail is presented below:

In Line 166, the call of the function _sample_params should be set to self._cur_params = self._sample_params(epoch) rather than self._cur_params = self._sample_params(itr) according to its definition, where the only input argument is epoch rather than itr.

ryanjulian commented 3 years ago

@os-popt Thanks for the bug report! If I'm not mistaken, this flaw doesn't stop CEM from working, just that the naming is confusing (epoch vs itr).

Is that correct?

Do you mind sending a pull request with a correction?

ghost commented 3 years ago

Thanks very much for your timely reply. According to your suggestion, I have sended a pull request about this bug.

Although this flaw doesn't stop CEM from working, it may make CEM easier to permaturely converge owing to the incorrect using the function _sample_params() (which will stop to delay std updating of sampling distribution much faster when iter is used as its input argument).

Note that in each epoch, multiple iter (i.e., samples/individuals) can be obtained. For each epoch, these samples/individuals should be sampled from the same multivariate Gaussian distirbution. Only after the end of each epoch, the parameters (mean + std) of the sampling distribution will be updated via MLE.

However, the currently incorrect implementation will update the parameters (mean + std) of the sampling distribution each iter (which should be each epoch).

Note that epoch has been well defined in Line 142 of cmpy.py, as presented below.

epoch = itr // self._n_samples