In ES, compute_ranks() ranks the fitness values with an argsort, which gives different ranks to individuals that have the same fitness.
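For illustration, here is a minimal NumPy sketch of an argsort-based rank function like the one described above. The body is an assumption about how such a function is typically written, not a quote of the ES code. With sparse rewards, the tied individuals still get distinct ranks, and which one gets which rank depends only on how argsort orders the ties.

```python
import numpy as np

def compute_ranks(x):
    """0-based ranks from an argsort; tied values still get distinct ranks."""
    assert x.ndim == 1
    ranks = np.empty(len(x), dtype=int)
    # the i-th smallest value gets rank i, whichever way argsort orders ties
    ranks[x.argsort()] = np.arange(len(x))
    return ranks

# sparse rewards: three of four individuals have fitness 0
fitness = np.array([0.0, 0.0, 1.0, 0.0])
print(compute_ranks(fitness))  # the three zeros receive ranks 0, 1 and 2 in some order
```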
This introduces noise into the gradient estimate. This is not a big issue, since the noise has zero expected value, but the extra variance can slow down convergence.
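The zero-mean claim can be checked on a toy example, assuming ties end up in a uniformly random order (plausible when individuals are indexed in the order their perturbations were sampled). The hypothetical helper `ranks_with_order` emulates a given tie-breaking order with a stable argsort over a reordered copy. Averaging over every possible order shows that each tied individual's expected rank is the mean of the tied ranks. Assuming the gradient estimate is a sum of perturbations weighted by (centered) ranks, a zero-mean error in the ranks becomes a zero-mean error in the gradient.

```python
import itertools
import numpy as np

def ranks_with_order(x, order):
    """0-based ranks when ties are broken by the position of each index in `order`."""
    x = np.asarray(x)
    order = np.asarray(order)
    # stable sort of the reordered values keeps tied entries in `order`'s sequence
    sorted_idx = order[np.argsort(x[order], kind="stable")]
    ranks = np.empty(len(x), dtype=float)
    ranks[sorted_idx] = np.arange(len(x))
    return ranks

fitness = np.array([0.0, 0.0, 1.0, 0.0])
# enumerate every possible tie-breaking order (uniform over permutations)
all_ranks = [ranks_with_order(fitness, p)
             for p in itertools.permutations(range(len(fitness)))]
mean_ranks = np.mean(all_ranks, axis=0)
print(mean_ranks)  # tied entries average to 1.0, the best individual stays at 3.0
```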
This is really only a problem in environments where rewards are sparse, since many individuals then end up with the same fitness.
Solution: give individuals with equal fitness the average of the ranks they would otherwise occupy.
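A minimal sketch of the proposed fix, using `np.unique` to find groups of equal fitness and giving each group the mean of the 0-based ranks it spans. The function names are hypothetical, and `centered_average_ranks` assumes the common ES practice of rescaling ranks to [-0.5, 0.5].

```python
import numpy as np

def compute_average_ranks(x):
    """0-based ranks in which tied fitness values share the mean of their ranks."""
    x = np.asarray(x)
    assert x.ndim == 1
    # inverse maps each entry to its group of equal values; counts = group sizes
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    # first rank occupied by each group (groups are in ascending order of value)
    starts = np.cumsum(counts) - counts
    # mean of the ranks start, start + 1, ..., start + count - 1
    group_ranks = starts + (counts - 1) / 2.0
    return group_ranks[inverse]

def centered_average_ranks(x):
    """Average ranks rescaled to [-0.5, 0.5] (assumed centering convention)."""
    ranks = compute_average_ranks(x)
    if len(ranks) < 2:
        return np.zeros_like(ranks)
    return ranks / (len(ranks) - 1) - 0.5

fitness = np.array([0.0, 0.0, 1.0, 0.0])
print(compute_average_ranks(fitness))
print(centered_average_ranks(fitness))
```

On the sparse example `[0.0, 0.0, 1.0, 0.0]` the three tied individuals share rank 1.0, the mean of ranks 0, 1 and 2, while the best individual keeps rank 3.0. Averaging does not change the sum of the ranks, so the centered ranks still sum to zero. If SciPy is available, `scipy.stats.rankdata(x, method="average") - 1` should give the same 0-based average ranks.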