The sample size (i.e., the number of seeds) is hard-coded to ten. This is a problem for any evaluation that does not use exactly ten seeds, because the resulting confidence intervals will be incorrect; and
The critical value for the 95% confidence interval is hard-coded to the normal-distribution value (1.96), which is incorrect for small sample sizes like $n=10$. The t-distribution is generally recommended when $n\le30$ and the population standard deviation is unknown (which is the case in RL, since we are estimating the standard deviation from our sample). For example, the two-sided 95% critical value for $n=10$ (9 degrees of freedom) using Student's t-distribution is 2.262, not 1.96. Using Student's t-distribution makes the confidence intervals wider, and therefore more conservative, for small sample sizes.
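As a rough illustration of both points, the sketch below takes the sample size from the data and derives the critical value from Student's t-distribution with $n-1$ degrees of freedom. The function names (`t_pdf`, `t_cdf`, `t_critical`, `confidence_interval`) are hypothetical and not taken from the code under discussion. The quantile is computed numerically with the standard library only so that the example is self-contained; in practice a library call such as `scipy.stats.t.ppf` would be the more usual choice.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work in log space (lgamma) so large df does not overflow.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: Simpson's rule on [0, x] plus the 0.5 mass below zero."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(n: int, confidence: float = 0.95) -> float:
    """Two-sided critical value for a sample of size n, found by bisection."""
    df = n - 1
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(returns, confidence: float = 0.95):
    """Mean +/- t * standard error, with n read from the data."""
    n = len(returns)  # sample size comes from the data, not a constant
    mean = statistics.mean(returns)
    sem = statistics.stdev(returns) / math.sqrt(n)  # sample std (n - 1 denominator)
    half_width = t_critical(n, confidence) * sem
    return mean - half_width, mean + half_width
```

With ten seeds, `t_critical(10)` comes out at roughly 2.262, matching the value quoted above, and it approaches 1.96 as the number of seeds grows.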
In this line: