Open vzhuang opened 4 years ago
Are you 100% sure this is a bug?
If the episodes are of fixed length (they are), then you can compute t_k from just k as (k * episode_length).
I believe that is what is happening here?
Right, it's a simple fix. Since the time is inside a log factor, this can't be "fixed" by adjusting the scaling constant. I'm guessing it has at least a small impact on your results, depending on whether you tune the scaling factor.
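To illustrate the log-factor point, here is a minimal sketch. It is not code from this repo: the function names and the values of S, A, delta and the episode length are made up for illustration. It uses the k * episode_length conversion mentioned above (assuming k >= 1) and the reward confidence radius as I understand it from Jaksch et al. 2010. If passing k instead of t_k were equivalent to rescaling, the ratio between the two radii would be the same for every k.

```python
import math


def episode_start_timestep(k: int, episode_length: int) -> int:
    """Convert an episode index to a timestep, assuming fixed-length episodes and k >= 1."""
    return k * episode_length


def reward_radius(t: int, n_visits: int, n_states: int, n_actions: int, delta: float) -> float:
    """UCRL2-style reward confidence radius (form assumed from Jaksch et al. 2010)."""
    log_term = math.log(2 * n_states * n_actions * t / delta)
    return math.sqrt(7 * log_term / (2 * max(1, n_visits)))


# Hypothetical problem sizes, chosen only for illustration.
S, A, DELTA, EPISODE_LENGTH, N_VISITS = 5, 2, 0.05, 10, 20

ratios = []
for k in (1, 10, 100, 1000):
    t_k = episode_start_timestep(k, EPISODE_LENGTH)
    with_timestep = reward_radius(t_k, N_VISITS, S, A, DELTA)
    with_episode_index = reward_radius(k, N_VISITS, S, A, DELTA)
    ratios.append(with_timestep / with_episode_index)
    print(f"k={k:5d}  t_k={t_k:6d}  ratio={ratios[-1]:.4f}")
```

The ratio shrinks as k grows, so no single scaling constant maps one radius onto the other for all episodes.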
As per Jaksch et al. 2010, the confidence intervals for UCRL2 use t_k := the timestep at the start of episode k. However, in `run_finite_tabular_experiment` in `experiment.py`, the episode index is wrongly passed instead of the timestep. UCFH is also affected by this bug.
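For reference, these are the UCRL2 confidence intervals as I recall them from Jaksch et al. 2010; please check against the paper before relying on the constants. Here S and A are the numbers of states and actions, N_k(s,a) is the visit count before episode k, and \delta is the confidence parameter. In both, t_k appears only inside the logarithm:

```latex
\left| \tilde{r}(s,a) - \hat{r}_k(s,a) \right|
  \le \sqrt{\frac{7 \log\left(2 S A t_k / \delta\right)}{2 \max\{1, N_k(s,a)\}}}

\left\| \tilde{p}(\cdot \mid s,a) - \hat{p}_k(\cdot \mid s,a) \right\|_1
  \le \sqrt{\frac{14 S \log\left(2 A t_k / \delta\right)}{\max\{1, N_k(s,a)\}}}
```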