iosband / TabulaRL

MIT License

UCRL2/UCFH confidence intervals are incorrect #5

Open vzhuang opened 4 years ago

vzhuang commented 4 years ago

As per Jaksch et al. (2010), the confidence intervals for UCRL2 use t_k, the timestep at the start of episode k. However, run_finite_tabular_experiment in experiment.py passes the episode index instead of the timestep.

UCFH is also affected by this bug.

iosband commented 4 years ago

Are you 100% sure this is a bug?

If the episodes are of fixed length (they are) then you can compute t_k from just k as (k * episode_length).

My belief is that this is what is happening?
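
For reference, the conversion described above can be sketched as follows. The function name and the indexing convention are assumptions made for illustration, not code from the repository; whether k starts at 0 or 1 (and whether an offset of 1 is needed) should be checked against the agent code.

```python
def episode_start_timestep(k: int, episode_length: int) -> int:
    """Timestep at the start of episode k for fixed-length episodes.

    Hypothetical helper: uses the k * episode_length formula from the
    comment above. The indexing convention for k is an assumption.
    """
    if k < 0:
        raise ValueError("episode index must be non-negative")
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return k * episode_length


# Example: with episodes of length 10, episode 3 starts at timestep 30.
t_k = episode_start_timestep(3, 10)
```

This only helps if the agent actually performs the conversion before computing its confidence intervals; passing k alone is not equivalent.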

vzhuang commented 4 years ago

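Right, it's a simple fix. However, since the time appears inside a log factor, the discrepancy can't be "fixed" by adjusting the scaling constant: log(k * episode_length) = log(k) + log(episode_length), which is an additive offset rather than a constant multiple of log(k). I'm guessing it has at least a small impact on your results, depending on whether you tune the scaling factor.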
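
A quick numerical check of the log-factor point, as a rough sketch: it compares a bare log term evaluated at the timestep against the same term evaluated at the episode index. This is a simplified stand-in, not the repository's actual confidence bonus, and episode_length = 10 is an arbitrary value chosen for illustration.

```python
import math


def log_ratio(k: int, episode_length: int) -> float:
    """Ratio of log(timestep) to log(episode index) for episode k.

    Illustrative only: the real UCRL2 bonus has more terms inside the log.
    """
    return math.log(k * episode_length) / math.log(k)


episode_length = 10  # assumed value for illustration
episodes = (2, 10, 100, 1000)
ratios = [log_ratio(k, episode_length) for k in episodes]

for k, r in zip(episodes, ratios):
    print(f"k={k:5d}  log(k*H)/log(k) = {r:.3f}")
```

If a single scaling constant could absorb the difference, this ratio would be the same for every k. Instead it shrinks as k grows, so any constant tuned at one stage of the run is wrong at another.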