mobeets / q-rnn


Beron2022 stochasticity #7

Closed mobeets closed 1 year ago

mobeets commented 1 year ago

I took the same trained model (H=3) and assessed behavior and belief R² using different values of ε in the policy. Note that for the belief R², this ε also defines the behavioral policy used to test the untrained RQN, which explains why those R² values increase so much with ε (see #5).
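For concreteness, here is a minimal sketch of the ε-greedy evaluation being described, assuming the model exposes Q-values and a recurrent hidden state (the `model`/`env` interface below is hypothetical, not the repo's actual API):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick the greedy action with probability 1 - epsilon, else uniform random."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Hypothetical evaluation loop: `model` stands in for the trained RQN (H=3)
# and `env` for the Beron2022 task wrapper; only epsilon varies across runs.
def run_session(model, env, epsilon, seed=0):
    rng = np.random.default_rng(seed)
    obs, h = env.reset(), model.init_hidden()
    choices, done = [], False
    while not done:
        q, h = model.q_values(obs, h)       # Q-values from the recurrent net
        a = epsilon_greedy(q, epsilon, rng)
        obs, reward, done = env.step(a)
        choices.append(a)
    return choices
```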

ε=0 (greedy):

[behavior and belief R² plots]

ε=0.1:

[behavior and belief R² plots]

ε=0.2:

[behavior and belief R² plots]

ε=1 (i.e., choose the action completely at random on each trial):

[behavior and belief R² plots]

Belief R² (ε=0, 0.1, 0.2, 1)

mobeets commented 1 year ago

Unfortunately, I found a bug in this code, and now the models have some histories that lead to nearly 100% switching. Across all histories we get the full range from 0% to 100% switching. If you use ε-greedy or a softmax policy, all you're really doing is squeezing this range toward something centered around 50%. For example, with ε=0 or τ≈0 you get a range from 0-100%, but with higher ε or τ you get a range like 45-55%, approaching everything being 50%.
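To make the squeezing concrete: with two actions, a uniformly random choice switches half the time, so if the greedy policy switches after a given history with rate p, an ε-greedy version switches with rate (1−ε)·p + ε/2. A quick sketch under that assumption:

```python
import numpy as np

# With two actions, a uniformly random choice switches with probability 1/2,
# so epsilon-greedy maps a greedy switch rate p to (1 - eps) * p + eps / 2.
def switch_rate(p, eps):
    return (1 - eps) * p + eps / 2

extremes = np.array([0.0, 1.0])  # greedy policy's range across histories
for eps in [0.0, 0.1, 0.2, 0.9, 1.0]:
    lo, hi = switch_rate(extremes, eps)
    print(f"eps={eps:.1f}: {lo:.0%} to {hi:.0%}")
# eps=0.0 -> 0% to 100%; eps=0.9 -> 45% to 55%; eps=1.0 -> 50% to 50%
```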

So it's not possible to simply add stochasticity to the policy and match what the animals show, which is a range from 0-50%. The only obvious way I see to do it is to bake in a bias toward repeating the last action, as they did in Beron2022.
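One way to bake that in (a sketch only; the parameter names and exact form here are mine, not necessarily Beron2022's parameterization) is a softmax over Q-values with a bonus on the previously chosen action:

```python
import numpy as np

def sticky_softmax(q_values, prev_action, tau=0.1, stickiness=1.0, rng=None):
    """Softmax policy with a repeat bias: the previous action gets a logit bonus,
    so the agent tends to repeat itself independent of the Q-values."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(q_values, dtype=float) / tau
    if prev_action is not None:
        logits[prev_action] += stickiness    # bias toward repeating
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```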

So my current thinking is that any "stickiness" (see #8) we see in the RQN is due to model mismatch when doing the logistic regression. In the animals, the weight on recent choices is even higher (relative to the other weights) than in the RQN, in line with this "switching probability given history" analysis. So my guess is that the animals' stickiness will not emerge from the RQN, because the RQN essentially learns beliefs.
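For reference, the kind of switch-vs-history logistic regression being referred to might look like the following (the feature construction is a hypothetical sketch; the repo's actual regressors may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def switch_regression(choices, rewards, n_back=3):
    """Regress 'did the agent switch on trial t' onto lagged choices and rewards."""
    X, y = [], []
    for t in range(n_back, len(choices)):
        X.append(np.r_[choices[t - n_back:t], rewards[t - n_back:t]])
        y.append(int(choices[t] != choices[t - 1]))   # 1 = switched
    model = LogisticRegression().fit(np.array(X), np.array(y))
    return model.coef_.ravel()                        # weights on recent history
```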