mobeets / q-rnn


Beron2022 stochasticity #7

Closed mobeets closed 1 year ago

mobeets commented 1 year ago

I took the same trained model (H=3) and assessed behavior and belief R² using different values of ε in the policy. Note that for the belief R², this ε also defines the behavioral policy used to test the untrained RQN, which explains why those R² values increase so much with ε (see #5).
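For concreteness, here is a minimal sketch of the ε-greedy evaluation being described, assuming the model exposes Q-values and a recurrent hidden state (the `model`/`env` interface below is hypothetical, not the repo's actual API):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick the greedy action with probability 1 - epsilon, else uniform random."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Hypothetical evaluation loop: `model` stands in for the trained RQN (H=3)
# and `env` for the Beron2022 task wrapper; only epsilon varies across runs.
def run_session(model, env, epsilon, seed=0):
    rng = np.random.default_rng(seed)
    obs, h = env.reset(), model.init_hidden()
    choices, done = [], False
    while not done:
        q, h = model.q_values(obs, h)       # Q-values from the recurrent net
        a = epsilon_greedy(q, epsilon, rng)
        obs, reward, done = env.step(a)
        choices.append(a)
    return choices
```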

ε=0 (greedy):

[behavior and belief R² plots]

ε=0.1:

[behavior and belief R² plots]

ε=0.2:

[behavior and belief R² plots]

ε=1 (i.e., choose the action completely at random on each trial):

[behavior and belief R² plots]

Belief R² (ε=0, 0.1, 0.2, 1)

mobeets commented 1 year ago

Unfortunately, I found a bug in this code, and now the models have some histories that lead to nearly 100% switching. Across all histories we get the full range from 0% to 100% switching. If you use ε-greedy or a softmax policy, all you're really doing is squeezing this range toward something centered around 50%. For example, with ε=0 or τ≈0 you get a range from 0-100%, but with higher ε or τ you get a range like 45-55%, approaching everything being 50%.
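To make the squeezing concrete: with two actions, a uniformly random choice switches half the time, so if the greedy policy switches after a given history with rate p, an ε-greedy version switches with rate (1−ε)·p + ε/2. A quick sketch under that assumption:

```python
import numpy as np

# With two actions, a uniformly random choice switches with probability 1/2,
# so epsilon-greedy maps a greedy switch rate p to (1 - eps) * p + eps / 2.
def switch_rate(p, eps):
    return (1 - eps) * p + eps / 2

extremes = np.array([0.0, 1.0])  # greedy policy's range across histories
for eps in [0.0, 0.1, 0.2, 0.9, 1.0]:
    lo, hi = switch_rate(extremes, eps)
    print(f"eps={eps:.1f}: {lo:.0%} to {hi:.0%}")
# eps=0.0 -> 0% to 100%; eps=0.9 -> 45% to 55%; eps=1.0 -> 50% to 50%
```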

So it's not possible to simply add stochasticity to the policy and match what the animals show, which is a range from 0-50%. The only obvious way I see to do it is to bake in a bias toward repeating the last action, as they did in Beron2022.
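One way to bake that in (a sketch only; the parameter names and exact form here are mine, not necessarily Beron2022's parameterization) is a softmax over Q-values with a bonus on the previously chosen action:

```python
import numpy as np

def sticky_softmax(q_values, prev_action, tau=0.1, stickiness=1.0, rng=None):
    """Softmax policy with a repeat bias: the previous action gets a logit bonus,
    so the agent tends to repeat itself independent of the Q-values."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(q_values, dtype=float) / tau
    if prev_action is not None:
        logits[prev_action] += stickiness    # bias toward repeating
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```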

So my current thinking is that any "stickiness" (see #8) we see in the RQN is due to model mismatch when doing the logistic regression. In the animals, the weight on recent choices is even higher (relative to the other weights) than in the RQN, in line with this "switching probability given history" analysis. So my guess is that the animals' stickiness will not emerge from the RQN, because the RQN essentially learns beliefs.
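For reference, the kind of switch-vs-history logistic regression being referred to might look like the following (the feature construction is a hypothetical sketch; the repo's actual regressors may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def switch_regression(choices, rewards, n_back=3):
    """Regress 'did the agent switch on trial t' onto lagged choices and rewards."""
    X, y = [], []
    for t in range(n_back, len(choices)):
        X.append(np.r_[choices[t - n_back:t], rewards[t - n_back:t]])
        y.append(int(choices[t] != choices[t - 1]))   # 1 = switched
    model = LogisticRegression().fit(np.array(X), np.array(y))
    return model.coef_.ravel()                        # weights on recent history
```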