mobeets / q-rnn


Beron summary #18

Open mobeets opened 7 months ago

mobeets commented 7 months ago

What we know:

Qs:

mobeets commented 7 months ago

Note that the above uses the *grant* models, which are trial-level, H=10, γ=0.9 models that seem to be trained pretty well. Also, both the Belief and Value RNN models were tested with ε=0.04.

For ε=0, the weights are slightly different: the belief covariates (red/orange) have a stronger relative weight in both models, and for the Belief model, the t=0 weights are more similar.
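(For reference, ε here is the exploration rate of an ε-greedy policy applied at test time. A minimal sketch, assuming actions are chosen from the model's per-action output values; names are illustrative, not the repo's actual API:)

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Greedy action with probability 1 - ε, uniformly random otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
action = epsilon_greedy([0.1, 0.7], epsilon=0.04, rng=rng)  # ε=0 would be pure greedy
```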

Q: Why does the pure Belief model show positive decoding for the previous choice?

mobeets commented 7 months ago

But the Value RNN is a little closer to the mouse than the Belief model if you consider a longer history (ε=0 below):

Again with ε=0.04:

Also, note that the RNN/Belief weights decay more slowly than the mouse's. But I would guess that if we fit a timestep-level model, the RNNs would decay faster (since integrating over that many trials would be much harder).
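(For reference, the weight-vs-lag curves come from a lagged logistic regression of the current choice on past choice/reward covariates. A minimal sketch of that kind of decoding model; the exact covariate coding in the figures may differ, and `lagged_design` is illustrative:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lagged_design(choices, rewards, n_lags=5):
    """Covariates for decoding choice(t) from the previous n_lags trials.

    choices, rewards: arrays in {0, 1}, one entry per trial.
    Per lag: signed choice, plus the choice x reward interaction
    (coded so the two ports are treated symmetrically).
    """
    c = 2 * np.asarray(choices) - 1   # choice in {-1, +1}
    cr = c * np.asarray(rewards)      # interaction: 0 on unrewarded trials
    X = [np.concatenate([c[t - n_lags:t], cr[t - n_lags:t]])
         for t in range(n_lags, len(c))]
    return np.array(X), np.asarray(choices)[n_lags:]

# illustrative fit on random stand-in data (real inputs come from the task)
rng = np.random.default_rng(0)
choices = rng.integers(0, 2, size=2000)
rewards = rng.integers(0, 2, size=2000)
X, y = lagged_design(choices, rewards)
w = LogisticRegression().fit(X, y).coef_.ravel()  # one weight per lag/covariate
```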

mobeets commented 7 months ago

Q: Why does the pure Belief model sometimes show positive decoding for the previous choice?

A: Note that the decoding model assumes the covariates have a linear impact on action choice, but the Belief updates are nonlinear. For example, say b(t-1)=0.5. Then it's true that the covariate (choice==reward) determines b(t)--i.e., b(t) = w * b(t-1) for some fixed w, whenever c(t-1) == r(t-1). But when b(t-1) ≠ 0.5, this no longer holds: the update depends on which case occurred, so we need a separate weight for c(t-1) = r(t-1) = 0 vs. c(t-1) = r(t-1) = 1.
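(To make the nonlinearity concrete, here's a minimal sketch of a Bayesian belief update for a two-port reversal task; the reward probabilities and hazard rate are assumed values, not necessarily the task's. The same (choice, reward) observation moves the belief by a different amount depending on the prior, so no single linear weight on choice==reward can capture it:)

```python
def belief_update(b, choice, reward, p_high=0.8, p_low=0.2, hazard=0.02):
    """One Bayesian belief update for a two-port reversal task.

    b: prior P(port A is the high-reward port).
    choice: 0 = A, 1 = B; reward: 0 or 1.
    p_high/p_low: reward probabilities of the high/low port (assumed).
    hazard: per-trial reversal probability (assumed).
    """
    # likelihood of the observed reward under "A is high" vs "B is high"
    p_r_A_high = [p_high, p_low][choice]
    p_r_B_high = [p_low, p_high][choice]
    if reward == 0:
        p_r_A_high, p_r_B_high = 1 - p_r_A_high, 1 - p_r_B_high
    post = b * p_r_A_high / (b * p_r_A_high + (1 - b) * p_r_B_high)
    # account for a possible reversal before the next trial
    return (1 - hazard) * post + hazard * (1 - post)

# the same observation (chose A, rewarded) shifts the belief by different amounts:
for b in (0.5, 0.9):
    print(b, round(belief_update(b, choice=0, reward=1) - b, 3))
```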

So I was essentially combining the wrong things in the plots above. Instead, the belief model would look symmetric if we used covariates for choice==reward and choice!=reward. But maybe even better would just be to split things based on A, a, B, b.
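(A minimal sketch of that A/a/B/b coding, assuming the usual convention that the letter is the chosen port and the case marks whether it was rewarded:)

```python
def encode_trial(choice, reward):
    """One-letter trial code: letter = chosen port (A or B),
    uppercase if rewarded, lowercase if not."""
    letter = 'AB'[choice]
    return letter if reward else letter.lower()

# e.g., chose A (rewarded), A (unrewarded), B (unrewarded) -> 'Aab'
history = ''.join(encode_trial(c, r) for c, r in [(0, 1), (0, 0), (1, 0)])
```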

Now we see some separation at longer lags (e.g., A > b), and also lots of variability. I think the variability actually makes sense for a pure Belief model, because the effective weight changes based on what your previous belief was.

Repeating the analysis for the RNN (γ=0.2, H=10, trial-level) and Mouse:

mobeets commented 7 months ago

Now for timestep-level RNNs:

The RNNs were trained on 200 trials per episode (compared to 800 for the trial-level models), with no abort penalty. Including an abort penalty returned similar results, though:
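(For concreteness, the two training setups compared above, written as hypothetical config dicts; the field names are illustrative, not the repo's actual API:)

```python
# hypothetical configs summarizing the setups described above
timestep_rnn = dict(level='timestep', trials_per_episode=200, abort_penalty=None)
trial_rnn = dict(level='trial', trials_per_episode=800, gamma=0.9, H=10)
```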