mobeets / q-rnn


Beron summary #18

Open mobeets opened 7 months ago

mobeets commented 7 months ago

What we know:

Qs:

mobeets commented 7 months ago

Note that the above uses the *grant* models, which are trial-level, H=10, γ=0.9 models that seem to be trained pretty well. Also, both the Belief and Value RNN models were tested with ε=0.04.

For ε=0, the weights are slightly different: the belief covariates (red/orange) have a stronger relative weight in both models, and for the Belief model, the t=0 weights are more similar.
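(For reference, ε here is the exploration rate of an ε-greedy policy applied at test time. A minimal sketch, assuming actions are chosen from the model's per-action output values; names are illustrative, not the repo's actual API:)

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Greedy action with probability 1 - ε, uniformly random otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
action = epsilon_greedy([0.1, 0.7], epsilon=0.04, rng=rng)  # ε=0 would be pure greedy
```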

Q: Why does the pure Belief model show positive decoding for the previous choice?

mobeets commented 7 months ago

But the Value RNN is a little closer to the mouse than the Belief model if you consider a longer history (ε=0 below):

Again with ε=0.04:

Also, note that the RNN/Belief weights decay more slowly than the mouse's. But I would guess that if we fit a timestep-level model, the RNNs would decay faster (since integrating over that many trials would be much harder).
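(For reference, the weight-vs-lag curves come from a lagged logistic regression of the current choice on past choice/reward covariates. A minimal sketch of that kind of decoding model; the exact covariate coding in the figures may differ, and `lagged_design` is illustrative:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lagged_design(choices, rewards, n_lags=5):
    """Covariates for decoding choice(t) from the previous n_lags trials.

    choices, rewards: arrays in {0, 1}, one entry per trial.
    Per lag: signed choice, plus the choice x reward interaction
    (coded so the two ports are treated symmetrically).
    """
    c = 2 * np.asarray(choices) - 1   # choice in {-1, +1}
    cr = c * np.asarray(rewards)      # interaction: 0 on unrewarded trials
    X = [np.concatenate([c[t - n_lags:t], cr[t - n_lags:t]])
         for t in range(n_lags, len(c))]
    return np.array(X), np.asarray(choices)[n_lags:]

# illustrative fit on random stand-in data (real inputs come from the task)
rng = np.random.default_rng(0)
choices = rng.integers(0, 2, size=2000)
rewards = rng.integers(0, 2, size=2000)
X, y = lagged_design(choices, rewards)
w = LogisticRegression().fit(X, y).coef_.ravel()  # one weight per lag/covariate
```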

mobeets commented 7 months ago

Q: Why does the pure Belief model sometimes show positive decoding for the previous choice?

A: Note that the decoding model assumes the covariates have a linear impact on action choice, but the Belief updates are nonlinear. For example, say b(t-1)=0.5. Then it's true that the covariate (choice==reward) determines b(t)--i.e., b(t) = w * b(t-1) for some fixed w, whenever c(t-1) == r(t-1). But when b(t-1) ≠ 0.5, this no longer holds: the update depends on which case occurred, so we need a separate weight for c(t-1) = r(t-1) = 0 vs. c(t-1) = r(t-1) = 1.
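(To make the nonlinearity concrete, here's a minimal sketch of a Bayesian belief update for a two-port reversal task; the reward probabilities and hazard rate are assumed values, not necessarily the task's. The same (choice, reward) observation moves the belief by a different amount depending on the prior, so no single linear weight on choice==reward can capture it:)

```python
def belief_update(b, choice, reward, p_high=0.8, p_low=0.2, hazard=0.02):
    """One Bayesian belief update for a two-port reversal task.

    b: prior P(port A is the high-reward port).
    choice: 0 = A, 1 = B; reward: 0 or 1.
    p_high/p_low: reward probabilities of the high/low port (assumed).
    hazard: per-trial reversal probability (assumed).
    """
    # likelihood of the observed reward under "A is high" vs "B is high"
    p_r_A_high = [p_high, p_low][choice]
    p_r_B_high = [p_low, p_high][choice]
    if reward == 0:
        p_r_A_high, p_r_B_high = 1 - p_r_A_high, 1 - p_r_B_high
    post = b * p_r_A_high / (b * p_r_A_high + (1 - b) * p_r_B_high)
    # account for a possible reversal before the next trial
    return (1 - hazard) * post + hazard * (1 - post)

# the same observation (chose A, rewarded) shifts the belief by different amounts:
for b in (0.5, 0.9):
    print(b, round(belief_update(b, choice=0, reward=1) - b, 3))
```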

So I was essentially combining the wrong things in the plots above. Instead, the belief model would look symmetric if we used covariates for choice==reward and choice!=reward. But maybe even better would just be to split things based on A, a, B, b.
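(A minimal sketch of that A/a/B/b coding, assuming the usual convention that the letter is the chosen port and the case marks whether it was rewarded:)

```python
def encode_trial(choice, reward):
    """One-letter trial code: letter = chosen port (A or B),
    uppercase if rewarded, lowercase if not."""
    letter = 'AB'[choice]
    return letter if reward else letter.lower()

# e.g., chose A (rewarded), A (unrewarded), B (unrewarded) -> 'Aab'
history = ''.join(encode_trial(c, r) for c, r in [(0, 1), (0, 0), (1, 0)])
```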

Now we see some separation at longer lags (e.g., A > b), and also lots of variability. I think the variability actually makes sense for a pure Belief model, because the effective weight changes based on what your previous belief was.

Repeating the analysis for the RNN (γ=0.2, H=10, trial-level) and Mouse:

mobeets commented 7 months ago

Now for timestep-level RNNs:

The RNNs were trained on 200 trials per episode (compared to 800 for the trial-level models), with no abort penalty. Including an abort penalty returned similar results, though:
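(For concreteness, the two training setups compared above, written as hypothetical config dicts; the field names are illustrative, not the repo's actual API:)

```python
# hypothetical configs summarizing the setups described above
timestep_rnn = dict(level='timestep', trials_per_episode=200, abort_penalty=None)
trial_rnn = dict(level='trial', trials_per_episode=800, gamma=0.9, H=10)
```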