Beron DA results

mobeets commented 6 months ago

Task

Notes:

The ITI is drawn uniformly from five different durations. We should do the same, instead of using a geometric distribution.
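A minimal sketch of the two ITI schemes, for comparison. The durations of 1 to 5 time steps are an assumption (the five actual durations are not listed here), and the function names are illustrative, not taken from the q-rnn code. The geometric parameter is chosen only so that both schemes have a mean ITI of 3 steps.

```python
import random

# Hypothetical ITI durations in time steps; the five actual values are not given above.
ITI_DURATIONS = [1, 2, 3, 4, 5]

def sample_uniform_iti(rng, durations=ITI_DURATIONS):
    """Draw one ITI uniformly from a fixed set of durations."""
    return rng.choice(durations)

def sample_geometric_iti(rng, p=1 / 3, min_iti=1):
    """Draw one ITI as min_iti plus a geometric number of extra steps.

    After each step the ITI ends with probability p, so the hazard rate
    is constant over time.
    """
    iti = min_iti
    while rng.random() > p:
        iti += 1
    return iti

def uniform_hazard(durations=ITI_DURATIONS):
    """P(ITI ends at duration d | it has not ended before d) for the uniform scheme."""
    ds = sorted(durations)
    return {d: 1 / (len(ds) - i) for i, d in enumerate(ds)}
```

Under these assumptions the uniform scheme has a rising hazard rate (1/5, 1/4, 1/3, 1/2, 1), whereas the geometric scheme has a flat one, so only the uniform scheme lets the animal or model anticipate the end of the ITI.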

Behavior

[Screenshot 2024-02-13 at 1 47 39 PM] [Screenshot 2024-02-13 at 1 52 24 PM]

Decision times are slightly longer following switches (left panel). Mice were more likely to abort on trials with longer ITIs (right panel).

DA activity

DA on rewarded, unrewarded, and no-response ("timeout") trials; aborts are shown in the right panel: [Screenshot 2024-02-13 at 1 53 18 PM]

Notes:

You can see separate responses to the go cue and to the choice. We might want to enforce a separation in our experiment so we can make the same plot. To do this, we would set reward_delay = 1.
The DA activity decreases during the ITI. (I think this makes perfect sense from an RPE perspective: the ITI is uniform, so the longer you wait, the closer you are (in expectation) to the end of the ITI, so value is larger, and thus the continued absence of a reward makes the RPE smaller.)
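The RPE argument for the ITI can be illustrated with a small TD calculation. This is only a sketch under stated assumptions: discrete time steps, an ITI of 1 to 5 steps with equal probability, a fixed value of 1 at cue onset, and a discount factor of 0.9. None of these numbers come from the data or from the q-rnn models.

```python
def iti_values_and_rpes(n_durations=5, gamma=0.9, cue_value=1.0):
    """Values and no-cue RPEs in a discrete-time ITI (illustrative only).

    The ITI lasts 1..n_durations steps with equal probability. State s means
    "s steps have elapsed and the cue has not arrived yet".
    """
    values = [0.0] * n_durations
    next_value = 0.0  # unused in the last state, where the hazard is 1
    for s in reversed(range(n_durations)):
        hazard = 1.0 / (n_durations - s)  # P(cue at next step | no cue so far)
        values[s] = gamma * (hazard * cue_value + (1 - hazard) * next_value)
        next_value = values[s]
    # RPE when a step passes with no cue and no reward: gamma * V(s+1) - V(s)
    rpes = [gamma * values[s + 1] - values[s] for s in range(n_durations - 1)]
    return values, rpes

values, rpes = iti_values_and_rpes()
```

With these assumed parameters, the value rises with elapsed time while the RPE on each uneventful step is negative and becomes slightly more negative, which matches the direction of the decreasing DA activity described above.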

mobeets commented 6 months ago

DA as a function of reward history

[Screenshot 2024-02-13 at 1 57 56 PM] [Screenshot 2024-02-13 at 1 58 05 PM]

Note that the reward history is quantified as "the number of rewards in the previous 3 trials", so dark red = 3 unrewarded, and dark blue = 3 rewarded.

Supposedly this is consistent with the RPE signal δ = ηR - Q, where η is an arbitrary scalar.
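To make the formula concrete, here is a minimal sketch of δ = ηR - Q for two reward histories. The Q values (0.2 after three unrewarded trials, 0.8 after three rewarded trials) and η = 1 are made up for illustration; the only assumption that matters is that Q is higher after a rewarded history.

```python
def rpe(reward, q, eta=1.0):
    """Reward prediction error: delta = eta * R - Q."""
    return eta * reward - q

# Hypothetical action values following each reward history (not fitted to data).
q_by_history = {"3 unrewarded": 0.2, "3 rewarded": 0.8}

for history, q in q_by_history.items():
    delta_rewarded = rpe(reward=1.0, q=q)
    delta_unrewarded = rpe(reward=0.0, q=q)
    print(f"{history}: rewarded delta={delta_rewarded:+.2f}, "
          f"unrewarded delta={delta_unrewarded:+.2f}")
```

Under these assumptions the outcome response is shifted upward after a poor reward history, for rewarded and unrewarded trials alike, because the same Q is subtracted in both cases.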

mobeets commented 6 months ago

Results for H=10 RNN, iti=2-7, abortpenalty=-0.1, no reward delay:

And again for a model with reward delay=1:

What these models don't seem to capture is that the RPE in response to the cue is larger for a reward history of -3 (three unrewarded trials) than for a reward history of 3 (three rewarded trials). These models show the opposite.

Finally, reward delay=1, jitter=1 (so reward delay is either 1 or 2):
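A minimal sketch of how such a jittered delay could be drawn. The function name and the uniform draw over the jitter range are assumptions for illustration; the actual q-rnn implementation may differ.

```python
import random

def sample_reward_delay(rng, reward_delay=1, jitter=1):
    """Return reward_delay plus a uniform integer in [0, jitter] (assumed scheme)."""
    return reward_delay + rng.randint(0, jitter)

rng = random.Random(0)
delays = [sample_reward_delay(rng) for _ in range(200)]
```

With reward_delay=1 and jitter=1, every sampled delay is either 1 or 2 time steps.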

mobeets / q-rnn

Beron DA results #19
