Open mobeets opened 6 months ago
Note that the reward history is quantified as "the number of rewards in the previous 3 trials", so dark red = 3 unrewarded, and dark blue = 3 rewarded.
Supposedly this is consistent with the RPE signal δ = ηR - Q, where η is an arbitrary scalar.
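As a rough illustration of the two quantities above, here is a minimal sketch of how the reward history and the RPE could be computed. The function names `reward_history` and `rpe`, the `window` argument, and the handling of the first few trials are assumptions for illustration, not the code used for these results. `Q` is simply passed in as a value estimate.

```python
def reward_history(rewards, window=3):
    """For each trial t, count the rewards in the `window` trials before t.

    Trials with fewer than `window` predecessors get None (an assumption;
    the original analysis may handle early trials differently).
    """
    out = []
    for t in range(len(rewards)):
        if t < window:
            out.append(None)
        else:
            out.append(sum(rewards[t - window:t]))
    return out


def rpe(reward, q, eta=1.0):
    """RPE as written in the text: delta = eta * R - Q."""
    return eta * reward - q


# Hypothetical reward sequence (1 = rewarded, 0 = unrewarded).
rewards = [1, 0, 0, 0, 1, 1, 1]
history = reward_history(rewards)
```

With this convention, a history of 0 corresponds to three unrewarded trials in a row and a history of 3 to three rewarded trials in a row.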
Results for H=10 RNN, iti=2-7, abortpenalty=-0.1, no reward delay:
And again for a model with reward delay=1:
What these models don't seem to capture is that the RPE in response to the cue is larger for a reward history of -3 than for 3; these models show the opposite.
Finally, reward delay=1, jitter=1 (so reward delay is either 1 or 2):
Task
Notes:
Behavior
Decision times were slightly longer following switches (left panel). Mice were more likely to abort on trials with longer ITIs (right panel).
DA activity
DA on rewarded, unrewarded, no-response ("timeout") trials; also aborts (right panel):
Notes:
reward_delay = 1