mobeets / q-rnn


Beron DA results #19

Open mobeets opened 6 months ago

mobeets commented 6 months ago

Task

[Image: Screenshot 2024-02-13 at 1 46 49 PM]

Notes:

Behavior

[Image: Screenshot 2024-02-13 at 1 47 39 PM] [Image: Screenshot 2024-02-13 at 1 52 24 PM]

Decision times were slightly longer following switches (left panel). Mice were more likely to abort on trials with longer ITIs (right panel).

DA activity

DA activity on rewarded, unrewarded, and no-response ("timeout") trials (left panel), and on aborted trials (right panel):

[Image: Screenshot 2024-02-13 at 1 53 18 PM] [Image: Screenshot 2024-02-13 at 1 53 06 PM]

Notes:

mobeets commented 6 months ago

DA as a function of reward history

[Image: Screenshot 2024-02-13 at 1 57 56 PM] [Image: Screenshot 2024-02-13 at 1 58 05 PM]

Note that the reward history is quantified as "the number of rewards in the previous 3 trials", so dark red = 3 unrewarded, and dark blue = 3 rewarded.
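For reference, a minimal sketch of how this reward-history count could be computed from a per-trial reward vector. The function name `reward_history`, the `n_back` parameter, and the use of -1 to mark trials without a full history are assumptions for illustration, not taken from the original analysis code.

```python
import numpy as np

def reward_history(rewards, n_back=3):
    """Count rewards over the n_back trials preceding each trial.

    rewards: sequence of 0/1 outcomes, one per trial (assumed coding).
    Returns an int array; -1 marks trials with fewer than n_back predecessors.
    """
    rewards = np.asarray(rewards, dtype=int)
    hist = np.full(len(rewards), -1, dtype=int)
    for t in range(n_back, len(rewards)):
        # Sum only the previous n_back trials, excluding the current one.
        hist[t] = rewards[t - n_back:t].sum()
    return hist

rewards = [1, 1, 1, 0, 0, 0, 1]
counts = reward_history(rewards)
```

With this coding, a count of 0 corresponds to dark red (3 unrewarded) and a count of 3 to dark blue (3 rewarded).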

Supposedly this is consistent with the reward prediction error (RPE) signal δ = ηR - Q, where R is the reward, Q is the value estimate, and η is an arbitrary scalar.
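A small worked example of δ = ηR - Q at outcome time, as a sketch only. The Q values below are hypothetical placeholders, chosen on the assumption that the value estimate is lower after three unrewarded trials than after three rewarded ones; they are not fitted to the data.

```python
def rpe(reward, q_value, eta=1.0):
    """Reward prediction error: delta = eta * R - Q."""
    return eta * reward - q_value

# Hypothetical value estimates (assumption): low after 3 unrewarded trials,
# high after 3 rewarded trials.
q_low, q_high = 0.2, 0.8

rewarded = {"low history": rpe(1, q_low), "high history": rpe(1, q_high)}
unrewarded = {"low history": rpe(0, q_low), "high history": rpe(0, q_high)}
```

Under these assumptions, the outcome-time δ is larger after a poor reward history than after a rich one, for both rewarded and unrewarded trials.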

mobeets commented 6 months ago

Results for H=10 RNN, iti=2-7, abortpenalty=-0.1, no reward delay:

And again for a model with reward delay=1:

What these models don't seem to capture is that the RPE in response to the cue is larger for a reward history of -3 than for 3. These models show the opposite.

Finally, reward delay=1, jitter=1 (so reward delay is either 1 or 2):
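To make the jitter setting concrete, here is a sketch of one way such a delay could be sampled. The function `sample_reward_delay` is hypothetical, and the uniform draw over the jitter range is an assumption; the actual environment code may differ.

```python
import random

def sample_reward_delay(base_delay=1, jitter=1, rng=random):
    """Return base_delay plus an integer jitter drawn from 0..jitter (inclusive)."""
    # Assumption: jitter is added uniformly at random on each trial.
    return base_delay + rng.randint(0, jitter)

rng = random.Random(0)  # seeded for reproducibility
delays = [sample_reward_delay(1, 1, rng) for _ in range(1000)]
```

With `base_delay=1` and `jitter=1`, every sampled delay is either 1 or 2, matching the setting described above.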