learningsimulator / learningsimulator

Learning Simulator: A simulation software for animal and human learning
https://www.learningsimulator.org
MIT License

Investigate why v-value decreases in this script #87

Open markusrobertjonsson opened 3 years ago

markusrobertjonsson commented 3 years ago

In the script below, the subject is presented with stimulus A. Each "response" is either rewarded (u = +1) or punished (u = -1), with equal probability. "no_response" gives zero reinforcement. Plotting v(A->response) shows the v-values decreasing. Why is this? If, on average, half of the responses are rewarded and half are punished, shouldn't the v-value stay constant on average?
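The intuition in the question can be written out per response. This assumes, rather than takes from the simulator's documentation, that the `sr` mechanism updates the link for the chosen behavior with the delta rule $v \leftarrow v + \alpha_v (u - v)$. Let $v_k$ be v(A->response) just after the $k$-th response:

$$
\mathbb{E}[v_{k+1} \mid v_k] = v_k + \alpha_v\left(\mathbb{E}[u] - v_k\right),
\qquad
\mathbb{E}[u] = \tfrac{1}{2}(+1) + \tfrac{1}{2}(-1) = 0,
$$
$$
\text{so}\quad \mathbb{E}[v_{k+1}] = (1 - \alpha_v)\,\mathbb{E}[v_k],
\qquad
\mathbb{E}[v_0] = \texttt{start\_v} = 0 \;\Rightarrow\; \mathbb{E}[v_k] = 0 \text{ for all } k .
$$

Note that this is indexed by responses, while the plot below uses `xscale = START_TRIAL`, that is, it is indexed by trials.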

n_subjects        = 1000
mechanism         = sr
behaviors         = response, no_response
stimulus_elements = A, reward, punishment, neutral
start_v           = 0
alpha_v           = 0.2
beta              = 1
u                 = reward:1, punishment:-1, default:0
behavior_cost     = 0

@PHASE small stop: START_TRIAL=500
START_TRIAL A          | response: SHIFTY | NEUTRAL
SHIFTY                 | choice(1,2)==1: REWARD | PUNISHMENT
NEUTRAL     neutral    | START_TRIAL
REWARD      reward     | START_TRIAL
PUNISHMENT  punishment | START_TRIAL

@run small

# Why isn't v(A->response) fluctuating around zero? Why does it go down?

@figure
xscale = START_TRIAL
subject = average
@vplot A->response
@vplot A->no_response
@legend
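To reproduce the effect outside the simulator, here is a minimal Python sketch of the first script. It assumes, without taking this from the Learning Simulator source, that the `sr` mechanism picks a behavior with a softmax over `beta * v` and updates only the chosen link with the delta rule `v <- v + alpha_v * (u - v)`. It also assumes that `choice(1,2)==1` is true with probability 1/2. The function name `simulate` is hypothetical.

```python
import math
import random


def simulate(n_subjects=1000, n_trials=500, alpha_v=0.2, beta=1.0, seed=1):
    """Across-subject mean of v(A->response) after each trial (script 1).

    ASSUMED model (a sketch, not the simulator's own code):
    - softmax choice between response (v_r) and no_response (v_n)
    - delta-rule update of the chosen link only: v <- v + alpha_v * (u - v)
    - after "response": u = +1 or -1 with probability 1/2 each
    - after "no_response": u = 0 (neutral)
    """
    rng = random.Random(seed)
    totals = [0.0] * n_trials
    for _ in range(n_subjects):
        v_r = 0.0  # v(A->response), start_v = 0
        v_n = 0.0  # v(A->no_response), start_v = 0
        for t in range(n_trials):
            # P(response) = exp(b*v_r) / (exp(b*v_r) + exp(b*v_n))
            p_r = 1.0 / (1.0 + math.exp(beta * (v_n - v_r)))
            if rng.random() < p_r:
                # Assumed meaning of choice(1,2)==1: reward with probability 1/2
                u = 1.0 if rng.random() < 0.5 else -1.0
                v_r += alpha_v * (u - v_r)
            else:
                v_n += alpha_v * (0.0 - v_n)  # neutral gives u = 0
            totals[t] += v_r  # value recorded after this trial
    return [s / n_subjects for s in totals]


if __name__ == "__main__":
    curve = simulate()
    late = curve[100:]
    print(f"mean v(A->response), trials 100-499: {sum(late) / len(late):.3f}")
```

Under these assumptions the late-trial average comes out clearly below zero, even though rewards and punishments are equally likely.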

markusrobertjonsson commented 2 years ago

The effect seems to be more clearly visible with more subjects. The following was run with increasing numbers of subjects, and the script was changed so that responses are rewarded and punished in strict alternation (every other response is rewarded):

n_subjects        = 1000
mechanism         = sr
behaviors         = response, no_response
stimulus_elements = A, reward, punishment, neutral
start_v           = 0
alpha_v           = 0.2
beta              = 1
u                 = reward:1, punishment:-1, default:0
behavior_cost     = 0

@PHASE small stop: START_TRIAL=500
S                      | cnt=0, START_TRIAL
START_TRIAL A          | response: SHIFTY | NEUTRAL
# SHIFTY                 | choice(1,2)==1: REWARD | PUNISHMENT
SHIFTY                 | cnt==0: REWARD | PUNISHMENT
NEUTRAL     neutral    | START_TRIAL
REWARD      reward     | cnt=1, START_TRIAL
PUNISHMENT  punishment | cnt=0, START_TRIAL

@run small

# Why isn't v(A->response) fluctuating around zero? Why does it go down?

# Check that the number of rewards equals the number of punishments
@figure
@nplot reward
@nplot punishment
@legend

@figure
xscale = START_TRIAL
subject = average
@vplot A->response
@vplot A->no_response
@legend
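The alternating schedule of the second script can be sketched the same way. It again assumes a softmax choice rule and a delta-rule update of the chosen link only, neither of which is taken from the simulator's source. The `cnt` variable mirrors the script's counter, and `simulate_alternating` is a hypothetical name. The function also counts rewards and punishments, the same check as the `@nplot` figure.

```python
import math
import random


def simulate_alternating(n_subjects=1000, n_trials=500, alpha_v=0.2, beta=1.0, seed=2):
    """Across-subject mean of v(A->response) per trial when responses are
    rewarded and punished in strict alternation (the cnt logic of script 2).

    Uses the same ASSUMED softmax + delta-rule model as a sketch.
    Returns (curve, n_reward, n_punishment).
    """
    rng = random.Random(seed)
    totals = [0.0] * n_trials
    n_reward = n_punishment = 0
    for _ in range(n_subjects):
        v_r = v_n = 0.0  # start_v = 0
        cnt = 0          # line S: cnt=0
        for t in range(n_trials):
            p_r = 1.0 / (1.0 + math.exp(beta * (v_n - v_r)))
            if rng.random() < p_r:
                if cnt == 0:          # SHIFTY: cnt==0 -> REWARD (sets cnt=1)
                    u, cnt = 1.0, 1
                    n_reward += 1
                else:                 # otherwise -> PUNISHMENT (sets cnt=0)
                    u, cnt = -1.0, 0
                    n_punishment += 1
                v_r += alpha_v * (u - v_r)
            else:
                v_n += alpha_v * (0.0 - v_n)  # neutral gives u = 0
            totals[t] += v_r
    return [s / n_subjects for s in totals], n_reward, n_punishment


if __name__ == "__main__":
    curve, n_rew, n_pun = simulate_alternating()
    late = curve[100:]
    print(f"rewards={n_rew} punishments={n_pun}")
    print(f"mean v(A->response), trials 100-499: {sum(late) / len(late):.4f}")
```

In this sketch, strict alternation gives a much smaller bias than random outcomes, but the late-trial average is still negative.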

markusrobertjonsson commented 2 years ago

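Explanation: When v(A->response) is low, the subject is less likely to respond on the next trial. It is more likely to choose no_response, which gives reinforcement 0 and does not update v(A->response). So the v-value tends to "stay down" for many trials once it is down. There is no corresponding tendency to "stay up" when it is up: a high v(A->response) makes the subject respond often, so the value is updated often and does not linger. Each subject therefore spends more trials with a low value than with a high one. Averaged over 1000 subjects, this bias becomes visible as a downward drift.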
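The explanation can be made quantitative for the alternating script under the same assumed model: softmax choice, delta-rule update of the chosen link, and v(A->no_response) staying at 0. After a reward, v(A->response) settles at v_hi = alpha_v / (2 - alpha_v). After a punishment it settles at -v_hi. Each value is then held for a geometric number of trials with mean 1 / P(response | v). The helper below, whose name is hypothetical, weights the two values by these expected dwell times.

```python
import math


def alternating_trial_average(alpha_v=0.2, beta=1.0):
    """Long-run per-trial average of v(A->response) for strictly alternating
    reward (+1) / punishment (-1), under an ASSUMED softmax + delta-rule
    model in which v(A->no_response) stays at 0.

    Returns (v_hi, v_lo, trial_average).
    """
    # Fixed point of v_hi = (1 - a) * v_lo + a and v_lo = (1 - a) * v_hi - a
    v_hi = alpha_v / (2.0 - alpha_v)
    v_lo = -v_hi

    def p_response(v):
        # Softmax over {response: v, no_response: 0}
        return 1.0 / (1.0 + math.exp(-beta * v))

    # Expected number of trials each value is held (geometric, mean 1/p)
    dwell_hi = 1.0 / p_response(v_hi)
    dwell_lo = 1.0 / p_response(v_lo)

    average = (v_hi * dwell_hi + v_lo * dwell_lo) / (dwell_hi + dwell_lo)
    return v_hi, v_lo, average


if __name__ == "__main__":
    v_hi, v_lo, avg = alternating_trial_average()
    print(f"v_hi={v_hi:.4f}  v_lo={v_lo:.4f}  trial average={avg:.4f}")
```

For alpha_v = 0.2 and beta = 1 this gives v_hi = 1/9 and a long-run trial average of about -0.006, equal to -v_hi * tanh(beta * v_hi / 2). The average is small but negative only because the low value is held for more trials.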