HGF Multi-armed bandit - Githubissues

filippoferrari commented 1 year ago

Dear Dr. Mathys,

I am trying to apply the hgf_binary_mab model to a 2-armed bandit task with independent rewards and punishments (a similar setup to Pulcu et al., 2017, eLife).

As suggested in the documentation, I am first testing the model with the Bayesian optimal response model:

est = tapas_fitModel([], u, 'tapas_hgf_binary_mab_config', 'tapas_bayes_optimal_binary_config');

but the lack of responses y causes the model to crash in tapas_hgf_binary_mab.m at line 70 (https://github.com/translationalneuromodeling/tapas/blob/master/HGF/tapas_hgf_binary_mab.m#L70).

Could you provide any help with this?

Thanks, Filippo

timothysandhu commented 1 year ago

Hi Filippo,

Sorry that this isn't a complete answer (and may not even be right!), but from having recently started to play with this model myself, I noted the following.

From looking at the _binary_mab_config file relative to the _binary_config file, it seems like the trajectory variables have an extra dimension for 'bandit'. Further, looking in _binary_mab relative to _binary, it looks as if at certain time steps, updates are only performed on one of the bandits (I assume the chosen one). This led me to think that y here might be an indication of which bandit corresponds to the respective input at a given time, which made some sense as I don't know how else one would be able to model belief updating without including some information about which bandit was selected on which trial.

I tried using the attached y and u (hgf_binary_mab.zip) and was able to fit the model. The output looked (sort of) sensible in that:

when reward wasn't expected at bandit X (input = 0) but bandit X was used on that trial, the estimated reward expectation decreased;
when bandit X wasn't used on trial T (response colour not bandit X colour), its reward expectation didn't change.
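The pattern described above (only the chosen bandit's expectation moves) can be illustrated with a toy sketch. This is not the HGF update and not TAPAS code: a fixed-learning-rate delta rule stands in for the belief update, the values of u, y and alpha are made up, and it assumes that u codes the outcome (1 = reward, 0 = no reward) and y the index of the chosen bandit.

```matlab
% Toy illustration only: a delta rule with a fixed learning rate,
% NOT the HGF update equations used in tapas_hgf_binary_mab.
u = [1; 0; 0; 1; 0];   % hypothetical outcomes (assumed: 1 = reward, 0 = no reward)
y = [1; 1; 2; 2; 1];   % hypothetical choices (assumed: index of the chosen bandit)
nBandits = 2;
alpha = 0.3;           % assumed fixed learning rate

% Row k holds the expectations before trial k; row 1 is the prior.
mu = 0.5 * ones(numel(u) + 1, nBandits);

for k = 1:numel(u)
    mu(k+1, :) = mu(k, :);                            % carry all expectations forward
    c = y(k);                                         % bandit chosen on trial k
    mu(k+1, c) = mu(k, c) + alpha * (u(k) - mu(k, c)); % update the chosen bandit only
end

disp(mu)
```

On trial 2, for example, bandit 1 is chosen and the input is 0, so its expectation drops while bandit 2's stays where it was.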

Code to produce the plot:

fit = tapas_fitModel(y, ...
                     u, ...
                     'tapas_hgf_binary_mab_config', ...
                     'tapas_bayes_optimal_binary_config', ...
                     'tapas_quasinewton_optim_config');
tapas_hgf_binary_mab_plotTraj(fit)

[Image: hgf_binary_mab]

My confusion comes from what the input actually means in this case (i.e. is it that the bandit was rewarded?), and what we would actually be fitting with the softmax_mu3 response model for instance, especially as we don't seem to be able to simulate responses as there is no _binary_mab_namep function.

Hopefully Chris (or someone with more knowledge than me!) can help!

Hope this helps, and let me know if you can help with my confusion!

Best,

Tim

filippoferrari commented 11 months ago

Hi Tim,

Yes, the model expects a y input as it needs to know which bandit to update.
This makes simulating the model quite tricky, since the HGF models first update their beliefs across all trials and only then simulate responses from those beliefs. In this multi-armed bandit scenario you would need to have the responses available before updating the HGF beliefs, which may be unrealistic depending on your experiment. There is now a multi-armed bandit implementation in the pyhgf package that simulates a response at each trial and then updates beliefs accordingly.
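To make the ordering issue concrete, here is a minimal sketch of a trial-by-trial simulation in which the response is generated first and the belief update follows. It uses neither the TAPAS nor the pyhgf API: a simple delta rule stands in for the HGF update, and pReward, alpha and beta are hypothetical values chosen for illustration.

```matlab
% Toy sketch only: choose, observe, then update, one trial at a time.
% The delta rule below is a stand-in for the HGF belief update.
rng(1);                          % fixed seed for reproducibility
nTrials  = 100;
nBandits = 2;
pReward  = [0.8 0.2];            % hypothetical reward probability per bandit
alpha    = 0.3;                  % assumed learning rate
beta     = 5;                    % assumed softmax inverse temperature

mu = 0.5 * ones(1, nBandits);    % current reward expectation per bandit
y  = zeros(nTrials, 1);          % simulated choices
u  = zeros(nTrials, 1);          % resulting outcomes

for k = 1:nTrials
    % 1) Simulate the response from the current beliefs (softmax).
    p    = exp(beta * mu) / sum(exp(beta * mu));
    y(k) = 1 + sum(rand > cumsum(p(1:end-1)));
    % 2) Draw the outcome for the chosen bandit.
    u(k) = double(rand < pReward(y(k)));
    % 3) Update the belief about the chosen bandit only.
    mu(y(k)) = mu(y(k)) + alpha * (u(k) - mu(y(k)));
end
```

The resulting y and u have one entry per trial. A batch procedure that processes all inputs before generating any response cannot produce this kind of data, because each outcome here depends on the choice made on that trial.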

Regarding the _namep function, I added one in this pull request (https://github.com/translationalneuromodeling/tapas/pull/245), together with some changes to make the hgf_binary_mab model accept inputs in the same way as the autoregressive mab model does.

Best, Filippo

translationalneuromodeling / tapas

HGF Multi-armed bandit #243