riccardodv / MirrorRL


SARSA or Fitted-Q iteration or DQN or LSTD #3

Open · AleShi94 opened this issue 2 years ago

AleShi94 commented 2 years ago

https://github.com/riccardodv/MirrorRL/blob/b7830390561630ca33fc8c4563d4ec45895a28a2/cascade_mirror_rl_fqi.py#L69-L72

It seems like this piece of code corresponds more to the SARSA method, since we use the next action taken in the trajectory to compute the new target Q-value. Do we have a reason to use SARSA as the way to fit the approximation of Q?
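
For concreteness, here is a minimal sketch of the distinction (all names such as `qnet`, `next_act`, and `gamma` are hypothetical, not the variables in `cascade_mirror_rl_fqi.py`): the linked lines bootstrap on the next action actually taken in the trajectory, whereas a DQN-style target would take a max over actions.

```python
import torch
import torch.nn as nn

# Hypothetical setup; shapes and names are illustrative only.
gamma = 0.99
n_obs, n_act, batch_size = 4, 2, 32
qnet = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))

obs      = torch.randn(batch_size, n_obs)
rew      = torch.randn(batch_size)
next_obs = torch.randn(batch_size, n_obs)
next_act = torch.randint(n_act, (batch_size,))  # action actually taken next in the trajectory
done     = torch.zeros(batch_size)

with torch.no_grad():
    # SARSA-style / policy-evaluation target: bootstrap on the logged next action a'
    q_next_pi = qnet(next_obs).gather(1, next_act.unsqueeze(1)).squeeze(1)
    target_eval = rew + gamma * (1 - done) * q_next_pi

    # DQN-style / control target: bootstrap on the greedy action (Bellman optimality operator)
    q_next_max = qnet(next_obs).max(dim=1).values
    target_ctrl = rew + gamma * (1 - done) * q_next_max
```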

Should we try other approaches, such as Fitted Q-iteration, LSTD, or DQN? Do you know which to choose in which situation?

akrouriad commented 2 years ago

DQN is a form of FQI. What is implemented is a form of FQI too, in the sense that a target is computed and then regressed. The only difference compared to DQN is that we use the Bellman operator instead of the Bellman optimality operator (with the argmax), which is what we need for policy evaluation. Using DQN might work, but it wouldn't be in the mirror-descent setting we're following, so it needs more investigation. LSTD would be a good idea, although it only trains the linear part, so maybe use it only to fine-tune the linear part after the features are learned?
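
If LSTD were used only to re-fit the linear head on top of the frozen learned features, a minimal sketch could look like the following (the feature tensors `phi_sa`, `phi_next_sa` and the helper `lstd_weights` are hypothetical names standing in for the cascade's learned feature map, not the repo's API):

```python
import torch

def lstd_weights(phi_sa, phi_next_sa, rew, done, gamma=0.99, reg=1e-3):
    """Solve A w = b with A = Phi^T (Phi - gamma * Phi') and b = Phi^T r (LSTD)."""
    phi_next_sa = (1 - done).unsqueeze(1) * phi_next_sa   # zero bootstrap at terminal steps
    A = phi_sa.T @ (phi_sa - gamma * phi_next_sa)
    b = phi_sa.T @ rew
    A = A + reg * torch.eye(A.shape[0])                   # ridge term for invertibility
    return torch.linalg.solve(A, b)

# Toy usage with random "features" standing in for the learned cascade features.
n, d = 256, 16
phi_sa, phi_next_sa = torch.randn(n, d), torch.randn(n, d)
rew, done = torch.randn(n), torch.zeros(n)
w = lstd_weights(phi_sa, phi_next_sa, rew, done)
q_values = phi_sa @ w   # linear part of Q on top of the frozen features
```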