Open AleShi94 opened 2 years ago
DQN is a form of FQI. What is implemented here is also a form of FQI, in the sense that a target is computed and then regressed. The only difference from DQN is the use of the Bellman operator instead of the Bellman optimality operator (with the argmax), which is what we need for policy evaluation. Using DQN might work, but it wouldn't be in the mirror-descent setting we're following, so it needs more investigation. LSTD would be a good idea, although it only trains the linear part, so maybe use it just to fine-tune the linear part after the features are learned?
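The distinction between the two operators can be sketched on a toy batch; the array names and shapes below are illustrative, not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99
n_transitions, n_actions = 5, 3

# Hypothetical transition batch (names are illustrative):
# q_next[i]       = Q(s'_i, .) for each next state
# next_actions[i] = the action a'_i actually taken next in the trajectory
q_next = rng.normal(size=(n_transitions, n_actions))
rewards = rng.normal(size=n_transitions)
next_actions = rng.integers(0, n_actions, size=n_transitions)

# Bellman operator (policy evaluation, SARSA-style target):
# uses the action the current policy actually took at s'.
target_eval = rewards + gamma * q_next[np.arange(n_transitions), next_actions]

# Bellman optimality operator (DQN-style target):
# uses the greedy action at s'.
target_dqn = rewards + gamma * q_next.max(axis=1)

# Both targets would then be regressed against Q(s, a) on the batch.
```

Since the max over actions dominates any particular action's value, the DQN-style target is always at least as large as the policy-evaluation target on the same batch.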
https://github.com/riccardodv/MirrorRL/blob/b7830390561630ca33fc8c4563d4ec45895a28a2/cascade_mirror_rl_fqi.py#L69-L72
It seems like this piece of code corresponds more to the SARSA method, since we use the next actions in the trajectory to compute the new target Q-values. Do we have reasons to use SARSA as a way to fit the approximation of Q?
Should we try other approaches such as Fitted Q Iteration, LSTD, or DQN? Do you know which to choose in which situation?