weidler / RLaSpa

Reinforcement Learning in Latent Space
MIT License

Framework adaptation #11

Closed adrigrillo closed 5 years ago

adrigrillo commented 5 years ago

I made some changes to the framework base and updated everything to make it work.

adrigrillo commented 5 years ago

I just realized that double deep policies are not working because the target policy is never updated. We have to find a way to update it without modifying the framework. The `update` method could work if we pass the iteration number as an argument or keep some kind of counter.

The idea is to add something like this in the update method, after the training step:

if iteration % 100 == 0:
    update_agent_model(current=current_model, target=target_model)
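The counter variant could be kept entirely inside the policy, so the framework's training loop doesn't need to change. A minimal sketch (hypothetical names; plain dicts stand in for the actual policy networks and `update_agent_model`):

```python
class DoubleDQNUpdater:
    """Sketch of a double-DQN policy that counts its own update calls
    and syncs the target model every `sync_every` updates.

    `current` and `target` are placeholders for the real networks;
    here they are plain dicts of parameters for illustration.
    """

    def __init__(self, current, target, sync_every=100):
        self.current = current
        self.target = target
        self.sync_every = sync_every
        self.n_updates = 0  # internal counter, no framework change needed

    def update(self):
        # ... gradient step on self.current would happen here ...
        self.n_updates += 1
        if self.n_updates % self.sync_every == 0:
            # copy the current parameters into the target model
            self.target.update(self.current)
```

This keeps the `update` signature untouched, at the cost of the policy holding a bit of extra state.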
weidler commented 5 years ago

So the framework changes are fine; we just need to discuss whether the last A in SARSA needs to be consistent between acting and updating. If not, everything is good; if it does, we need to revert some of your changes back to my former approach.
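For reference, the consistency in question is that the A' used in the TD target is the same action the agent then executes. A sketch with hypothetical `env`/`policy` interfaces (not the framework's actual API):

```python
def sarsa_episode(env, policy, alpha=0.1, gamma=0.99):
    """Sketch of on-policy SARSA where the last A stays consistent:
    the action used in the TD target is the SAME action executed next.
    `env` and `policy` interfaces are assumed for illustration.
    """
    state = env.reset()
    action = policy.choose(state)  # first A
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        # the last A of the (S, A, R, S', A') tuple
        next_action = policy.choose(next_state)
        td_target = reward + (0.0 if done else gamma * policy.q(next_state, next_action))
        policy.learn(state, action, td_target, alpha)
        # reuse the SAME A' for acting in the next step
        state, action = next_state, next_action
```

If acting and updating sample the action independently instead, the update target no longer matches the behaviour policy, which is the case that would require reverting to the former approach.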

The DQN policy, though, is still not doing well, as you indicated. Maybe we could track the number of updates inside the DQN?