hoangminhle / hierarchical_IL_RL

Code for hierarchical imitation learning and reinforcement learning

Running the code #1

Open nanxintin opened 6 years ago

nanxintin commented 6 years ago

Hi, this is meaningful work, combining imitation learning and reinforcement learning in a hierarchical architecture to solve Montezuma's Revenge. I successfully ran the code to train hybrid_rl_il_agent. When I test the trained model, I find that the agent takes the same actions every episode. It seems the agent follows a completely fixed trajectory through the game, without any adaptation. Is this a good strategy for the agent?

I would also like to train the h-DQN agent as a comparison, but I cannot find the right code to do this. Can you give me some advice on how to start that training? Thanks.

hoangminhle commented 6 years ago

Hi there. Regarding the fixed trajectory: this is due to the Arcade Learning Environment (ALE) being largely deterministic, and the learned subgoal policies are also deterministic (each subgoal policy is a variant of double deep Q-learning). That doesn't mean it is a bad strategy. Of course, you could swap in stochastic policies for the lower-level subgoal policies, as sketched below.
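For example, here is a minimal sketch of one such stochastic lower-level policy: Boltzmann (softmax) sampling over a subgoal policy's Q-values instead of the greedy argmax. The function and parameter names are hypothetical illustrations, not from this repo:

```python
import numpy as np

def stochastic_subgoal_action(q_values, temperature=0.5):
    """Sample an action from a Boltzmann (softmax) distribution over a
    subgoal policy's Q-values, instead of the deterministic argmax.

    Higher temperature means more exploration; as temperature -> 0 the
    policy approaches the greedy (deterministic) choice.
    """
    # Subtract the max before exponentiating for numerical stability.
    logits = (np.asarray(q_values) - np.max(q_values)) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Example: greedy argmax would always pick action 2 here, but
# Boltzmann sampling sometimes picks the nearly-tied action 0.
q = [1.9, 0.3, 2.0, -1.0]
action = stochastic_subgoal_action(q, temperature=0.5)
```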

Regarding the h-DQN baseline comparison: let me clean up my baseline code and I will put it up as well. The summary is that it mostly doesn't learn anything useful on games like Montezuma's Revenge.
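For anyone who wants to try reproducing the baseline in the meantime, the overall h-DQN training structure (following Kulkarni et al., 2016, not this repo's code) looks roughly like the skeleton below; `env`, `meta_controller`, `controller`, and `reached` are hypothetical placeholders:

```python
# Rough h-DQN episode skeleton (Kulkarni et al., 2016), NOT the
# author's baseline code. All objects below are placeholders.

def hdqn_episode(env, meta_controller, controller, reached, max_steps=10000):
    state = env.reset()
    done, steps = False, 0
    while not done and steps < max_steps:
        # Meta-controller picks a subgoal from the current state.
        subgoal = meta_controller.select_subgoal(state)
        meta_start, extrinsic_return = state, 0.0
        # Controller acts until the subgoal is reached or the episode ends.
        while not done and not reached(state, subgoal) and steps < max_steps:
            action = controller.select_action(state, subgoal)
            next_state, reward, done, _ = env.step(action)
            # Intrinsic reward: 1 when the subgoal is reached, else 0.
            intrinsic = 1.0 if reached(next_state, subgoal) else 0.0
            controller.store(state, subgoal, action, intrinsic, next_state, done)
            controller.update()
            extrinsic_return += reward
            state, steps = next_state, steps + 1
        # Meta-controller is trained on the accumulated extrinsic reward.
        meta_controller.store(meta_start, subgoal, extrinsic_return, state, done)
        meta_controller.update()
```

On a sparse-reward game like Montezuma's Revenge, the extrinsic return that trains the meta-controller is almost always zero, which is consistent with the "doesn't learn anything useful" observation above.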

moonsh commented 4 years ago

@nanxintin Did you use Python 2 to run the code?

nanxintin commented 4 years ago

@moonsh I'm sorry, I can't remember anymore.

hoangminhle commented 4 years ago

Yes, I did use Python 2.7 to run the code back then.