hoangminhle / hierarchical_IL_RL

Code for hierarchical imitation learning and reinforcement learning

Inconsistency between the models for training and testing. #3

Open SamitHuang opened 5 years ago

SamitHuang commented 5 years ago

I am re-implementing your interesting work and have a few questions about the Montezuma's Revenge task. During training, in run_hybrid_atari_experiment.py, you use Hdqn(GPU) as the subgoal network, but for testing, in test_model.py, you use a different architecture, Net(), as the subgoal network. Why are they inconsistent? Could you please upload the trained weights and the code for using Hdqn(GPU) in testing?
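For reference, this is roughly what I mean by reusing the training-time network at test time. It's a minimal sketch assuming a PyTorch setup where the Hdqn weights were saved via torch.save; the import path, checkpoint filename, and the 1x4x84x84 input shape are my guesses, not the repo's actual API:

```python
import torch
from hybrid_model import Hdqn  # hypothetical module path; adjust to the repo's file layout

# Rebuild the training-time architecture and load its saved weights.
subgoal_net = Hdqn()
subgoal_net.load_state_dict(torch.load("hdqn_subgoal.pth", map_location="cpu"))
subgoal_net.eval()  # disable dropout / batch-norm updates for evaluation

# Dummy observation: one stack of four 84x84 Atari frames (assumed input shape).
obs = torch.zeros(1, 4, 84, 84)
with torch.no_grad():
    q_values = subgoal_net(obs)              # Q-values over primitive actions
    action = q_values.argmax(dim=-1).item()  # greedy action for evaluation
print("greedy action:", action)
```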

Also, I notice that in testing the trained meta controller is actually not used. Instead, the subgoals are set manually, and each subgoal is reached by a simple_net. This does not seem to surpass a purely supervised approach that uses imitation learning to reach each fixed subgoal in a fixed environment. Could you explain the generalizability of the method? Thanks!
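To make my question concrete, here is roughly the evaluation loop I expected, where the trained meta controller selects each subgoal instead of having it fixed by hand. The class and method names below are illustrative placeholders, not the repo's actual API:

```python
# Illustrative hierarchical evaluation loop; names are placeholders.
def evaluate_hierarchy(env, meta_controller, controller,
                       max_meta_steps=20, max_low_steps=500):
    obs = env.reset()
    total_reward, done = 0.0, False
    for _ in range(max_meta_steps):
        if done:
            break
        # The trained meta controller picks the next subgoal from the current state.
        subgoal = meta_controller.select_subgoal(obs)
        for _ in range(max_low_steps):
            # The low-level controller conditions on both the state and the subgoal.
            action = controller.act(obs, subgoal)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
            if done or controller.subgoal_reached(obs, subgoal):
                break
    return total_reward
```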

hoangminhle commented 4 years ago

Sorry for the late reply. I'm not sure if I'm missing something from your question, but I'm pretty sure the trained meta controller was used during testing. I'll double-check and may upload the network weights once I find them.