swan-utokyo / deir

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

I tried the Montezuma's Revenge env and DEIR seems not to work in this environment #4

Open · Sino-Huang opened 1 month ago

Sino-Huang commented 1 month ago

Good day, I tested the DEIR exploration algorithm. It indeed works well in some of the Minigrid environments as well as in the Crafter env (https://github.com/danijar/crafter).

However, when I tested the Montezuma's Revenge env, the DEIR algorithm fails to work properly.

Could you give me some suggestions as to why this is the case?

swan-utokyo commented 1 month ago

Hi @Sino-Huang, thank you for your interest in DEIR. I'm glad to hear that it works well for you in the Minigrid and Crafter environments.

Firstly, I want to clarify that during the design and experimentation of DEIR, we focused on procedurally generated RL benchmarks like Minigrid and ProcGen. This is because our ultimate goal was to develop an exploration method capable of effectively exploring unseen states in (dynamic) environments, rather than training the agent to quickly find and learn a fixed sequence of actions that maximizes its returns in a static environment.

At other developers' requests, we also ran preliminary experiments with DEIR and other baselines in Montezuma's Revenge. Although DEIR reached a total score of 2500 faster than the other methods, none of them could exceed this score with our current architecture and hyperparameters, which were not originally designed and optimized for Atari games (please refer to issues https://github.com/swan-utokyo/deir/issues/1 and https://github.com/swan-utokyo/deir/issues/3 for more details). If you are interested, I suggest importing DEIR's intrinsic reward generation module into other architectures known to master Atari games (and adjusting the hyperparameters accordingly) for comprehensive testing and comparison.
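For illustration, here is a minimal sketch of what such an integration might look like. None of the names below (`ToyEnv`, `EpisodicBonus`, `collect_rollout`, `beta`) come from our repo; they are hypothetical placeholders showing where an episodic bonus would be mixed into the reward seen by an existing on-policy learner such as a PPO implementation that already handles Atari.

```python
import numpy as np

class ToyEnv:
    """Stand-in environment with a gymnasium-style reset/step API (illustrative only)."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return np.zeros(4), {}
    def step(self, action):
        self.t += 1
        obs = np.random.default_rng(self.t).normal(size=4)
        return obs, 1.0, self.t >= 10, False, {}

class EpisodicBonus:
    """Hypothetical stand-in for an episodic intrinsic-reward module (not the repo's actual class)."""
    def __init__(self):
        self.memory = []
    def reset_episode(self):
        self.memory.clear()
    def compute(self, next_obs):
        # Novelty proxy: distance to the nearest observation seen so far in this episode.
        bonus = 1.0 if not self.memory else float(min(np.linalg.norm(next_obs - m) for m in self.memory))
        self.memory.append(next_obs)
        return bonus

def collect_rollout(env, policy, irm, beta=0.01, horizon=32):
    """Collect transitions for any on-policy learner, mixing the episodic
    bonus into the reward that the learner will optimize."""
    obs, _ = env.reset()
    irm.reset_episode()
    batch = []
    for _ in range(horizon):
        action = policy(obs)
        next_obs, r_ext, terminated, truncated, _ = env.step(action)
        r_total = r_ext + beta * irm.compute(next_obs)   # extrinsic + scaled intrinsic
        batch.append((obs, action, r_total))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
            irm.reset_episode()   # episodic memory must not leak across episodes
    return batch

batch = collect_rollout(ToyEnv(), policy=lambda obs: 0, irm=EpisodicBonus())
```

The key design point is simply that the bonus module is reset at every episode boundary and its output is scaled and added to the extrinsic reward before the learner's update; everything else about the base architecture can stay unchanged.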

Based on our analysis, one likely reason DEIR fails in some environments is that we only generate intrinsic rewards based on a state's novelty within the current episode ("episodic intrinsic reward"). In contrast, RND and NovelD generate intrinsic rewards based on a state's novelty across all episodes ("global/lifelong intrinsic reward").
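To make the distinction concrete, here is a toy sketch that contrasts the two kinds of bonus. It uses a nearest-neighbour memory as a crude proxy for both (RND actually uses the prediction error of a random network, not a memory), and all names here are illustrative rather than our actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
proj = rng.normal(size=(4, 8)) / 2.0          # toy embedding: fixed random projection

def novelty(z, memory):
    """Distance from z to the nearest embedding in memory (large = novel)."""
    if not memory:
        return 1.0
    return float(min(np.linalg.norm(z - m) for m in memory))

episodic_memory = []   # episodic bonus: cleared at every episode boundary
lifelong_memory = []   # global/lifelong bonus: persists across all episodes

def intrinsic_rewards(obs):
    z = obs @ proj
    r_episodic = novelty(z, episodic_memory)
    r_lifelong = novelty(z, lifelong_memory)
    episodic_memory.append(z)
    lifelong_memory.append(z)
    return r_episodic, r_lifelong

for episode in range(3):
    episodic_memory.clear()                   # the episodic bonus "forgets" here...
    for step in range(5):
        obs = rng.normal(size=4)
        r_epi, r_life = intrinsic_rewards(obs)
    # ...so a state revisited in a later episode still earns an episodic bonus,
    # while the lifelong bonus keeps shrinking as coverage accumulates over training.
```

In a long, mostly static game like Montezuma's Revenge, a lifelong signal keeps pushing the agent toward states it has never visited in any episode, whereas a purely episodic signal can keep rewarding the agent for re-covering the same early rooms each episode.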

I hope the above information is helpful to you. I apologize that we are not currently focusing on improving DEIR's performance in Atari games and therefore cannot provide more specific assistance. Please feel free to let us know if you have any new findings or questions. Thanks!

Sino-Huang commented 1 month ago

Thanks for your explanation. Have a great day (🙏ˊᗜˋ*)