swan-utokyo / deir

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards
Apache License 2.0

I tried the Montezuma's Revenge env and DEIR seems not to work in this environment #4

Open Sino-Huang opened 5 months ago

Sino-Huang commented 5 months ago

Good day. I tested the DEIR exploration algorithm, and it indeed works well in some of the Minigrid environments as well as the Crafter env (https://github.com/danijar/crafter).

However, when I tested the Montezuma's Revenge env, the DEIR algorithm fails to work properly.

Could you give me some suggestions as to why this is the case?

swan-utokyo commented 5 months ago

Hi @Sino-Huang, thank you for your interest in DEIR. I'm glad to know that you found it works well in the Minigrid and Crafter environments.

Firstly, I want to clarify that during the design and experimentation of DEIR, we focused on procedurally generated RL benchmarks like Minigrid and ProcGen. This is because our ultimate goal was to develop an exploration method capable of effectively exploring unseen states in (dynamic) environments rather than training the agent to quickly find and learn a fixed sequence of actions that can maximize its returns in a static environment.

At other developers' requests, we also preliminarily experimented with DEIR and other baselines in Montezuma's Revenge. Although DEIR could reach a total score of 2500 faster than the other methods, none of them could exceed this score with our current architecture and hyperparameters, which were not originally designed and optimized for Atari games (please refer to issues https://github.com/swan-utokyo/deir/issues/1 and https://github.com/swan-utokyo/deir/issues/3 for more details). If you are interested, I suggest importing DEIR's intrinsic reward generation module into other architectures that are known to be able to master Atari games (and adjusting the hyperparameters accordingly) for comprehensive testing and comparison.
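To give a rough idea of what such an integration could look like, here is a minimal sketch of mixing an intrinsic reward module into a generic rollout loop. All names (`rollout_with_intrinsic_rewards`, `intrinsic_module`, `compute_intrinsic`, `reset_episode`, `beta`) are illustrative placeholders, not this repository's actual API.

```python
# Hypothetical integration sketch: the interface of intrinsic_module and agent
# is assumed, not taken from this repository.
def rollout_with_intrinsic_rewards(env, agent, intrinsic_module, n_steps, beta=0.01):
    """Collect one rollout, mixing extrinsic and intrinsic rewards.

    beta scales the intrinsic bonus relative to the game score.
    """
    obs = env.reset()
    intrinsic_module.reset_episode()          # episodic bonuses start fresh
    transitions = []
    for _ in range(n_steps):
        action = agent.act(obs)
        next_obs, extrinsic_r, done, _info = env.step(action)
        intrinsic_r = intrinsic_module.compute_intrinsic(obs, action, next_obs)
        transitions.append((obs, action, extrinsic_r + beta * intrinsic_r, next_obs, done))
        obs = next_obs
        if done:
            obs = env.reset()
            intrinsic_module.reset_episode()  # forget the finished episode
    return transitions
```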

Based on our analysis, one likely reason DEIR may fail in some environments is that we only generate intrinsic rewards based on a state's novelty within the same episode ("episodic intrinsic reward"). In contrast, RND and NovelD generate intrinsic rewards based on a state's novelty across all episodes ("global/lifelong intrinsic reward").
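As a toy illustration of that distinction (not DEIR's actual discriminative model, which is learned), the sketch below contrasts a count-based episodic bonus, which is reset at every episode boundary, with a lifelong bonus, which keeps discounting states visited in any earlier episode.

```python
# Simplified, count-based illustration only; observations are assumed to be
# NumPy arrays so they can be hashed via .tobytes().
from collections import defaultdict
import numpy as np


class EpisodicNoveltyBonus:
    """Rewards states that are new within the current episode only."""

    def __init__(self):
        self.seen = set()

    def reset(self):
        # Called at every episode boundary -> past episodes are forgotten.
        self.seen = set()

    def __call__(self, obs):
        key = hash(obs.tobytes())
        bonus = 0.0 if key in self.seen else 1.0
        self.seen.add(key)
        return bonus


class LifelongNoveltyBonus:
    """Rewards states that are rare across all episodes (RND/NovelD-style)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, obs):
        key = hash(obs.tobytes())
        self.counts[key] += 1
        # Bonus keeps decaying as the state is revisited in *any* episode.
        return 1.0 / np.sqrt(self.counts[key])
```

Under the episodic scheme, a state that is revisited in every episode keeps earning a bonus each episode, whereas under the lifelong scheme its bonus keeps shrinking over the course of training.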

I hope the above information is helpful to you. I apologize that we are not currently focusing on enhancing DEIR's performance in Atari games and cannot provide more specific assistance. Please feel free to let us know if you have any new findings or questions. Thanks!

Sino-Huang commented 5 months ago

Thanks for your explanation. Have a great day (🙏ˊᗜˋ*)

xuyangthu66 commented 3 months ago

Good day. I am an undergraduate student who is new to reinforcement learning. I would like to ask how to apply the code to the Montezuma's Revenge environment. Is it enough to just change the env_name in the config?

swan-utokyo commented 3 months ago

@xuyangthu66 Thanks for your question. If you want to test DEIR in Atari, you may try the options we provided in https://github.com/swan-utokyo/deir/issues/1#issuecomment-1765758274 and this experimental branch. However, as mentioned in the previous discussion, please note that Atari is not one of our target environments, and our code and parameters for it are not fully optimized, so we cannot guarantee DEIR's performance in Atari games.
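For reference, a common way to construct an Atari environment with standard preprocessing looks like the sketch below, assuming a Gym + Stable-Baselines3 setup; the exact config key or flag this repository expects (e.g. env_name) may differ, so please follow the options linked above.

```python
# Sketch of a standard Atari preprocessing stack for Montezuma's Revenge.
# Requires the Atari extras (ale-py / ROMs) to be installed; not taken from
# this repository's own environment-creation code.
import gym
from stable_baselines3.common.atari_wrappers import AtariWrapper


def make_montezuma_env():
    env = gym.make("MontezumaRevengeNoFrameskip-v4")
    # Grayscale, 84x84 resize, frame skip, and no-op resets,
    # as commonly used for Atari agents.
    env = AtariWrapper(env)
    return env
```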