swan-utokyo / deir

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

I tried the Montezuma's Revenge env and DEIR seems not to work in this environment #4

Open · Sino-Huang opened 1 month ago

Sino-Huang commented 1 month ago

Good day, I tested the DEIR exploration algorithm. It indeed works well in some of the Minigrid environments as well as in the Crafter env (https://github.com/danijar/crafter).

However, when I tested the Montezuma's Revenge env, the DEIR algorithm fails to work properly.

Could you give me some suggestions as to why this is the case?

swan-utokyo commented 1 month ago

Hi @Sino-Huang, thank you for your interest in DEIR. I'm glad to hear that it works well for you in the Minigrid and Crafter environments.

Firstly, I want to clarify that during the design and experimentation of DEIR, we focused on procedurally generated RL benchmarks like Minigrid and ProcGen. This is because our ultimate goal was to develop an exploration method capable of effectively exploring unseen states in (dynamic) environments, rather than training the agent to quickly find and learn a fixed sequence of actions that maximizes its returns in a static environment.

At other developers' requests, we also ran preliminary experiments with DEIR and other baselines in Montezuma's Revenge. Although DEIR reached a total score of 2500 faster than the other methods, none of them could exceed this score with our current architecture and hyperparameters, which were not originally designed and optimized for Atari games (please refer to issues https://github.com/swan-utokyo/deir/issues/1 and https://github.com/swan-utokyo/deir/issues/3 for more details). If you are interested, I suggest importing DEIR's intrinsic reward generation module into other architectures known to master Atari games (and adjusting the hyperparameters accordingly) for comprehensive testing and comparison.
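For illustration, here is a minimal sketch of what such an integration might look like. None of the names below (`ToyEnv`, `EpisodicBonus`, `collect_rollout`, `beta`) come from our repo; they are hypothetical placeholders showing where an episodic bonus would be mixed into the reward seen by an existing on-policy learner such as a PPO implementation that already handles Atari.

```python
import numpy as np

class ToyEnv:
    """Stand-in environment with a gymnasium-style reset/step API (illustrative only)."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return np.zeros(4), {}
    def step(self, action):
        self.t += 1
        obs = np.random.default_rng(self.t).normal(size=4)
        return obs, 1.0, self.t >= 10, False, {}

class EpisodicBonus:
    """Hypothetical stand-in for an episodic intrinsic-reward module (not the repo's actual class)."""
    def __init__(self):
        self.memory = []
    def reset_episode(self):
        self.memory.clear()
    def compute(self, next_obs):
        # Novelty proxy: distance to the nearest observation seen so far in this episode.
        bonus = 1.0 if not self.memory else float(min(np.linalg.norm(next_obs - m) for m in self.memory))
        self.memory.append(next_obs)
        return bonus

def collect_rollout(env, policy, irm, beta=0.01, horizon=32):
    """Collect transitions for any on-policy learner, mixing the episodic
    bonus into the reward that the learner will optimize."""
    obs, _ = env.reset()
    irm.reset_episode()
    batch = []
    for _ in range(horizon):
        action = policy(obs)
        next_obs, r_ext, terminated, truncated, _ = env.step(action)
        r_total = r_ext + beta * irm.compute(next_obs)   # extrinsic + scaled intrinsic
        batch.append((obs, action, r_total))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
            irm.reset_episode()   # episodic memory must not leak across episodes
    return batch

batch = collect_rollout(ToyEnv(), policy=lambda obs: 0, irm=EpisodicBonus())
```

The key design point is simply that the bonus module is reset at every episode boundary and its output is scaled and added to the extrinsic reward before the learner's update; everything else about the base architecture can stay unchanged.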

Based on our analysis, one likely reason DEIR fails in some environments is that we only generate intrinsic rewards based on a state's novelty within the current episode ("episodic intrinsic reward"). In contrast, RND and NovelD generate intrinsic rewards based on a state's novelty across all episodes ("global/lifelong intrinsic reward").
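To make the distinction concrete, here is a toy sketch that contrasts the two kinds of bonus. It uses a nearest-neighbour memory as a crude proxy for both (RND actually uses the prediction error of a random network, not a memory), and all names here are illustrative rather than our actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
proj = rng.normal(size=(4, 8)) / 2.0          # toy embedding: fixed random projection

def novelty(z, memory):
    """Distance from z to the nearest embedding in memory (large = novel)."""
    if not memory:
        return 1.0
    return float(min(np.linalg.norm(z - m) for m in memory))

episodic_memory = []   # episodic bonus: cleared at every episode boundary
lifelong_memory = []   # global/lifelong bonus: persists across all episodes

def intrinsic_rewards(obs):
    z = obs @ proj
    r_episodic = novelty(z, episodic_memory)
    r_lifelong = novelty(z, lifelong_memory)
    episodic_memory.append(z)
    lifelong_memory.append(z)
    return r_episodic, r_lifelong

for episode in range(3):
    episodic_memory.clear()                   # the episodic bonus "forgets" here...
    for step in range(5):
        obs = rng.normal(size=4)
        r_epi, r_life = intrinsic_rewards(obs)
    # ...so a state revisited in a later episode still earns an episodic bonus,
    # while the lifelong bonus keeps shrinking as coverage accumulates over training.
```

In a long, mostly static game like Montezuma's Revenge, a lifelong signal keeps pushing the agent toward states it has never visited in any episode, whereas a purely episodic signal can keep rewarding the agent for re-covering the same early rooms each episode.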

I hope the above information is helpful to you. I apologize that we are not currently focusing on improving DEIR's performance in Atari games and therefore cannot provide more specific assistance. Please feel free to let us know if you have any new findings or questions. Thanks!

Sino-Huang commented 1 month ago

Thanks for your explanation. Have a great day (🙏ˊᗜˋ*)