jurgisp / memory-maze

Evaluating long-term memory of reinforcement learning algorithms
MIT License

Show Cue only for N frames #23

Closed subho406 closed 1 year ago

subho406 commented 1 year ago

Hey, I am working on a modified Memory Maze setting where the goal is to remember the cue signal instead of the maze layout. My goal is to have an environment where the maze layout is provided to the agent as input along with the current observation. The cue signal, which is currently shown as a border in the observation at all times, should be shown only for N frames whenever the cue changes. I was thinking of using the oracle wrappers, but modifying them so that the cue turns white after N frames. Do you have any suggestions on how I could make this work in the current environment? Or do you have a wrapper that already does this?

jurgisp commented 1 year ago

That should be fairly easy to implement.

First, have a look at https://github.com/jurgisp/memory-maze/blob/main/memory_maze/__init__.py to see how the different variants are constructed, and at https://github.com/jurgisp/memory-maze/blob/main/memory_maze/tasks.py#L50 for the full set of flags.

In particular, you will want something like:

```python
env = tasks.memory_maze_9x9(image_only_obs=False, target_color_in_image=False, global_observables=False)
env = GymWrapper(env)
```

This will create an environment that returns a dictionary of observations with all the info you might want, and it will not automatically draw the target color as a border. The target color will be indicated in the target_color field.
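To check what the dictionary contains, you can inspect it right after a reset (a quick sketch; the exact set of keys depends on the flags above):

```python
obs = env.reset()
print(obs.keys())           # observables exposed by this configuration
print(obs['target_color'])  # RGB of the current target, even with no border drawn
```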

Then you can write your own wrapper that displays the target_color for a limited number of steps.
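For concreteness, here is a minimal sketch of such a wrapper. The class name `TimedCueWrapper`, the border width, and the countdown logic are my own illustration; it assumes dict observations with `image` (HxWx3 uint8) and `target_color` (RGB floats in [0, 1]) keys, and the classic 4-tuple gym step API:

```python
import gym
import numpy as np


class TimedCueWrapper(gym.Wrapper):
    """Draw the target color as an image border only for the first
    `n_frames` steps after the cue (target_color) changes; afterwards
    the border turns white."""

    def __init__(self, env, n_frames=10, border=3):
        super().__init__(env)
        self.n_frames = n_frames
        self.border = border
        self._last_cue = None
        self._countdown = 0

    def _draw_border(self, obs):
        cue = tuple(np.asarray(obs['target_color']).tolist())
        if cue != self._last_cue:   # cue changed -> restart the countdown
            self._last_cue = cue
            self._countdown = self.n_frames
        if self._countdown > 0:
            self._countdown -= 1
            # Assumes target_color is RGB in [0, 1]; scale to uint8.
            color = (np.asarray(obs['target_color']) * 255).astype(np.uint8)
        else:
            color = np.array([255, 255, 255], np.uint8)  # white after N frames
        img = obs['image'].copy()
        b = self.border
        img[:b, :] = color
        img[-b:, :] = color
        img[:, :b] = color
        img[:, -b:] = color
        obs['image'] = img
        return obs

    def reset(self, **kwargs):
        self._last_cue = None
        self._countdown = 0
        return self._draw_border(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._draw_border(obs), reward, done, info
```

Note that because the first cue of an episode counts as a change, this also shows the cue for the first N frames after reset.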

subho406 commented 1 year ago

Thanks @jurgisp, this is super helpful! I had a question, btw. I'm trying to provide the maze layout information to the agent as part of the observation. My goal is to isolate the task of learning to remember a cue signal over long durations from the complexities introduced by exploration (having to figure out the maze layout). The need to do initial exploration in every episode is making learning difficult at the moment. The model-free Transformer-XL agent I am testing often resorts to learning short-context relationships rather than utilizing its long context.

I don't fully understand the differences between the three oracle settings listed in this file: https://github.com/jurgisp/memory-maze/blob/main/memory_maze/__init__.py. Which of the oracle settings would make it easy for the agent to utilize the maze layout while still having a first-person (FPS) observation?
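One quick way to compare the variants without reading the code is to instantiate each one and print its observation space (a sketch; `memory_maze:MemoryMaze-9x9-v0` is the registered id from the README, and the other variants follow the same naming pattern):

```python
import gym

env = gym.make('memory_maze:MemoryMaze-9x9-v0')
print(env.observation_space)  # Box for image-only variants, Dict for extended ones
```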

subho406 commented 1 year ago

Here is a plot of asynchronous PPO + Transformer-XL with different memory sizes (the number of past activations used to generate a representation at a given timestep) in Memory Maze. It seems that context length does not have any impact on performance, suggesting the agent often resorts to learning suboptimal solutions and might not be utilizing its full context.

[Plots: training curves for several memory sizes]

subho406 commented 1 year ago

Also, do you know a way to not randomize the maze in every episode?

subho406 commented 1 year ago

I was able to modify the environment following your suggestions. Thanks again for being super helpful! I created a new issue for the second use case. I will share the wrapper soon so that others can use it.

jurgisp commented 1 year ago

Great, glad it was helpful!

Also, interesting results with PPO + GTrXL! It still seems worse than Dreamer, right? But does it exceed recurrent PPO?

Closing this issue as it's a bit off-topic; it's probably better to continue the discussion in #17 or in a new dedicated issue :)