jurgisp / memory-maze

Evaluating long-term memory of reinforcement learning algorithms
MIT License

Does the target location change when the reset method is called? #10

Closed hydro-man closed 1 year ago

hydro-man commented 1 year ago

Thank you for answering my last issue. You told me that the maze layout changes when I call the reset method. Do the target location and the maze layout change together in every episode?

Also, your description says the agent is prompted to find a target object of a specific color, indicated by the border color in the observation image. The relationship between the target object color and the border color is not clear to me. The border color varies. How does the agent find the target of a specific color from the color of the border?

jurgisp commented 1 year ago

Yes, in every episode the maze layout and the locations of the objects are different. But the colors of the objects stay the same, e.g. in the 9x9 maze there are always 3 objects: red, green and blue.

Regarding the prompt: the border color always indicates the color of the object the agent is supposed to find. If the border is red, the agent must look for the red object, and it gets a reward when touching it. So over the course of training, across many episodes, the agent learns to associate the border color with the object color and does not need to rediscover this in every episode.
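
To make the prompt concrete, here is a toy sketch in plain Python. It is not memory-maze's rendering or reward code: the image is a small grid of RGB tuples, and the names `make_observation`, `target_from_border` and `reward` are hypothetical. It also hard-codes the border-to-object mapping, which a real agent has to learn from reward over many episodes.

```python
# Toy illustration only: this is NOT memory-maze's rendering or reward code.
# An "image" is a grid of RGB tuples; the outermost ring of pixels is the border.

COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}


def make_observation(target, size=8):
    """Build a size x size image whose border is painted in the target color."""
    grey = (128, 128, 128)  # placeholder for the agent's first-person view
    image = [[grey] * size for _ in range(size)]
    for i in range(size):
        image[0][i] = image[size - 1][i] = COLORS[target]  # top and bottom rows
        image[i][0] = image[i][size - 1] = COLORS[target]  # left and right columns
    return image


def target_from_border(image):
    """Recover the prompted color name by reading one border pixel."""
    pixel = image[0][0]
    for name, rgb in COLORS.items():
        if rgb == pixel:
            return name
    return None


def reward(touched, image):
    """Return 1.0 only when the touched object's color matches the border."""
    return 1.0 if touched == target_from_border(image) else 0.0
```

With a red border, `reward("red", make_observation("red"))` gives 1.0, while touching the blue object under the same border gives 0.0.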

hydro-man commented 1 year ago

Thank you for your reply. I have another question about using your environment. How do I set the random seed so that the maze layout is consistent across two experiments? Is it done by passing the parameter 'seed = 0' when calling gym.make?

jurgisp commented 1 year ago

Good question - not at the moment. I have created a separate issue. It's a relatively small task, because the underlying dm_env environment already supports a seed; it just needs to be wired up.

If you want to fix it to seed=0, you could just fork and set it here.
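
As a rough illustration of what a wired-up seed would give you, here is a self-contained toy in plain Python. `ToyMaze` is a hypothetical stand-in, not memory-maze's or dm_env's API, and it only samples object positions rather than a full wall layout. The point is that a per-environment RNG built from the seed makes the sequence of episodes reproducible, while each `reset` still draws a new episode.

```python
import random


class ToyMaze:
    """Hypothetical stand-in for a maze environment; NOT memory-maze's API."""

    COLORS = ("red", "green", "blue")  # fixed across episodes, as in the 9x9 maze

    def __init__(self, size=9, seed=None):
        self.size = size
        # A private RNG keeps the episode sequence independent of global random state.
        self.rng = random.Random(seed)

    def reset(self):
        """Sample fresh, distinct object positions for the next episode."""
        cells = [(row, col) for row in range(self.size) for col in range(self.size)]
        positions = self.rng.sample(cells, len(self.COLORS))
        return dict(zip(self.COLORS, positions))


# Two runs with the same seed see the same sequence of episodes.
run_a = ToyMaze(seed=0)
run_b = ToyMaze(seed=0)
episodes_a = [run_a.reset() for _ in range(3)]
episodes_b = [run_b.reset() for _ in range(3)]
```

Here `episodes_a == episodes_b` holds because both environments draw from identically seeded generators. Passing `seed=None` falls back to an unpredictable sequence, which matches the current unseeded behavior described above.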