yosider / merlin

(Personal experiment) Unsupervised Predictive Memory in a Goal-Directed Agent https://arxiv.org/abs/1803.10760

Goal directed #2

Open pathway opened 6 years ago

pathway commented 6 years ago

My understanding is that MERLIN targets "goal-directed behaviours". For example, in the videos the agent repeatedly finds its way to a specific goal.

However, in the memory game the cards are shuffled each time. That does require memory, but there is no static "goal".

My question: is MERLIN actually applicable to the memory game, even though there is no known static goal state? In the code I see memory accumulating across episodes, but since the cards are reshuffled each time, I think memories from previous episodes are not useful.

Please forgive my ignorance if I am mistaken; I also know this is a work in progress. Your sharing is very much appreciated.

yosider commented 6 years ago

Thank you very much for pointing this out!

As you say, according to the pseudocode in the paper, I should reset the memory at the beginning of each episode. I'll fix it.
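A rough sketch of what resetting the memory per episode could look like. This is not the repository's actual code: `EpisodicMemory`, `run_episodes`, and the `(episode, step)` tuples standing in for stored state are all hypothetical names chosen for illustration.

```python
class EpisodicMemory:
    """Hypothetical fixed-capacity memory (illustrative, not the repo's class)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.reset()

    def reset(self):
        # Drop everything written so far, so nothing leaks across episodes.
        self.slots = []

    def write(self, item):
        # Append the new item and evict the oldest one when over capacity.
        self.slots.append(item)
        if len(self.slots) > self.num_slots:
            self.slots.pop(0)


def run_episodes(memory, num_episodes, steps_per_episode):
    """Run dummy episodes, clearing the memory at the start of each one."""
    sizes = []
    for episode in range(num_episodes):
        memory.reset()  # the fix: start every episode with an empty memory
        for step in range(steps_per_episode):
            # Placeholder for the state variable the agent would store.
            memory.write((episode, step))
        sizes.append(len(memory.slots))
    return sizes


memory = EpisodicMemory(num_slots=8)
sizes = run_episodes(memory, num_episodes=3, steps_per_episode=5)
```

With the reset in place, the memory only ever holds entries written during the current episode, so in a task like the memory game the agent cannot read stale card positions from an earlier shuffle.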

I used a simple memory game to test whether the code works, but it didn't converge. I thought it was a bug, but as you pointed out, the task might not be suitable for the agent. I'll try other environments, such as a simple maze.