Open pathway opened 6 years ago
Thank you very much for pointing this out!
As you say, according to the pseudocode in the paper, I should reset the memory at the beginning of each episode. I'll fix it.
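Roughly, the fix I have in mind looks like the sketch below. It is only an illustration: `EpisodicMemory`, `run_episodes`, and the dummy per-step write are hypothetical names made up for this comment, not the repository's actual classes or the paper's exact procedure.

```python
class EpisodicMemory:
    """Hypothetical fixed-size memory; a stand-in, not the repository's class."""

    def __init__(self, num_slots, slot_dim):
        self.num_slots = num_slots
        self.slot_dim = slot_dim
        self.reset()

    def reset(self):
        # Clear every slot so nothing carries over from the previous episode.
        self.slots = [[0.0] * self.slot_dim for _ in range(self.num_slots)]
        self.write_index = 0

    def write(self, vector):
        # Overwrite slots in order, wrapping around once the memory is full.
        self.slots[self.write_index % self.num_slots] = list(vector)
        self.write_index += 1


def run_episodes(memory, num_episodes, steps_per_episode):
    """Toy loop: one dummy write per step; returns the write count of each episode."""
    writes_per_episode = []
    for _ in range(num_episodes):
        memory.reset()  # reset at the beginning of each episode
        for step in range(steps_per_episode):
            # Dummy content standing in for whatever the agent would store.
            memory.write([float(step + 1)] * memory.slot_dim)
        writes_per_episode.append(memory.write_index)
    return writes_per_episode
```

With the reset inside the episode loop, each episode starts from an empty memory, so the write count does not grow from one episode to the next.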
I used a simple memory game to test whether the code works, but it didn't converge. I thought it was a bug, but as you pointed out, the task might not be suitable for the agent. I'll try other environments, such as a simple maze.
My understanding is that Merlin targets "goal-directed behaviours". For example, in the videos the agent repeatedly finds its way to a specific goal.
However, in the memory game the cards are shuffled each time, so the task does require memory, but there is no static "goal".
My question: is Merlin actually applicable to the memory game, even though there is no known static goal state? In the code I see memory accumulating across episodes, but because the cards are reshuffled, I think memories from previous episodes are not useful.
Please forgive my ignorance if I am mistaken. I also know this is a work in progress, and your sharing it is very much appreciated.