infer-actively / pymdp

A Python implementation of active inference for Markov Decision Processes
MIT License

B matrix navigation problem #77

Closed thanos927 closed 1 year ago

thanos927 commented 2 years ago

Hello,

Thanks so much for your work on bringing Active Inference to Python.

While going through your epistemic chaining demo, it appears there is a problem with the agent's navigation. When I moved the cue1 location from (2, 0) to (0, 1), the agent takes two moves downward and then tries to move LEFT into the wall. The agent never recovers from this and doesn't seem to know to try a different direction.

I assume this is a problem with the B matrix, but I'm not able to work out whether the issue is in the agent class or in the rules set up in the demo itself for the action list (["UP", "DOWN", "LEFT", "RIGHT", "STAY"]).

Any help/advice would be greatly appreciated! Please see the output log of the agent's movements below. The only change I make to your demo is in the my_env section, where I change cue1_loc to (0, 1). You'll see that once it completes the second action it tries to go LEFT, then STAY, then LEFT a few more times:

Action at time 0: DOWN   Grid location at time 0: (1, 0)   Reward at time 0: Null
Action at time 1: DOWN   Grid location at time 1: (2, 0)   Reward at time 1: Null
Action at time 2: LEFT   Grid location at time 2: (2, 0)   Reward at time 2: Null
Action at time 3: STAY   Grid location at time 3: (2, 0)   Reward at time 3: Null
Action at time 4: STAY   Grid location at time 4: (2, 0)   Reward at time 4: Null
Action at time 5: STAY   Grid location at time 5: (2, 0)   Reward at time 5: Null
Action at time 6: LEFT   Grid location at time 6: (2, 0)   Reward at time 6: Null
Action at time 7: LEFT   Grid location at time 7: (2, 0)   Reward at time 7: Null
Action at time 8: LEFT   Grid location at time 8: (2, 0)   Reward at time 8: Null
Action at time 9: LEFT   Grid location at time 9: (2, 0)   Reward at time 9: Null
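For context on the log above: in a grid world like this one, the B array encodes P(next location | current location, action), and moves that would leave the grid are typically encoded as self-transitions. The following is a minimal hypothetical sketch (not the demo's actual code) showing why repeatedly selecting LEFT at (2, 0) just leaves the agent in place:

```python
import itertools

import numpy as np

# Hypothetical sketch of a grid-world transition model:
# B[s_next, s_prev, a] = P(next location | current location, action).
# Moves that would leave the grid are clamped, i.e. encoded as
# self-transitions, so LEFT at column 0 keeps the agent where it is.
actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
grid_dims = (3, 3)
locations = list(itertools.product(range(grid_dims[0]), range(grid_dims[1])))
num_states = len(locations)

deltas = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1),
          "RIGHT": (0, 1), "STAY": (0, 0)}

B = np.zeros((num_states, num_states, len(actions)))
for a_idx, a in enumerate(actions):
    dy, dx = deltas[a]
    for s_idx, (y, x) in enumerate(locations):
        ny, nx = y + dy, x + dx
        # clamp to the grid: bumping a wall leaves the agent in place
        if not (0 <= ny < grid_dims[0] and 0 <= nx < grid_dims[1]):
            ny, nx = y, x
        B[locations.index((ny, nx)), s_idx, a_idx] = 1.0

# LEFT from (2, 0) is a deterministic self-transition, matching the log
left = actions.index("LEFT")
s = locations.index((2, 0))
print(np.argmax(B[:, s, left]) == s)  # True
```

So the B matrix behaving this way is expected; the log is the symptom, not the cause.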

conorheins commented 2 years ago

Hi @thanos927, I think your problem stems from a common misunderstanding that comes up when building active inference agents: the difference between the generative model and the generative process. From what you've described, it sounds like you've only changed the generative process -- namely, the actual location of Cue 1. However, for the agent to accurately navigate to the new Cue 1 location, that knowledge also needs to be built into its generative model, i.e. its internal model of how the world works.

Specifically, you would need to bake this knowledge about Cue 1's location into the agent's A matrix. This takes place in the section of the demo notebook called The observation model: A array, and would require changing the value of the variable cue1_location as it's referred to in that part of the demo. In this demo I did not emphasize the distinction between the generative model and generative process very strongly, although other notebooks and content in our documentation explore that distinction more explicitly (see for instance the T-Maze Demo or T-Maze demo with learning).
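To illustrate the point with a minimal sketch (hypothetical names and a deterministic likelihood, not the demo's actual code): if the A array is built from the model's believed cue location while the environment places the cue somewhere else, then standing on the true cue location yields an uninformative observation, so the agent has no epistemic incentive to go there.

```python
import numpy as np

# Hypothetical sketch of the generative-model / generative-process mismatch.
# A[obs, state] = P(observation | location); obs 0 = 'null', obs 1 = 'cue'.
locations = [(y, x) for y in range(3) for x in range(3)]

def build_cue_A(modeled_cue_loc):
    """Build a deterministic cue-observation likelihood from the location
    the *model* believes the cue occupies."""
    A = np.zeros((2, len(locations)))
    for s, loc in enumerate(locations):
        A[1 if loc == modeled_cue_loc else 0, s] = 1.0
    return A

env_cue1_loc = (0, 1)         # generative process: where the cue really is
model_cue1_location = (2, 0)  # generative model: where the agent thinks it is

A = build_cue_A(model_cue1_location)
s_true = locations.index(env_cue1_loc)
print(A[1, s_true])  # 0.0 -- the mismatched model predicts no cue here

# The fix: build the model from the same location the environment uses
A_fixed = build_cue_A(env_cue1_loc)
print(A_fixed[1, s_true])  # 1.0 -- now the model expects the cue there
```

In other words, changing cue1_loc in my_env moves the cue in the world, while changing cue1_location in the A-array section moves it in the agent's head; the two must agree.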

I hope this helps! Conor