zkytony / cos-pomdp

Code for "Towards Optimal Correlational Object Search" | ICRA 2022
Apache License 2.0

Agent getting stuck in infinite loops with Stay [with perfect detector] #12

Closed by RajeshDM 4 months ago

RajeshDM commented 4 months ago

When trying to go to the alarm clock, the agent gets stuck in a loop: it keeps finding Stay as the best action even when it is far away from the object.

Here are the potential issues:

  1. https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp_apps/thor/agent/components/policy_model.py#L124

Assume the agent is at location 0 and the object is at location 1. While doing MCTS simulation (in the abstract planning), it does Move(0->1). Then, when it looks for the next action, the action prior tries to pick one of the two move options, Move(1->0) or Stay(1->1). Move(1->0) is rejected because it takes the agent farther from the object, but Stay(1->1) passes the check and is added to the preferences used by the action-picking code at https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp/models/policy_model.py#L51. Since Stay(1->1) is the only action in the preferences, it is selected. When the next planning step comes around, Stay(1->1) has not changed the state in any way, so the same thing happens again, and this repeats until max depth is reached with no reward found along this path. As a result, a plan of Move(0->1) + Stay(1->1) never gets the opportunity to call Done.
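A minimal sketch of the behavior I mean (not the actual cos-pomdp code; the `Move` tuple, `dist`, and `preferred_moves` names here are hypothetical simplifications of the action prior):

```python
from collections import namedtuple

# Hypothetical simplified move action: `delta` is the displacement it applies.
Move = namedtuple("Move", ["name", "delta"])

def dist(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def preferred_moves(agent_loc, target_loc, moves):
    """Keep only moves that do not take the agent farther from the target."""
    prefs = []
    for m in moves:
        next_loc = (agent_loc[0] + m.delta[0], agent_loc[1] + m.delta[1])
        if dist(next_loc, target_loc) <= dist(agent_loc, target_loc):
            prefs.append(m)   # Stay (delta (0, 0)) always satisfies this test
    return prefs

if __name__ == "__main__":
    agent_loc = target_loc = (1, 0)                # agent already at the target cell
    moves = [Move("Move(1->0)", (-1, 0)), Move("Stay(1->1)", (0, 0))]
    print(preferred_moves(agent_loc, target_loc, moves))
    # Only Stay(1->1) survives; the simulation then keeps repeating Stay
    # until max depth, so Move(0->1) + Stay(1->1) never gets to call Done.
```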

So I guess a check needs to be added so that if we have already taken a Stay action, the agent is forced to do something else. A possible shape of that check is sketched below.
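This is only a sketch of the idea, not the actual fix location; `filter_consecutive_stay`, `last_action`, and `is_stay` are hypothetical names, and the real check would presumably live in the policy model's action prior or candidate-action code:

```python
def filter_consecutive_stay(candidate_actions, last_action, is_stay):
    """Drop Stay from the candidates if the previous action was already Stay."""
    if last_action is not None and is_stay(last_action):
        return [a for a in candidate_actions if not is_stay(a)]
    return candidate_actions

# Example with string-named actions:
# filter_consecutive_stay(["Stay", "MoveAhead", "Done"], "Stay", lambda a: a == "Stay")
# -> ["MoveAhead", "Done"], so after one Stay the planner must try Done or a move
#    instead of looping on Stay forever.
```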

Other potential issue:

Take the same case as above: the agent at 0 does Move(0->1), then Stay(1->1). No observation is found, so a random action is picked; suppose Stay(1->1) is picked randomly from https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp/models/policy_model.py#L53, and Done is called later. The agent will not get the reward unless it was already looking in the correct direction, because the reward considers pitch and yaw even for the upper-level planning. If the agent is not looking in the correct direction (which the upper-level planner has no control over), the correct action sequence of Move(0->1), Stay(1->1), Done will not get any reward.

So it might be better to have separate reward functions for high-level and low-level planning; at the moment it looks like both are using the same reward function. A sketch of what that split could look like is below.
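Here is one possible shape of that split (all names here are hypothetical, not the actual cospomdp reward models): both rewards remain sparse, i.e. nonzero only when Done is taken, but the high-level one only checks distance, while the low-level one additionally checks the viewing direction, which only the low-level planner controls.

```python
import math

def high_level_reward(agent_pose, target_loc, action, goal_dist=1.5,
                      hi=100, lo=-100, step=-1):
    """Success = within `goal_dist` of the target when Done is called."""
    if action != "Done":
        return step
    dx = agent_pose["x"] - target_loc[0]
    dy = agent_pose["y"] - target_loc[1]
    return hi if math.hypot(dx, dy) <= goal_dist else lo

def low_level_reward(agent_pose, target_loc, action, goal_dist=1.5,
                     fov=45.0, hi=100, lo=-100, step=-1):
    """Success additionally requires the target to be within the field of view."""
    r = high_level_reward(agent_pose, target_loc, action, goal_dist, hi, lo, step)
    if action != "Done" or r == lo:
        return r
    # Hypothetical yaw check: target must lie within +/- fov/2 of the heading.
    dx = target_loc[0] - agent_pose["x"]
    dy = target_loc[1] - agent_pose["y"]
    bearing = math.degrees(math.atan2(dy, dx)) % 360
    diff = abs((bearing - agent_pose["yaw"] + 180) % 360 - 180)
    return hi if diff <= fov / 2 else lo
```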

RajeshDM commented 4 months ago

I think I fixed the issue: I wrote a separate reward function for high-level planning that gives the reward as long as the agent is close to the object, and I added the condition that two Stay actions cannot be taken consecutively. It no longer gets stuck in an infinite loop on anything I have tested.

There is still a problem where the agent goes close to the object but no longer looks down, but I think that is a separate issue.

zkytony commented 4 months ago

Didn't have time to look closely, but just wanted to remind you that you're making it a dense reward, which means you may be mixing the solution in your mind into what the agent should output. The original reward is sparse and only represents the task. Just wanted to note that what you changed introduces a fundamental difference.

RajeshDM commented 4 months ago

The change I made was in the success function: for the high-level planning to get its reward, the condition for success is only that the agent is within range of the object of interest, as opposed to looking at it with the correct pitch and yaw.

It still only gets a reward after calling the Done action, so I think it remains a sparse reward: the agent only receives a reward at the step where it decides to call Done.