zkytony / cos-pomdp

Code for "Towards Optimal Correlational Object Search" | ICRA 2022
Apache License 2.0

Ground truth object search failing even when the object has been seen #9

Open RajeshDM opened 6 months ago

RajeshDM commented 6 months ago

I am trying to run the system in ground-truth mode (no correlation or vision detector being used) to get an approximate best-case estimate of how long it takes to search for an object.

I ran the following test script. The system consistently ends up going near the cellphone, but looks up at the end and says the target was not found.

import os
import sys

ABS_PATH = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(ABS_PATH, "../../experiments/thor"))

import thortils as tt

from experiment_thor import (Methods,
                             make_trial,
                             OBJECT_CLASSES,
                             read_detector_params)

def prepare(scene_type):
    detector_models = read_detector_params()
    targets = OBJECT_CLASSES[scene_type]["target"]
    corr_objects = OBJECT_CLASSES[scene_type]["corr"]
    return detector_models, targets, corr_objects

def _test_method(method, scene_type, target_class, scene="FloorPlan21"):
    valscenes = tt.ithor_scene_names(scene_type, levels=range(21, 31))
    if scene not in valscenes:
        raise ValueError("Only allow validation scenes.")

    detector_models, targets, corr_objects = prepare(scene_type)
    if target_class not in targets:
        raise ValueError("{} is not a valid target class".format(target_class))

    trial = make_trial(method, 0, scene_type, scene, target_class,
                       detector_models, corr_objects=corr_objects,
                       visualize=True)
    trial.run()

if __name__ == "__main__":
    room_type = 'bedroom'
    goal_obj_type = 'CellPhone'
    scene = 'FloorPlan321'
    _test_method(Methods.GT_HIERARCHICAL_TARGET, room_type, goal_obj_type, scene=scene)
zkytony commented 6 months ago

To investigate, you can try printing out the agent’s belief as it approaches. See how the belief about the object’s height changes especially when approaching.

I don’t remember exactly, but it’s possible that the belief over the object’s height is uniform even in groundtruth mode.

I suspect that when the agent gets too close to the cellphone, the cellphone is actually not in the FOV, so the agent doesn't get observations about the target. As a result, it might update its belief about the object's height and think it needs to look up. I am not sure why it consistently looks up, though.
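
For instance, a minimal way to do the suggested inspection would be to print the marginal over the height variable after every belief update. This is a hypothetical sketch; the actual belief interface in cos-pomdp may differ:

    # Hypothetical sketch: assumes the height belief can be read as a dict
    # mapping a label ('above', 'same', 'below') to a probability.
    def print_height_belief(height_belief):
        for label in ("above", "same", "below"):
            print("P(height = {}) = {:.4f}".format(label, height_belief.get(label, 0.0)))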

RajeshDM commented 6 months ago

To investigate, you can try printing out the agent’s belief as it approaches. See how the belief about the object’s height changes especially when approaching.

I don’t remember exactly, but it’s possible that the belief over the object’s height is uniform even in groundtruth mode.

I suspect that when the agent gets too close to the cellphone, the cellphone is actually not in the FOV, so the agent doesn't get observations about the target. As a result, it might update its belief about the object's height and think it needs to look up. I am not sure why it consistently looks up, though.

So the height belief oscillates between "same" and "below" as it approaches the object of interest. Yes, you are right that when it is quite close, the object is not in the FOV. I would, however, expect it to look down, since the probability of the object being below is higher than of it being above, and not call Done while looking up.

Is there any way to debug this more formally? I have used the TreeDebugger you pointed me to and it's a great tool. But it is hard to debug all possible paths in Monte Carlo simulations: there are hundreds of simulations, each simulation has many steps, and each step has its own tree.

Is there some simple way to store the tree at each step of the simulation, to see what made it decide to take an action at a particular step? (Basically looking back after the steps have executed, so I only need to look at one tree instead of hundreds.)
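
One way to do this, sketched below under the assumption of a standard pomdp_py planning loop (the loop variables are illustrative, and deep-copying may be memory-heavy for large trees), is to snapshot agent.tree right after each planning call and inspect the copies afterwards:

    import copy
    from pomdp_py.utils import TreeDebugger

    snapshots = []
    for step in range(num_steps):
        action = planner.plan(agent)              # builds/extends agent.tree
        snapshots.append(copy.deepcopy(agent.tree))
        # ... execute action, receive observation, update agent and planner ...

    # Later, inspect why the agent acted as it did at step k:
    dd = TreeDebugger(snapshots[k])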

zkytony commented 6 months ago

Not sure if I totally follow, but you can do dd.mbp, which highlights the preferred / best-case scenario.

I would specifically look at the belief state when the target is in FOV (not just the image, but also per the sensor model), and the step afterwards where it goes out of FOV. And indeed I would use the debugger and check whether the agent gets rewarded after it looks down. It's possible that in this particular scene, for some reason, when the agent gets too close to the target, the target just won't be in the FOV of the sensor model. I know it's weird, but it's possible for Ai2Thor scenes.

zkytony commented 6 months ago

It would be good to also check what sequence of actions and observations drives the agent towards the target. This can be marked in the debugger with dd.mbp.
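
For reference, typical TreeDebugger usage in pomdp_py looks like the following (agent.tree is the root node after planner.plan(agent) has run); the exact output format may vary by version:

    from pomdp_py.utils import TreeDebugger

    dd = TreeDebugger(agent.tree)
    dd.p(1)   # print the tree down to depth 1
    dd.mbp    # mark and print the best (highest-value) path from the root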

RajeshDM commented 6 months ago

My understanding is (please correct me if I'm wrong): the TreeDebugger, whenever initialized, has the tree from the current step. So if we have taken 4 steps already in the environment and I initialize the debugger, it will show the best actions to take from this point on, and we would not know why the previous 4 steps were taken.

Since at any given time, we run 100+ simulations, we have 100+ trees to look at, and this makes it hard to debug what future action the agent is going to take from the current state.

Is this the case, or am I missing something?

zkytony commented 6 months ago

This is not quite right:

Since at any given time, we run 100+ simulations, we have 100+ trees to look at

There is only one tree. The rest is correct. I’d say for the most part, at a given time, I only care about what the agent will do next (planning). I don’t care about what it has done before.

RajeshDM commented 6 months ago

This is not quite right:

Since at any given time, we run 100+ simulations, we have 100+ trees to look at

There is only one tree. The rest is correct. I’d say for the most part, at a given time, I only care about what the agent will do next (planning). I don’t care about what it has done before.

Yes, there is only one true tree, but at any given point there are multiple paths that can be taken from there, and each has its own potential tree. So if I want to look at what the belief state is during different possible Monte Carlo simulations, there would be multiple possible trees, of which only one will be the real tree based on the action taken, right?

zkytony commented 6 months ago

Each simulation particle travels down the tree along a path. Not a tree.

RajeshDM commented 6 months ago

Each simulation particle travels down the tree along a path. Not a tree.

Oh right, there is only one tree and the branches are the multiple simulations.

Just a clarification about the way the tree is stored: as soon as an action is taken, the child node corresponding to that action is made the root node, right (basically pruning every other simulation once an action has been taken)?

zkytony commented 6 months ago

Yes, but only if the ha node (i.e. QNode) has a child observation that matches the real one.
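
In pomdp_py-style planners, that update looks roughly like the sketch below (attribute names are illustrative, not the exact cos-pomdp/pomdp_py code):

    qnode = agent.tree[action]                     # the h-a node (QNode)
    if observation in qnode.children:
        agent.tree = qnode.children[observation]   # new root; other branches pruned
    else:
        agent.tree = None                          # no match; rebuilt on the next plan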

RajeshDM commented 5 months ago

I believe I may have found the issue - It is in the belief update: https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp_apps/thor/agent/components/belief.py#L119

It says: if the object is not visible, then it must be above or below; and if the object is visible, it is at the same height.

There are multiple potential pitfalls that I see:

  1. Consider the scenario where the object is at a lower height but far away, so it is visible without looking down.

a. When the agent is far and sees the object, its belief that the object is at the same height increases very strongly, even though technically it should be increasing the belief about the lower height, not the same height.

b. Now when the agent gets close to the object, it no longer sees it, so the belief about the object being higher or lower increases. This is what leads to the agent potentially looking up: its belief that the object is above is higher than it should be.

There is another part of the code that I am not sure I fully understand

https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp_apps/thor/agent/components/belief.py#L128

Here, both the if and the elif appear to be checking whether the current pitch is the maximum (pitch == 330, since that is the max in thor_v_angles). Is this a typo, or is this particular behaviour expected?

zkytony commented 5 months ago

Consider the scenario where the object is at a lower height but far away, so it is visible without looking down.

Assuming "far" is still within detectable range by the sensor model, then the agent would receive observation of the object, update its belief (doesn't have to look down). This seems to match what you are describing. This behavior is demonstrated in the example shown in 0:27-1:00 of the video.

This is what leads to the agent potentially looking up: its belief that the object is above is higher than it should be.

Isn't this fine though? The agent would look up, not see it there, increase the belief that the object is lower, and eventually look down. (I believe the sensor model does account for the tilt of the sensor even though the FOV is 2D -- to the extent I could implement it.)

Here, both the if and the elif appear to be checking whether the current pitch is the maximum (pitch == 330)

That must have been a typo... I believe I meant to say min() for the elif case.
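
So the intended check would look roughly like this (a sketch following the thread's description of belief.py; it assumes the pitches live in a space where min/max correspond to the physical extremes, which the discussion of grid_pitch later in the thread revisits):

    if next_srobot.pitch == max(v_angles):
        ...  # camera at one extreme of the allowed pitch range
    elif next_srobot.pitch == min(v_angles):
        ...  # camera at the opposite extreme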

RajeshDM commented 5 months ago

Consider the scenario where the object is at a lower height but far away, so it is visible without looking down.

Assuming "far" is still within the sensor model's detectable range, the agent would receive an observation of the object and update its belief (it doesn't have to look down). This seems to match what you are describing. This behavior is demonstrated in the example at 0:27-1:00 of the video.

Ok, maybe I missed something. If the height belief says the object is at the same height, I would assume no up or down action is required. As soon as the agent gets close, it no longer has any idea where the object is in terms of height, so we are losing the height information we had until then. Or does the height belief remain the same when the object is not seen? (From what I understand, if the object has been seen while looking down, say, and the agent then looks up or in a different direction and does not see the object, the height belief gets reset.)

How about using the height information to update the belief? We do get the object's height from the detector, so if the object's height is low we could say it's low regardless of when it's seen, and keep updating that belief every time the object is detected. Do you think that would be ok, or do you see any issues with that?

This is what leads to the agent potentially looking up: its belief that the object is above is higher than it should be.

Isn't this fine though? The agent would look up, not see it there, increase the belief that the object is lower, and eventually look down. (I believe the sensor model does account for the tilt of the sensor even though the FOV is 2D -- to the extent I could implement it.)

Here, both the if and the elif appear to be checking whether the current pitch is the maximum (pitch == 330)

That must have been a typo... I believe I meant to say min() for the elif case.

If it checks for min(), which is technically 0, it is actually checking whether the camera is level. We should probably be checking for the extremes, right? Which are 30 and 330?

zkytony commented 5 months ago

If the height belief says the object is at the same height, I would assume no up or down action is required.

My idea of height in the belief was a relative concept: always relative to the current view pose. That results in different behavior than what you would expect, but I personally find it reasonable. If I walk around looking for a salt shaker with my head fixed (unless I intentionally tilt it), then the target will go out of my sight as I move closer to it. Then, if I really believe the object is in that 2D location, my relative height belief would be updated to prompt me to look down. Try to imagine this.

I did mention this in the paper, but the space was limited. "...estimate target object height among a discrete set of possible height values, Above, Below, and Same, with respect to the camera’s current tilt angle."

How about using the height information to update the belief

If you could build a vision-based detector that can estimate a detected object's height, then sure. It's of course not a good idea to take the height z value directly from the simulator and shove it through the observation model (I assume you are not doing this).

zkytony commented 5 months ago

If it checks for min(), which is technically 0, it is actually checking whether the camera is level

Actually, since that code is in update_target_belief_3d, which is the belief update, I'd assume that the angles in next_srobot are not mapped to 0-360 but read directly from the simulator, where they can be negative. v_angles as defined here can also be negative (the min is -30). So the two are in the same space.

zkytony commented 5 months ago

I looked at the commit history, and it looks like I was trying different parameters for prior_*. But yes, I think you are getting at the crux (the height belief update affects LookUp/LookDown). I believe I eventually got it to work pretty consistently. It is possible that 5b9dd741fac3fe823d8dc62d21992881cd4ba7cc was a post-hoc logic clean-up and the experiments were run using an earlier version of the code from that date. I don't remember exactly. It might be worth checking out an older commit from that date (Sep 13, 2021) and seeing if the behavior is different.

RajeshDM commented 5 months ago

If it checks for min(), which is technically 0, it is actually checking whether the camera is level

Actually, since that code is in update_target_belief_3d, which is the belief update, I'd assume that the angles in next_srobot are not mapped to 0-360 but read directly from the simulator, where they can be negative. v_angles as defined here can also be negative (the min is -30). So the two are in the same space.

Then there might be an issue here. I printed out the v_angles: they take the values 30, 0, 330, and 300, coming from bu_args["v_angles"]. It looks like this set of v_angles is initialized here: https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp_apps/thor/agent/cospomdp_complete.py#L285

(The v_angles stored in task_config are 60, 30, 0, -30, but after this line they are changed to 30, 0, 330, 300. The grid_pitch function is at https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp_apps/thor/agent/components/action.py#L114)

zkytony commented 5 months ago

Ah… one thing I forgot. Ai2Thor has its own coordinate system, but the POMDP agent uses a different coordinate system (based on the grid map). That's why there is that grid_pitch conversion.
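
A conversion consistent with the values reported above (Ai2Thor pitches 60, 30, 0, -30 mapping to grid pitches 300, 330, 0, 30) would be the following; the actual grid_pitch in components/action.py may differ in detail:

    def grid_pitch(thor_pitch):
        # negate (the grid map's pitch runs the opposite way) and wrap into [0, 360)
        return -thor_pitch % 360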

RajeshDM commented 5 months ago

Ah… one thing I forgot. Ai2Thor has its own coordinate system, but the POMDP agent uses a different coordinate system (based on the grid map). That's why there is that grid_pitch conversion.

So then I would guess that using max/min with these converted values wouldn't lead to the desired results? I guess the check needs to be updated to use values from this new set?

zkytony commented 5 months ago

Yes, you may be correct. If so, I am more certain that it was a post-hoc logic cleanup after the experiments that introduced a mistake. The original issue you raised regarding the ground-truth agent's behavior may be a result of this.

RajeshDM commented 5 months ago

So this seems to alleviate the issue of getting a good height belief. I now use the depth projection to get a more confident height estimate, and it is working well: the confidence over the object's height is quite high; in the case I was trying, it was "below" with probability 0.9999. The original issue still occurs, though: even with this high confidence of the object being below, the agent ends up looking up in the end.

[two screenshots: the agent's view and its belief visualization, showing the "below" height belief near 1]

As you can see, the confidence of the object being below is as close to 1 as possible. When this sequence of actions starts, the agent is actually looking down; it then takes a couple of Look actions and calls Done. Any reason you think this could be happening even with this very high confidence that the height is below?

zkytony commented 5 months ago

This is indeed pretty strange. I can't think of a reason this would happen other than a bug somewhere. The bug is likely not in the height belief update, but maybe in the sensor model or the transition model, because LookUp while believing the object is "below" produced high value (meaning a detection). You can confirm this with the TreeDebugger: do dd.mbp if you hit a case where LookUp has the highest value.

The experiments were run on Sep 13, 2021. Looking at the commit history of the repo, I did some organization / clean-up afterwards which might have introduced a bug. I don't think this behavior occurred back then (or at least it would have been fixed when I ran the experiments). You might want to check out an older commit and see if the behavior persists (if you don't want to debug the current state of the code).

zkytony commented 5 months ago

Another possibility: when the object is believed to be "below", for some reason the agent (i.e., the planner) thinks that LookUp and LookDown would both result in detection of the object. This could be due to an inaccuracy in how tilting is implemented in the sensor model, which would result in a fan that covers the 2D area the agent believes in most strongly whether it looks up or down. This would not be a bug but a limitation of the sensor model that's implemented.

RajeshDM commented 5 months ago

Another possibility: when the object is believed to be "below", for some reason the agent (i.e., the planner) thinks that LookUp and LookDown would both result in detection of the object. This could be due to an inaccuracy in how tilting is implemented in the sensor model, which would result in a fan that covers the 2D area the agent believes in most strongly whether it looks up or down. This would not be a bug but a limitation of the sensor model that's implemented.

You were right, this was the issue: https://github.com/zkytony/cos-pomdp/blob/1e3c7fe940a2165b5c5c384ebf047e8139d437ec/cospomdp/models/sensors.py#L51. The computation of the pitch difference was outputting the wrong value, telling the agent that looking up was correct instead of looking down. The reason for the inconsistent behaviour is that the arctan computation was not using the right values; sometimes even the wrong values worked out ok, so it succeeded once in a while, but most of the time it wrongly concluded that looking up was correct.

Here is the updated code

    import numpy as np

    rx, ry, rz = robot_pos3d
    tx, ty, tz = target_pos3d

    # horizontal distance on the ground plane (z is the height axis here)
    dist = ((tx - rx)**2 + (ty - ry)**2) ** 0.5
    delta_z = rz - tz
    # arctan2 avoids division by zero when dist == 0; convert radians to
    # degrees and wrap into [0, 360)
    pitch = np.degrees(np.arctan2(delta_z, dist)) % 360

We need to use all of x, y, and z to compute the relative angle to look at.

There is another weird behaviour now: on longer tasks, the agent gets stuck in an infinite loop of LookUp/LookDown or RotateLeft/RotateRight. Is this something that can arise from the POMDP itself, or could it be a bug?

zkytony commented 5 months ago

I didn’t deal with z of the target location in this paper. The belief is over target’s 2D location. So your fix would be doing something different and you’re on your own. The repeated lookup/lookdown behavior that you get is likely a reasonable result given the POMDP you created but the POMDP itself may not be desirable - you would have to debug the search tree. I’ve done this many times.

zkytony commented 5 months ago

I didn’t deal with z of the target location in this paper. The belief is over target’s 2D location. So your fix would be doing something different and you’re on your own. The repeated lookup/lookdown behavior that you get is likely a reasonable result given the POMDP you created but the POMDP itself may not be desirable - you would have to debug the search tree. I’ve done this many times.

Sorry I think my memory was foggy and what I said here isn’t completely correct. I can’t recall how the numerical height of the target is passed into pitch_facing. The height belief should be relative. But yes, pitch should be calculated by arctan( height / dist ) as you showed. Not exactly sure what I was doing there…

If the agent keeps looking up and down, that means it thinks that would result in target detection even though it’s far away from the target. You may want to look at why that’s happening. There should be a detection range in the sensor model that has a role to play - target outside that range should have exponentially lower probability of detection.
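
A minimal sketch of that last point, with illustrative parameters (not cos-pomdp's actual sensor code): detection probability stays high within the range and decays exponentially outside it.

    import math

    def detection_prob(dist, max_range, p_detect=0.9, decay=2.0):
        # full detection probability within range, exponential falloff outside
        if dist <= max_range:
            return p_detect
        return p_detect * math.exp(-decay * (dist - max_range))

    detection_prob(1.5, max_range=2.0)   # 0.9: within range
    detection_prob(5.0, max_range=2.0)   # ~0.002: far outside range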

RajeshDM commented 5 months ago

I didn’t deal with z of the target location in this paper. The belief is over target’s 2D location. So your fix would be doing something different and you’re on your own. The repeated lookup/lookdown behavior that you get is likely a reasonable result given the POMDP you created but the POMDP itself may not be desirable - you would have to debug the search tree. I’ve done this many times.

Sorry I think my memory was foggy and what I said here isn’t completely correct. I can’t recall how the numerical height of the target is passed into pitch_facing. The height belief should be relative. But yes, pitch should be calculated by arctan( height / dist ) as you showed. Not exactly sure what I was doing there…

If the agent keeps looking up and down, that means it thinks that would result in target detection even though it’s far away from the target. You may want to look at why that’s happening. There should be a detection range in the sensor model that has a role to play - target outside that range should have exponentially lower probability of detection.

The height is computed based on the belief: if the belief is that the target object is above, the height passed is robot_height + delta; if it is the same, the robot's height is passed; and if it is below, the object's height is taken as robot_height - delta.
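
In code, that scheme is roughly the following (the delta value and names here are illustrative):

    def target_height(height_label, robot_height, delta=0.5):
        # map the discrete height belief to a numeric height for pitch_facing
        if height_label == "above":
            return robot_height + delta
        elif height_label == "same":
            return robot_height
        else:  # "below"
            return robot_height - delta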

My guess for why it's repeating is the hierarchical nature: the higher-level POMDP provides a sub-goal, but the lower level isn't able to achieve it, so the higher-level system gives the same goal over and over. I will check the tree in more detail to see whether this is the case.

Thank you very much for your prompt responses and detailed explanations; they have been very helpful. I'm trying to build a multi-object search on top of the existing code, so it is very important that each object can be found reliably, or the probability of failure would compound.

zkytony commented 5 months ago

Thanks for digging into this! Let me know if you encounter any further issue. I hope to help get this resolved.

RajeshDM commented 3 months ago

Hey, I finally figured it out.

It was mostly the rotation-difference issue.

All rotations represented in the code (except the AI2Thor ones, which we can ignore for this purpose) are angles between 0 and 360, and the yaw_facing and pitch_facing functions were trying to find the nearest angles on this scale.

But the angle 350, for example, should ideally be classified as closest to 0 (only 10 degrees away once wrap-around is accounted for). The diff was taken as a plain absolute difference, so 350 got classified as closest to 330 instead, and the agent was not given the reward, messing up the planning thereafter. I switched all the angle-difference functions to account for wrap-around, and now I get close to 100% success in single-object search with perfect vision.
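
A minimal version of the wrap-aware difference described above:

    def angle_diff(a, b):
        # smallest absolute difference between two angles given in [0, 360)
        d = abs(a - b) % 360
        return min(d, 360 - d)

    angle_diff(350, 0)    # 10 -> 350 is correctly closest to 0
    abs(350 - 330)        # 20 -> a plain absolute difference would pick 330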

I also made a slight modification to the initialization of PO-UCT, to use the action prior when it exists. The current code was not passing the prior in even though one was defined.
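
Sketched below, assuming pomdp_py's POUCT (whose constructor does accept an action_prior argument); the hyperparameter values and the policy-model attribute are illustrative:

    import pomdp_py

    planner = pomdp_py.POUCT(
        max_depth=30,
        discount_factor=0.95,
        num_sims=500,
        exploration_const=100,
        rollout_policy=agent.policy_model,
        action_prior=agent.policy_model.action_prior)  # previously not passed in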

zkytony commented 3 months ago

Amazing. That sounds like something I could have potentially missed. Feel free to open a PR!

I would be very eager to see a video of ground truth object search working many times in a row!

RajeshDM commented 3 months ago

I might need some time before I can open the PR. The updated code has a lot of my other changes for multi-object search that are not relevant to making this work, and they would mess things up for single-object search.

I can definitely show you some videos once multi-object search works, since that is basically the same problem in a harder setting than single-object search anyway.

Thanks for all the help throughout the debugging process :)

zkytony commented 3 months ago

I see. I’m glad it works. Do what’s best for you.