vincentcartillier / Semantic-MapNet


How to identify the floor level of a navigable point in specific scan? #20

Open wbhwbh opened 8 months ago

wbhwbh commented 8 months ago

I admire your outstanding work on this project. Inspired by it, I now want to determine the floor level of a given navigable point with a known 3D coordinate in a specific scan. I have a few questions about this.

I find that in utils/habitat_utils.py, the function sample_navigable_point() uses the logic below to get a point on the right floor.

for _ in range(1000):
    point = self.sim.sample_navigable_point()
    # keep the point only if it lies within the agent's height of the start floor
    if np.abs(self.start_height - point[1]) <= 1.5:
        return point

The logic seems correct, as it keeps points whose distance to the floor of a given level is less than the agent's height (1.5 m). However, I discovered that there exist two floors of the same scene with very similar start heights. For example, the scan SN83YJsR3w2 has floor 0 and floor 1, whose start heights are -3.64484 and -3.61878 respectively, which I got from the houses_dim.json file.

Why do two floors have such similar start heights? Does your logic for finding a point on the right floor still work in this case? And how can I determine the floor level of a given navigable point with a known 3D world coordinate?
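For concreteness, here is a minimal sketch of the kind of lookup I have in mind (the helper name and the per-floor start-height list are my own assumptions, not part of the repo; the 1.5 m tolerance mirrors the check in sample_navigable_point()):

```python
def floor_of_point(point_y, floor_start_heights, max_agent_height=1.5):
    """Hypothetical helper: return the index of the floor whose start height
    lies closest below point_y, within max_agent_height; None if no floor
    matches. floor_start_heights is assumed to come from houses_dim.json."""
    best_floor, best_gap = None, None
    for level, start_h in enumerate(floor_start_heights):
        gap = point_y - start_h  # the point should sit at or above the floor
        if 0.0 <= gap <= max_agent_height:
            if best_gap is None or gap < best_gap:
                best_floor, best_gap = level, gap
    return best_floor
```

Note that with the SN83YJsR3w2 start heights (-3.64484 and -3.61878) both floors fall within the tolerance of the same point, so such a lookup stays ambiguous, which is exactly my concern.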

Thank you in advance.

vincentcartillier commented 8 months ago

I believe this is data annotation noise. I am checking the scans using the online viewer.

[level-0 screenshot]

[level-1 screenshot]

If you look at level-1, there is a room at the bottom right that seems to belong to the first floor but is labeled as level-1. This would cause the start_height to be similar for both level-0 and level-1.

For that specific example, you could fix the annotation directly by re-evaluating the level-1 z-boundary manually.

Note that finding level heights automatically is pretty hard in general. Houses may have very different shapes and layouts, e.g. split-level homes.
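When floors are reasonably flat, one heuristic sketch (my own, not the repo's method) is to sample many navigable points and cluster their y-coordinates, splitting wherever consecutive sorted heights jump by more than some gap. As noted above, split-level homes break this assumption:

```python
import numpy as np

def estimate_floor_heights(ys, gap=1.0):
    """Heuristic sketch: cluster 1-D height values into floors.
    Sort the values, split where consecutive values differ by more
    than `gap` (an assumed minimum floor separation), and return
    each cluster's minimum as the floor's start height."""
    ys = np.sort(np.asarray(ys, dtype=float))
    splits = np.where(np.diff(ys) > gap)[0] + 1
    clusters = np.split(ys, splits)
    return [float(c.min()) for c in clusters]
```

In practice `ys` would be the y-components of points returned by the simulator's navigable-point sampler.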

wbhwbh commented 8 months ago

I am sorry to bother you again. My previous doubts have mostly been resolved, but new questions have arisen.

When you apply your semantic-map construction to the Object Navigation problem, you consider a pre-exploration setting where the agent first traverses the environment.

How do you handle semantic maps when navigation involves crossing floors? Since it seems that only one semantic map is maintained, do you always search for target objects on the same floor? And is a navigation episode considered a failure when no target object appears on that floor?

Thank you in advance, again.

vincentcartillier commented 7 months ago

We do limit the search and navigation to objects on the same floor as the agent, so we don't really use or combine semantic maps from different floors. We did not really consider cases where no target object is present on the floor; actually, I'm not sure whether such episodes exist in the dataset. If they do, we just discarded them.

One last thing, if you are interested in pre-exploration settings for Embodied AI tasks, you should also check our Episodic Memory QA paper: https://arxiv.org/abs/2205.01652