jacobkrantz / VLN-CE

Vision-and-Language Navigation in Continuous Environments using Habitat
https://jacobkrantz.github.io/vlnce/
MIT License

questions about the VLNCE dataset #7

Open Mingxiao-Li opened 3 years ago

Mingxiao-Li commented 3 years ago

Hi, I am quite confused about the dataset. The VLN-CE GT dataset contains location and action sequences. However, when I use the simulator, there is a "vlnce-oracle action" at each step, and I found that this action sequence differs from the action sequence in the GT file. Does anyone know which one is the correct action sequence?

jacobkrantz commented 3 years ago

Thank you for your question. Differences between the ground truth (GT) paths and paths generated from the VLNOracleActionSensor exist for two reasons (primarily the first):

  1. The GT path was constructed to travel within 0.5m of each position in the reference path (Matterport3D nav-graph nodes). The VLNOracleActionSensor ignores the reference path and provides the next action along the shortest path to the goal. This sensor was used in training as it enabled DAgger training and is a reasonable proxy for paths in the Room2Room (R2R) dataset; all R2R paths are shortest paths on the nav-graph. We have confirmed that GT paths are similar to VLNOracleActionSensor paths with approximately a 0.96 SPL.
  2. Both the GT dataset files and the VLNOracleActionSensor use the ShortestPathFollower from Habitat (the sensor uses it here). The GT path was constructed with a different version of Habitat's ShortestPathFollower than the VLNOracleActionSensor currently uses. If you want the VLNOracleActionSensor to use the same underlying oracle, one option is using Habitat-Lab v0.1.4 and a matching Habitat-Sim version. But that is a bandaid fix.

Thinking about reason 2) further, this discrepancy breaks training reproducibility of the paper results, since the paper used the v0.1.4 ShortestPathFollower. Also, the dataset path generation and pruning were performed using the old ShortestPathFollower. To fix this, we will bring back the old ShortestPathFollower in the VLN-CE code without downgrading the Habitat version.

Does someone know which one is the correct action sequence ?

The short answer is that the correct action sequence is the GT file.
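For readers unfamiliar with the SPL figure in reason 1, here is a minimal sketch of how a per-episode SPL (Success weighted by Path Length) could be computed from two position sequences. It is not the code used to produce the 0.96 number. The function names and the choice to treat the oracle path length as the "shortest" length are assumptions made for illustration.

```python
import math


def path_length(positions):
    """Sum of Euclidean distances between consecutive 3D positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def spl(success, shortest_length, taken_length):
    """Per-episode SPL: success * shortest / max(taken, shortest)."""
    if not success:
        return 0.0
    denom = max(taken_length, shortest_length)
    # Degenerate case: start and goal coincide and the agent did not move.
    return shortest_length / denom if denom > 0 else 1.0


# Hypothetical positions, not taken from the dataset.
oracle_path = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
gt_path = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0), (4.0, 0.0, 0.0)]

score = spl(True, path_length(oracle_path), path_length(gt_path))
```

Averaging this score over all episodes gives a dataset-level SPL. A value near 1 means the two sets of paths have nearly the same length.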

Mingxiao-Li commented 3 years ago

OK, thanks for answering. One more question: how do you make sure that the shortest path generated by the simulator follows the language instruction? It seems the path does not necessarily follow the instruction, right?

Mingxiao-Li commented 3 years ago

I also just found that the dataset contains a "reference path", which differs from both the location sequence in the GT file and the path generated by the sensor. What is this "reference path" and how is it obtained? Thanks in advance.

jacobkrantz commented 3 years ago

The original VLN task is based on navigating from node to node on the Matterport3D graph. When we ported each trajectory to a continuous environment, we transferred not just the start and goal locations, but the intermediate node locations as well. The reference_path is this sequence of node locations from start to goal (including start and goal). The section "Transferring Nav-Graph Trajectories" in the paper has details of the process.

We ensure that the GT path matches reasonably with the language instruction by having the GT path travel within 0.5m of each position in the reference_path. The VLNOracleActionSensor does not guarantee alignment with the instruction -- there are occasional instances where the path generated by the VLNOracleActionSensor takes a different route, but this is empirically uncommon. Since all paths in the R2R dataset are shortest paths, this only happens when there is another approximate shortest path that happens to be shorter in continuous environments.
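The 0.5m criterion described above can be checked with something like the following. This is a minimal sketch, not the dataset-generation code: the function name is hypothetical, the in-order requirement is an assumption, and only sampled positions are tested (not the segments between them).

```python
import math


def visits_reference_path(path, reference_path, radius=0.5):
    """Return True if `path` comes within `radius` meters of every
    node in `reference_path`, visiting the nodes in order."""
    idx = 0
    for node in reference_path:
        # Advance along the path until a position is close enough to this node.
        while idx < len(path) and math.dist(path[idx], node) > radius:
            idx += 1
        if idx == len(path):
            return False  # ran out of path before reaching this node
    return True


# Hypothetical positions, not taken from the dataset.
path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
follows = visits_reference_path(path, [(0.0, 0.0, 0.0), (2.2, 0.0, 0.0)])
misses = visits_reference_path(path, [(0.0, 0.0, 0.0), (2.0, 0.0, 3.0)])
```

A GT path should pass this kind of check by construction, whereas a path produced by following the VLNOracleActionSensor may occasionally fail it when a different near-shortest route exists.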

Mingxiao-Li commented 3 years ago

Many thanks for your answer. You mentioned that if I want the sensor to generate the same path as the one in the GT file, I should use Habitat-Lab v0.1.4 and a matching Habitat-Sim version. Which version of Habitat-Sim should I use? Or, since you are also considering bringing the old follower back, when will that be done?

jacobkrantz commented 3 years ago

The old follower is now back 🙂 (#8). If you want it to follow the reference path, you can do something like this: https://gist.github.com/jacobkrantz/9143f4744b9e1809bb1ef0522700932b

Mingxiao-Li commented 3 years ago

Thank you very much.

Mingxiao-Li commented 3 years ago

Hi, have you checked whether the original follower can still generate the same action sequence? I tried using the original follower, but I still get a different action sequence.

Mingxiao-Li commented 3 years ago

Another question: I used the action sequence in the GT file to generate a location sequence, but found that it differs from the location sequence in the GT file.

Mingxiao-Li commented 3 years ago

@jacobkrantz, hi, I am sorry for bothering you again. I found that I still get a different path when I use the old follower. Could you kindly check whether the old follower still generates the same path?

jacobkrantz commented 3 years ago

Using Habitat v0.1.5 and ShortestPathFollowerCompat, I find that the GT action sequences match exactly for 95% of paths. The 5% of differences can be attributed to internal version differences in the simulator (same action results in slightly different location). If you want a perfect replication of the GT paths, you need to use the same versions of Habitat-Lab and Habitat-Sim used to generate them. I created a new branch that contains code compatible with these older versions as well as instructions for installation in the README: https://github.com/jacobkrantz/VLN-CE/tree/orig_habitat_versions. Using that Habitat version, I have confirmed for each episode that the generated action and location sequences match the GT paths. Where exact matching is not required, I recommend using the VLN-CE master branch.
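A comparison like the one described above (what fraction of episodes match the GT exactly) could be scripted roughly as follows. This is a minimal sketch under assumptions: the function name is hypothetical, and both inputs are assumed to be dicts mapping an episode ID to a list of integer actions, which may not match the actual layout of the GT files.

```python
def exact_match_rate(gt_actions, generated_actions):
    """Fraction of GT episodes whose generated action sequence is identical.

    Episodes missing from `generated_actions` count as mismatches.
    """
    if not gt_actions:
        return 0.0
    matches = sum(
        1
        for ep_id, actions in gt_actions.items()
        if generated_actions.get(ep_id) == actions
    )
    return matches / len(gt_actions)


# Hypothetical action sequences, not taken from the dataset.
gt = {"1": [1, 1, 2, 0], "2": [1, 3, 0], "3": [2, 1, 0], "4": [1, 0]}
generated = {"1": [1, 1, 2, 0], "2": [1, 3, 0], "3": [2, 1, 1, 0], "4": [1, 0]}

rate = exact_match_rate(gt, generated)
```

Listing the mismatching episode IDs instead of only counting them makes it easier to see whether the differences come from the follower or from the simulator version.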

Mingxiao-Li commented 3 years ago

OK. I will try that. Many thanks for the explanation.