When env._partially_observable is set to False (lines 181 and 271 in rollout_runner.py), the goal position is included as the last 3 elements of the observation, which is fed directly into the model during both training and testing. So unless I missed something, the labels are leaked.
I ran a quick experiment on the reach-v2 environment and found that masking the goal position in the observation lowers the success rate. Here is what I did:
1. Train with the original code: bash experiments/scripts/metaworld/train_test_metaworld_1task.sh hf://liruiw/hpt-base "" "" train.total_epochs=50
2. Test: 19.4% average success rate across 5 runs, which is low but expected since the model was trained for only 50 epochs.
3. Remove the data folder, set _partially_observable in RolloutRunner to True, and set the last three elements of the state to 0 after line 325.
4. Test the previous model: 0% success rate.
5. Train again on the corrected dataset.
6. Test again: 14.6% average success rate across 5 runs.
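The masking step described above can be sketched roughly as follows. This is a minimal illustration, not the code in rollout_runner.py: the helper mask_goal, the constant GOAL_DIMS, and the dummy observation are hypothetical, and the sketch assumes the goal position occupies the last three entries of a flat observation vector.

```python
from typing import List, Sequence

# Assumption: the goal position is stored in the last 3 entries of the observation.
GOAL_DIMS = 3


def mask_goal(obs: Sequence[float], goal_dims: int = GOAL_DIMS) -> List[float]:
    """Return a copy of obs with the trailing goal entries set to zero."""
    masked = [float(x) for x in obs]  # copy so the caller's observation is untouched
    masked[-goal_dims:] = [0.0] * goal_dims
    return masked


# Dummy 8-element observation; real Meta-World observations are longer.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
masked = mask_goal(obs)
print(masked)
```

For the comparison to be meaningful, the same masking would have to be applied both when the dataset is generated and when the policy is evaluated.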
Could you please check whether I missed anything and verify whether this affects the results reported in the paper? Also, could you please release the complete code for reproducing the results in your paper? Thank you.
Thanks for the questions. This is not a mistake. We used the goal information as part of the proprioception input for Metaworld, and this is consistent with the results in this paper.