Goal Position Leakage in Metaworld Benchmark

When env._partially_observable is set to False (lines 181 and 271 in rollout_runner.py), the goal position is included as the last 3 elements of the observation, which is fed directly into the model during both training and testing. Unless I missed something, this means the labels are leaked.
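A quick way to check for this is to look at the trailing slice of an observation. The snippet below is only a sketch: the helper name goal_is_exposed, the 39-element vector length, and the sample values are made up for illustration, and the only thing taken from the report above is that the goal sits in the last 3 elements.

```python
import numpy as np

GOAL_DIM = 3  # assumption from this report: the goal occupies the last 3 entries


def goal_is_exposed(obs, goal_dim=GOAL_DIM):
    """Return True if the trailing goal slice of obs holds any non-zero value."""
    obs = np.asarray(obs, dtype=np.float64)
    return bool(np.any(obs[..., -goal_dim:] != 0.0))


# Stand-in observations with made-up numbers (not real Metaworld output).
leaky_obs = np.concatenate([np.ones(36), [0.1, 0.8, 0.2]])
masked_obs = np.concatenate([np.ones(36), np.zeros(GOAL_DIM)])

print(goal_is_exposed(leaky_obs))   # True
print(goal_is_exposed(masked_obs))  # False
```

Note that this check would miss a goal located exactly at the origin, so it is a sanity check rather than a proof.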

I ran a quick experiment on the reach-v2 environment and found that masking the goal position in the observation lowers the success rate. Here is what I did:

1. Train with the original code: bash experiments/scripts/metaworld/train_test_metaworld_1task.sh hf://liruiw/hpt-base "" "" train.total_epochs=50
2. Test: 19.4% average success rate across 5 runs, which is low but expected after only 50 epochs of training.
3. Remove the data folder, set _partially_observable in RolloutRunner to True, and set the last three elements of state to 0 after line 325.
4. Test the previous model: 0% success rate.
5. Train again on the corrected dataset.
6. Test again: 14.6% average success rate across 5 runs.
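For reference, the masking step amounts to something like the sketch below. It is not the exact change I made in rollout_runner.py: the function name mask_goal and the sample state are hypothetical, and it assumes the goal occupies the last three elements of the state vector.

```python
import numpy as np


def mask_goal(state, goal_dim=3):
    """Return a copy of state with the last goal_dim entries set to 0."""
    masked = np.array(state, dtype=np.float32, copy=True)  # leave the input untouched
    masked[..., -goal_dim:] = 0.0
    return masked


# Made-up state vector for illustration only.
state = np.arange(1, 40, dtype=np.float32)
masked_state = mask_goal(state)

print(masked_state[-3:])  # trailing goal slice is now all zeros
print(state[-3:])         # original state is unchanged
```

Returning a copy keeps the raw observation available for logging while the model only ever sees the masked version.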

Could you please check whether I missed anything and verify whether this affects the results reported in the paper? Also, could you please release the complete code for reproducing those results? Thank you.
