liruiw / HPT

Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner.
https://liruiw.github.io/hpt
MIT License
378 stars, 18 forks

Goal Position Leakage in Metaworld Benchmark #6

Closed: AlbertYang0112 closed this issue 1 week ago

AlbertYang0112 commented 2 weeks ago

When env._partially_observable is set to False (line 181 and line 271 in rollout_runner.py), the goal position is included in the last 3 elements of the observation, which is fed directly into the model in both training and testing. So unless I missed something, the labels are leaked.
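For readers who want to see the effect of those last 3 elements in isolation, here is a minimal sketch of zeroing them out before the observation reaches a policy. It is not the code used in this issue or in the HPT repository. The helper name `mask_goal`, the 39-dimensional dummy observation, and the choice to fill with zeros are all assumptions made for illustration; only the "last 3 elements hold the goal" layout comes from the description above.

```python
import numpy as np

GOAL_DIM = 3  # assumption: the goal position occupies the last 3 entries


def mask_goal(obs, goal_dim=GOAL_DIM):
    """Return a copy of `obs` with the trailing goal entries set to zero.

    Hypothetical helper: works on a single observation of shape (D,)
    or a batch of shape (N, D). The input array is left untouched.
    """
    masked = np.array(obs, dtype=np.float64, copy=True)
    masked[..., -goal_dim:] = 0.0
    return masked


# Dummy observation standing in for an environment observation
# (the length 39 is an assumption, not taken from the issue).
obs = np.arange(39, dtype=np.float64)
masked = mask_goal(obs)
```

Applying the same mask to both the training data and the rollout observations keeps the two input distributions consistent, which matters when comparing success rates with and without the goal.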

I ran a quick experiment on the reach-v2 environment and found that masking the goal position out of the observation lowers the success rate. Here is what I did:

Could you please check whether I missed anything and verify whether this affects the results reported in the paper? Also, could you please release the complete code for reproducing the results in your paper? Thank you.

liruiw commented 2 weeks ago

Thanks for the questions. This is not a mistake. We used the goal information as part of the proprioception input for Metaworld, and this is consistent with the results in this paper.