I believe the problem is caused by the lack of normalization, i.e. we have to apply `NormalizeProprio` after registering the environment with gym:
```python
from octo.utils.gym_wrappers import HistoryWrapper, NormalizeProprio, RHCWrapper

env = gym.make(task_name)
# normalize proprio with the same dataset statistics that were used during training
env = NormalizeProprio(env, model.dataset_statistics)
env = HistoryWrapper(env, horizon=1)
env = RHCWrapper(env, exec_horizon=50)

obs, info = env.reset(options={"variation": 0})
obs["proprio"][0]
```
After adding the line with normalization, the numbers we get for the initial joint positions approximately match.
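As far as I understand, the wrapper applies standard mean/std normalization with the statistics the model was trained with. A minimal sketch of that transformation (the exact layout of `dataset_statistics` is an assumption here and may additionally be keyed by dataset name):

```python
import numpy as np

def normalize_proprio(proprio, stats):
    # hypothetical helper mirroring what NormalizeProprio does:
    # subtract the dataset mean and divide by the dataset std
    mean = np.asarray(stats["proprio"]["mean"])
    std = np.asarray(stats["proprio"]["std"])
    return (np.asarray(proprio) - mean) / (std + 1e-8)

# e.g. normalize_proprio(raw_obs["proprio"], model.dataset_statistics)
```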
I believe small differences (of the order $10^{-3}$) are acceptable, because such differences are present between episodes themselves. More precisely, if we run `env.reset` multiple times we get slightly different values for the starting position, so my belief is that they are simply caused by some kind of randomization or limited numerical precision.
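As a quick sanity check (not part of the original scripts), one can reset the environment a few times and look at the spread of the starting proprio:

```python
import numpy as np

starts = []
for _ in range(5):
    obs, info = env.reset(options={"variation": 0})
    starts.append(np.asarray(obs["proprio"][0]))

starts = np.stack(starts)
# per-dimension spread across resets; values of the order 1e-3
# would point at reset randomization rather than wrong statistics
print(starts.max(axis=0) - starts.min(axis=0))
```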
For the ALOHA dataset, one can generate the first proprio state using the code from the finetuning script:
and the result will be the same as when one uses the code from the evaluation script:
When we do the same for the RLBench scripts, i.e. first:
and then:
we get different results.
So it seems that there is a mismatch between the data used for training and the data used for evaluation.
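For concreteness, a rough sketch of the kind of comparison described above; `train_data_iter` is a hypothetical stand-in for the finetuning script's data iterator, and the batch key layout is an assumption:

```python
import numpy as np

# first proprio state as seen by the training pipeline
# (hypothetical iterator standing in for the finetuning script's data loader)
train_batch = next(train_data_iter)
train_proprio0 = np.asarray(train_batch["observation"]["proprio"])[0, 0]

# first proprio state as seen by the evaluation pipeline (wrapped env from above)
obs, info = env.reset(options={"variation": 0})
eval_proprio0 = np.asarray(obs["proprio"])[0]

# for ALOHA these agree up to ~1e-3; for RLBench they did not
# until NormalizeProprio was added
print(np.max(np.abs(train_proprio0 - eval_proprio0)))
```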