Closed — yuanfei-Wang closed this issue 6 months ago
Hi @yuanfei-Wang, in several of our simulation datasets it is desirable for the robot to stay static once the task is finished (in particular the Push-T task, which terminates immediately after the T is inside the goal region). If we set `max_start` to `episode_length - sequence_length`, or equivalently set `pad_after=0`, the last few observations of the dataset will never be supervised on, which leads to performance degradation on certain benchmarks. In particular, for the Push-T task, if we set `pad_after=0`, the policy sometimes makes a sudden, undesired move right before completing the task, since the observations right before completion are out of distribution.
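A minimal sketch of what "same" padding means here (a hypothetical helper for illustration, not the repo's actual code): the last positional action is repeated `pad_after` times, so windows sampled past the episode end supervise the policy to hold still.

```python
import numpy as np

# Hypothetical helper (not the repo's actual code) illustrating "same" padding:
# the last positional action is repeated pad_after times, so training windows
# that extend past the episode end teach the policy to stay put.
def pad_episode_actions(actions: np.ndarray, pad_after: int) -> np.ndarray:
    tail = np.repeat(actions[-1:], pad_after, axis=0)
    return np.concatenate([actions, tail], axis=0)

actions = np.array([[0.0], [1.0], [2.0]])  # toy 1-D positional actions
padded = pad_episode_actions(actions, pad_after=2)
print(padded.ravel().tolist())  # [0.0, 1.0, 2.0, 2.0, 2.0]
```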
Thanks for your insightful explanation! Now I only have a minor question: why is `pad_after` set to $T_a-1$ instead of other values? Is there any specific reason, or did you try other options?
Hi @yuanfei-Wang, we would like the policy to observe all images in the training dataset. To do this, we need to set `pad_after` to $T_a - 1$ (since the first action step is aligned with the last observation step). In most of our simulation benchmarks it is desirable for the robot to stay still after finishing the task, which is exactly what the actions inside the padding encode ("same" padding for positional actions). If this behavior is not desired, you should set `pad_after` to 0.
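The alignment can be checked with quick arithmetic (variable names here are illustrative, not from the repo): for the training window whose last observation is the final real frame of an episode, the executed action steps overrun the episode by exactly $T_a - 1$ steps, which is why that much padding is needed.

```python
# Toy check of the alignment (illustrative names, not the repo's code).
L, To, Ta = 10, 2, 8           # episode length, obs horizon, action horizon
start = L - To                 # window whose observations end on the last real frame
first_action = start + To - 1  # first action step aligns with last obs step
last_action = first_action + Ta - 1
pad_after_needed = last_action - (L - 1)  # action steps past the episode end
print(pad_after_needed)        # 7 == Ta - 1
```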
Hi @cheng-chi, now I understand why pad_after is set to $T_a-1$. Thanks for your detailed explanation!
Hi @cheng-chi, I think if we set `pad_after = Ta - 1`, the last few observations are still not supervised (specifically, the last `horizon - To - Ta + 1` observations), because only the first `To` observations are used for conditioning. Why not set `pad_after` to `horizon - To` (then the last sample conditions on `obs[-To:]`, i.e., all observations are leveraged)?
> Hi @cheng-chi, I think if we set `pad_after = Ta - 1`, the last few observations are still not supervised (specifically, the last `horizon - To - Ta + 1` observations), because only the first `To` observations are used for conditioning. Why not set `pad_after` to `horizon - To` (then the last sample conditions on `obs[-To:]`, i.e., all observations are leveraged)?

I have the same question.
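The counting argument in this question can be verified with a toy script, under the assumed sampler semantics (not the repo's actual code) that a sample starting at index `s` conditions on `obs[s : s + To]` and that the largest valid start index is `L - horizon + pad_after`:

```python
# Toy coverage count (assumed sampler semantics, not the repo's actual code):
# a sample starting at s conditions on obs[s : s + To], and the largest valid
# start index is L - horizon + pad_after.
L, horizon, To, Ta = 100, 16, 2, 8   # Push-T-like settings; L = episode length

def uncovered_obs(pad_after: int) -> int:
    max_start = L - horizon + pad_after
    last_conditioned = max_start + To - 1
    return (L - 1) - last_conditioned  # real obs never used for conditioning

print(uncovered_obs(Ta - 1))         # 7 == horizon - To - Ta + 1
print(uncovered_obs(horizon - To))   # 0: every observation is conditioned on
```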
Thanks for providing this beautiful code and documentation! I have been reading the implementation of Dataset and SequenceSampler in the Colab example and I have a question about it.
I understand that we need to pad $T_o-1$ (`pad_before`) steps before an episode, because we need to make sure the first action is predicted conditioned on observations of length $T_o$. But why do we need to pad $T_a-1$ (`pad_after`) steps after an episode? What is the problem if we set `max_start` to `episode_length - sequence_length`?