real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
https://diffusion-policy.cs.columbia.edu/
MIT License

About Dataset Padding #30

Closed: yuanfei-Wang closed this issue 6 months ago

yuanfei-Wang commented 7 months ago

Thanks for providing this beautiful code and documentation! I have been reading the implementation of Dataset and SequenceSampler in the Colab example and I have a question about it.

import numpy as np

def create_sample_indices(
        episode_ends: np.ndarray, sequence_length: int,
        pad_before: int = 0, pad_after: int = 0):
    indices = list()
    for i in range(len(episode_ends)):
        start_idx = 0
        if i > 0:
            start_idx = episode_ends[i-1]
        end_idx = episode_ends[i]
        episode_length = end_idx - start_idx

        # allow windows to start up to pad_before steps before the episode
        # and to end up to pad_after steps past it
        min_start = -pad_before
        max_start = episode_length - sequence_length + pad_after

        # range() stops one index before the end, hence max_start + 1
        for idx in range(min_start, max_start + 1):
            # portion of the window that lies inside the episode buffer
            buffer_start_idx = max(idx, 0) + start_idx
            buffer_end_idx = min(idx + sequence_length, episode_length) + start_idx
            # where that portion lands inside the fixed-length sample;
            # slots outside [sample_start_idx, sample_end_idx) are padding
            start_offset = buffer_start_idx - (idx + start_idx)
            end_offset = (idx + sequence_length + start_idx) - buffer_end_idx
            sample_start_idx = 0 + start_offset
            sample_end_idx = sequence_length - end_offset
            indices.append([
                buffer_start_idx, buffer_end_idx,
                sample_start_idx, sample_end_idx])
    indices = np.array(indices)
    return indices

I understand that we need to pad $T_o-1$ (pad_before) steps before an episode so that even the first action is predicted conditioned on an observation window of length $T_o$. But why do we need to pad $T_a-1$ (pad_after) steps after an episode? What would go wrong if we set max_start to episode_length - sequence_length?
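
For concreteness, here is a toy check of what the padding changes (the numbers below are just assumed for illustration, not from the notebook):

import numpy as np

# Assumed toy numbers: one 20-step episode, horizon 16, T_o = 2, T_a = 8.
episode_ends = np.array([20])
horizon, T_o, T_a = 16, 2, 8

padded = create_sample_indices(episode_ends, horizon,
                               pad_before=T_o - 1, pad_after=T_a - 1)
unpadded = create_sample_indices(episode_ends, horizon,
                                 pad_before=T_o - 1, pad_after=0)

print(len(padded), len(unpadded))  # 13 vs 6 training windows
print(padded[-1])    # [11 20  0  9] -> the last 7 slots of this window are padding
print(unpadded[-1])  # [ 4 20  0 16] -> no padding; the window ends exactly at the episode end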

cheng-chi commented 7 months ago

Hi @yuanfei-Wang, in several of our simulation datasets it is desirable for the robot to stay static once the task is finished (in particular the Push-T task, which terminates immediately after the T is inside the goal region). If we set max_start to episode_length - sequence_length, or equivalently set pad_after=0, the last few observations of the dataset never get supervised on, which leads to performance degradation on certain benchmarks. In particular, for the Push-T task with pad_after=0, the policy sometimes makes a sudden, undesired move right before completing the task, since the observations right before completion are out of distribution.
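
To make this concrete with assumed toy numbers (a 20-step episode, horizon 16, T_o = 2, T_a = 8), here is a quick check of the last episode step that ever shows up among the conditioning observations:

import numpy as np

episode_len, horizon, T_o, T_a = 20, 16, 2, 8   # assumed toy numbers

def last_conditioned_step(pad_after):
    # Highest episode step that ever appears among the first T_o slots
    # (the conditioning observations) of any training window.
    indices = create_sample_indices(np.array([episode_len]), horizon,
                                    pad_before=T_o - 1, pad_after=pad_after)
    last = 0
    for buf_start, _, samp_start, _ in indices:
        # slot T_o - 1 of the sample maps to episode step
        # buf_start + max(T_o - 1 - samp_start, 0)
        last = max(last, int(buf_start) + max(T_o - 1 - int(samp_start), 0))
    return last

print(last_conditioned_step(pad_after=0))        # 5  -> steps 6..19 never condition a window
print(last_conditioned_step(pad_after=T_a - 1))  # 12 -> padding recovers T_a - 1 more steps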

yuanfei-Wang commented 7 months ago

Thanks for your insightful explanation! I only have one minor question left: why is pad_after set to $T_a-1$ rather than some other value? Is there a specific reason, or did you try other options?

cheng-chi commented 7 months ago

Hi @yuanfei-Wang, we would like the policy to observe all images in the training dataset. To do this, we need to set pad_after to T_a - 1 (since the first action step is aligned with the last observation step). In most of our simulation benchmarks it is desirable for the robot to stay still after finishing the task, which is exactly what the actions inside the padding encode ("same" padding for positional actions). If this behavior is not desired, you should set pad_after to 0.
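
Concretely, "same" padding just repeats the edge values when a window is gathered; below is a minimal sketch of that gathering step (illustrative only, so treat the names and signature as assumptions rather than the exact notebook code):

import numpy as np

def sample_sequence(train_data, sequence_length,
                    buffer_start_idx, buffer_end_idx,
                    sample_start_idx, sample_end_idx):
    # Gather one fixed-length window from each array in train_data and fill
    # the padded slots by repeating the edge values: before the episode the
    # first step is repeated, after the episode the last step is repeated,
    # so padded positional actions keep the robot where it stopped.
    result = dict()
    for key, input_arr in train_data.items():
        sample = input_arr[buffer_start_idx:buffer_end_idx]
        data = sample
        if (sample_start_idx > 0) or (sample_end_idx < sequence_length):
            data = np.zeros((sequence_length,) + input_arr.shape[1:],
                            dtype=input_arr.dtype)
            data[:sample_start_idx] = sample[0]       # "same" padding before
            data[sample_end_idx:] = sample[-1]        # "same" padding after
            data[sample_start_idx:sample_end_idx] = sample
        result[key] = data
    return result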

yuanfei-Wang commented 7 months ago

Hi @cheng-chi, now I understand why pad_after is set to $T_a-1$. Thanks for your detailed explanation!

ManUtdMoon commented 6 months ago

Hi @cheng-chi, I think that even if we set pad_after = Ta - 1, the last few observations are still not supervised (specifically, the last horizon - To - Ta + 1 of them), because only the first To observations of each window are used for conditioning. Why not set pad_after to horizon - To, so that the last sample conditions on obs[-To:] and every observation is leveraged?
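
A quick toy check of this count (assumed numbers: one 20-step episode, horizon = 16, To = 2, Ta = 8):

import numpy as np

episode_len, horizon, T_o, T_a = 20, 16, 2, 8   # assumed toy numbers

def unconditioned_steps(pad_after):
    # Episode steps that never appear among the first T_o (conditioning)
    # slots of any training window.
    indices = create_sample_indices(np.array([episode_len]), horizon,
                                    pad_before=T_o - 1, pad_after=pad_after)
    seen = set()
    for buf_start, _, samp_start, _ in indices:
        for t in range(T_o):
            # padded slots (t < samp_start) repeat the first gathered step
            seen.add(int(buf_start) + max(t - int(samp_start), 0))
    return sorted(set(range(episode_len)) - seen)

print(unconditioned_steps(pad_after=T_a - 1))        # [13, 14, 15, 16, 17, 18, 19] -> 7 = horizon - To - Ta + 1
print(unconditioned_steps(pad_after=horizon - T_o))  # [] -> every observation conditions some window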

tongzhoumu commented 5 months ago

> Hi @cheng-chi, I think that even if we set pad_after = Ta - 1, the last few observations are still not supervised (specifically, the last horizon - To - Ta + 1 of them), because only the first To observations of each window are used for conditioning. Why not set pad_after to horizon - To, so that the last sample conditions on obs[-To:] and every observation is leveraged?

I have the same question.