Hi, thank you for sharing your inspiring work. What is the primary policy difference between train_diffusion_unet_timm_umi_workspace and train_diffusion_unet_image_workspace? As I understand it, both condition on visual and proprioceptive observations to predict robot actions. Aside from differences in training hyperparameters, are there any design features specific to the UMI task?