andrearosasco opened this issue 6 months ago
It's not the same image: the observation is the image at the current time step, while the "task" is the goal image, i.e., the image from a randomly sampled future time step.
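A minimal sketch of the goal relabeling described above. It assumes a trajectory stored as a NumPy array of frames and takes "future" to mean strictly after the current step, reusing the last frame when there is no later one. The names `sample_goal_index` and `relabel_with_goal` are hypothetical illustrations, not the codebase's API.

```python
import numpy as np

def sample_goal_index(t: int, traj_len: int, rng: np.random.Generator) -> int:
    """Pick a goal index from a future time step of the same trajectory.

    Assumption: "future" means strictly after t; at the final step the
    last frame is reused because there is no later frame to choose from.
    """
    low = min(t + 1, traj_len - 1)
    # Generator.integers excludes the upper bound, so this draws from [low, traj_len - 1].
    return int(rng.integers(low, traj_len))

def relabel_with_goal(frames: np.ndarray, t: int, rng: np.random.Generator):
    """Return (observation, goal, goal_idx) for time step t of a trajectory."""
    goal_idx = sample_goal_index(t, len(frames), rng)
    return frames[t], frames[goal_idx], goal_idx

# Toy trajectory: 10 frames of 4x4 RGB, each frame filled with its own index
# so the observation and the goal image are easy to tell apart.
frames = np.stack([np.full((4, 4, 3), i, dtype=np.uint8) for i in range(10)])
rng = np.random.default_rng(0)
obs, goal, goal_idx = relabel_with_goal(frames, t=3, rng=rng)
print(goal_idx, obs[0, 0, 0], goal[0, 0, 0])
```

Because the goal comes from a different time step, it generally shows a different scene state than the observation, even though both come from the same camera.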
Hi, I noticed that image augmentation is applied during training. Will the goal image also be augmented like the observation? If not, the two will not be spatially aligned.
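One common way to keep a goal image aligned with its observation is to sample the augmentation parameters once and apply them to both images. The sketch below does this for a random crop. The helper `random_crop_pair` is hypothetical, and nothing in this thread confirms whether the training pipeline works this way.

```python
import numpy as np

def random_crop_pair(obs, goal, crop_h, crop_w, rng):
    """Crop observation and goal images at the same random offset.

    Sampling the offset once and reusing it keeps the two images
    spatially aligned, which independent crops would not guarantee.
    """
    assert obs.shape == goal.shape, "obs and goal must have the same shape"
    h, w = obs.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return obs[window], goal[window]

rng = np.random.default_rng(0)
obs = np.arange(8 * 8 * 3, dtype=np.int32).reshape(8, 8, 3)
goal = obs + 1000  # stand-in goal image: same layout, shifted values
obs_crop, goal_crop = random_crop_pair(obs, goal, 6, 6, rng)
print(obs_crop.shape, np.array_equal(goal_crop, obs_crop + 1000))
```

If the two crops came from the same window, every goal pixel equals the matching observation pixel plus 1000, which is what the final check confirms.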
As you can see in the picture posted in #42, the field `task_stack_keys` for the observation tokenizers appears to be the same as `obs_stack_keys`. This results in the model stacking the image onto itself before processing it. Why is this happening?
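A minimal sketch of channel-wise stacking. It assumes, as the reply at the top of this thread describes, that `obs_stack_keys` are looked up in the observation dict and `task_stack_keys` in the task (goal) dict. The helper `stack_images` and the key name `"image_primary"` are illustrative assumptions, not the library's API. Under that assumption, identical key names in both lists do not stack an image with itself.

```python
import numpy as np

def stack_images(observations, tasks, obs_stack_keys, task_stack_keys):
    """Concatenate images along the channel axis (early fusion).

    Hypothetical helper: observation keys are read from `observations`
    and task keys from `tasks`, so a shared key name such as
    "image_primary" still refers to two different images.
    """
    obs_imgs = [observations[k] for k in obs_stack_keys]
    task_imgs = [tasks[k] for k in task_stack_keys]
    return np.concatenate(obs_imgs + task_imgs, axis=-1)

current = np.zeros((4, 4, 3), dtype=np.uint8)  # image at time step t
goal = np.ones((4, 4, 3), dtype=np.uint8)      # image from a future step
stacked = stack_images(
    observations={"image_primary": current},
    tasks={"image_primary": goal},
    obs_stack_keys=["image_primary"],
    task_stack_keys=["image_primary"],
)
print(stacked.shape)  # (4, 4, 6)
```

With identical key lists, the first three channels hold the current frame and the last three hold the goal frame.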