Thank you for sharing such great work! I encountered the following problem while reproducing your paper and was wondering if you might be willing to offer some guidance or clarification.
I trained the 2M `VIMAPolicy` (with all weights initialized from their default distributions except T5) on a small subset of the VIMA-Bench dataset (32 samples per task, 13 tasks in total) and tried to make it overfit.
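For reference, the re-initialization was done roughly like this (a minimal sketch; `t5_prompt_encoder` is an assumed submodule name and may not match the actual attribute in `VIMAPolicy`):

```python
import torch.nn as nn

def reinit_except_t5(policy: nn.Module, t5_attr: str = "t5_prompt_encoder") -> None:
    """Re-initialize every submodule with its default initializer,
    leaving the pretrained T5 prompt encoder untouched.

    NOTE: `t5_attr` is an assumed submodule name, not necessarily the
    one used in the official VIMA code.
    """
    for name, module in policy.named_modules():
        # Skip the T5 encoder and everything nested inside it.
        if name == t5_attr or name.startswith(t5_attr + "."):
            continue
        # Most torch layers (Linear, LayerNorm, Embedding, ...) expose
        # reset_parameters(), which restores the default initialization.
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()
```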
I found that the imitation losses (calculated as `cross_entropy_loss(dist_dict._logits, discrete_target_action)`) of the different action attributes (such as `pose0_rotation` and `pose1_position`) evolve very differently during training, as the plot below shows. In this experiment, the final loss is the sum of the per-attribute losses with equal weights, normalized by the number of time steps.
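Concretely, here is a minimal sketch of how I compute it (my own code, not the official trainer; I assume each `dist_dict[attr]._logits` has shape `(..., n_dims, n_bins)` and that `discrete_target_action[attr]` holds integer bin indices with shape `(..., n_dims)`):

```python
import torch.nn.functional as F

def imitation_loss(dist_dict, discrete_target_action, n_timesteps):
    """Equal-weight sum of per-attribute cross-entropy losses,
    normalized by the number of time steps (assumed shapes above)."""
    total = 0.0
    for attr, dist in dist_dict.items():       # e.g. "pose0_rotation"
        logits = dist._logits                   # (..., n_dims, n_bins), assumed
        target = discrete_target_action[attr]   # (..., n_dims) integer bins
        total = total + F.cross_entropy(
            logits.reshape(-1, logits.shape[-1]),
            target.reshape(-1).long(),
            reduction="sum",
        )
    return total / n_timesteps
```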
The plot shows how the per-step loss of each attribute converges; for example, `pose0_rotation_0` is the loss associated with the first dimension of `pose0_rotation` at a single time step.
Zooming into the first and last 100 epochs of the experiment shows that all dimensions of `pose0_rotation` and the first two dimensions of `pose1_rotation` converge to zero very quickly, while the other losses converge relatively slowly, and the scale difference between them changes dynamically over training.
[Plots: first 100 epochs and last 100 epochs of training]
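For completeness, the per-dimension curves above are logged roughly like this (same assumed `(..., n_dims, n_bins)` logits layout as in the loss sketch; the logging key names are my own):

```python
import torch.nn.functional as F

def per_dimension_losses(dist_dict, discrete_target_action):
    """Return a flat dict like {"pose0_rotation_0": 1.23, ...} for logging."""
    logged = {}
    for attr, dist in dist_dict.items():
        n_dims = dist._logits.shape[-2]
        for dim in range(n_dims):
            # Slice out one action dimension and its integer bin targets.
            logits = dist._logits[..., dim, :].reshape(-1, dist._logits.shape[-1])
            target = discrete_target_action[attr][..., dim].reshape(-1).long()
            logged[f"{attr}_{dim}"] = F.cross_entropy(logits, target).item()
    return logged
```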
In the same experiment, I also measured the ratio of the average loss between tasks and got the following table. For example, 16.745474 means that the average loss over `rearrange_then_restore` samples is about 16x larger than that over `novel_noun` samples.

I would like to know how these losses (per action attribute and per task) are balanced during training. Thank you!
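P.S. For reference, the ratio table was computed along these lines (an illustrative sketch; `per_sample_losses` and the helper name are my own, not from the VIMA codebase):

```python
import numpy as np
import pandas as pd

def task_loss_ratio_table(per_sample_losses: dict) -> pd.DataFrame:
    """Pairwise ratio table from {task_name: [per-sample losses]}.

    ratio.loc[a, b] = mean loss of task a / mean loss of task b, so e.g.
    ratio.loc["rearrange_then_restore", "novel_noun"] is about 16.7 here.
    """
    means = pd.Series({task: float(np.mean(v)) for task, v in per_sample_losses.items()})
    return pd.DataFrame(
        means.values[:, None] / means.values[None, :],
        index=means.index,
        columns=means.index,
    )
```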
I've run into similar questions myself. I released everything I did here: https://github.com/amitkparekh/CoGeLoT; maybe it has some answers to your questions?