Closed: anilbatra2185 closed this issue 2 years ago
Sorry, I never tried this setting, and this code does not support the base Transformer. In my early experiments, I tried the original DETR on ActivityNet Captions but found that the predicted captions were almost all the same. I then moved to Deformable DETR, which has a prior that constrains the distribution of attention weights.
Thanks @ttengwang for confirming. I am trying to train this setting, but the model is not training. So I am wondering whether there is any important trick for training a simple DETR-style model. I would appreciate any thoughts or suggestions.
Thanks
Hi @ttengwang
Thank you for sharing the code.
I am wondering whether you trained the base Transformer + LSTM on the YouCook2 dataset, i.e., similar to Rows 1 and 2 in Table 7 (a).
I am also wondering whether the current code supports training the base Transformer.
Thanks