Open ChicyChen opened 1 year ago
Thank you! Also, I am curious: have you tried training on the full K400 dataset? Would that help or harm the model?
On Sep 13, 2023, at 4:35 AM, Haodong Duan @.***> wrote:
Hi, Chicy, For the fact that MLP heads can increase the retrieval accuracies, we think the reason is that:
If you use only a linear head to predict the target of the proxy task (such as the temporal transformation applied to the clip), the final conv-layer feature will be highly correlated with that task, which has a significant domain gap from semantic tasks. Using an MLP head instead can weaken this correlation, so the final conv-layer feature works better on semantic downstream tasks. When doing NN video retrieval, we directly use the last conv-layer feature and aggregate it into a feature vector with global average pooling.
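For readers following along, here is a minimal sketch of the retrieval step described above: pool the last conv-layer feature map into a vector, then pick the nearest gallery clip. It is not taken from the TransRank code. The function names, the `C x N` layout (channels by flattened spatio-temporal positions), and the use of cosine similarity are all assumptions made for illustration. The reply only states that the feature is used directly with global average pooling, which suggests no extra head is involved at retrieval time.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x N feature map (N = flattened T*H*W positions) into a C-dim vector."""
    return [sum(channel) / len(channel) for channel in feature_map]


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity (assumed metric; the thread does not name one)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbor(query_map, gallery_maps):
    """Return the index of the gallery clip whose pooled feature is closest to the query."""
    query = global_average_pool(query_map)
    scores = [cosine_similarity(query, global_average_pool(g)) for g in gallery_maps]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: 2 channels, 2 positions per clip (hypothetical values).
gallery = [
    [[1.0, 1.0], [0.0, 0.0]],  # pooled to [1.0, 0.0]
    [[0.0, 0.0], [2.0, 4.0]],  # pooled to [0.0, 3.0]
]
query = [[0.1, 0.3], [0.9, 1.1]]  # pooled to roughly [0.2, 1.0]

best = nearest_neighbor(query, gallery)
```

In this sketch the linear or MLP head never appears: it would only exist during pretraining on the proxy task, and the pooled backbone feature is compared as-is.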
Hi, in Table 6 of your paper, why does using an MLP head instead of a linear head improve NN video retrieval accuracy? When doing NN video retrieval, do you train an additional linear head or MLP? Thank you!