Open joeyz0z opened 1 year ago
The intermediate loss splits the learning into multiple steps and may ease the learning process. I observed it improves both localization and captioning performance, but I didn't remember it helps convergence.
The design follows the DETR and Deformable-DETR and you may find more analysis in these papers.
Hello, I was wondering about the role of auxiliary losses on each intermediate decoder layer. Can it help to accelerate the model convergence or for other purposes? Thanks!