simon-ging / coot-videotext

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Apache License 2.0

Can I add your loss function to my non-Transformer model? #56

Open Wangdanchunbufuz opened 7 months ago

Wangdanchunbufuz commented 7 months ago

Can I add your loss function to a non-Transformer model, such as the one in "Event-Centric Hierarchical Representation for Dense Video Captioning"?

Wangdanchunbufuz commented 7 months ago

Thank you very much for your excellent paper. I plan to implement a dense captioning model on my own dataset, and the model I am using as a baseline is not Transformer-based. Can I use the Cross-Modal Cycle Consistency loss that you proposed?

simon-ging commented 7 months ago

Hi, thanks for your interest. The loss is independent of the architecture, so as long as your data and model outputs fit, you can certainly use the Cross-Modal Cycle Consistency loss.
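
To illustrate why the architecture does not matter, here is a minimal sketch of a cycle-consistency loss in the spirit of the one described in the COOT paper. It is not the repository's implementation. It assumes your model produces one embedding per clip and one per sentence, in the same temporal order and in a shared space. The function names (`cycle_consistency_loss`, `_soft_nearest_neighbor`) and the use of squared Euclidean distance are assumptions made for this example. It is written in plain Python so that it runs without any framework; for training you would port the same steps to your framework's differentiable tensor operations.

```python
import math


def _softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _soft_nearest_neighbor(query, candidates):
    # Weighted average of the candidates; closer candidates get larger weights.
    weights = _softmax([-_sq_dist(query, c) for c in candidates])
    dim = len(candidates[0])
    return [sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)]


def cycle_consistency_loss(source, target):
    """Cycle source -> target -> source and penalise landing on the wrong index.

    `source` and `target` are lists of embedding vectors (lists of floats),
    e.g. sentence embeddings and clip embeddings of one video.
    """
    total = 0.0
    for i, query in enumerate(source):
        # Step 1: soft nearest neighbour of the query in the other modality.
        neighbor = _soft_nearest_neighbor(query, target)
        # Step 2: cycle back and compute a soft index into the source sequence.
        back = _softmax([-_sq_dist(neighbor, s) for s in source])
        soft_index = sum(w * k for k, w in enumerate(back))
        # Step 3: squared distance between the start index and the soft index.
        total += (i - soft_index) ** 2
    return total / len(source)


# Hypothetical toy embeddings: three clips and three aligned sentences.
clips = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
sentences = [[0.1, 0.0], [10.0, 0.1], [0.0, 9.9]]

# Apply the cycle in both directions and sum the two terms.
loss = cycle_consistency_loss(sentences, clips) + cycle_consistency_loss(clips, sentences)
print(loss)
```

The only inputs are two sequences of embeddings, so whether they come from a Transformer, an RNN, or a CNN makes no difference. The loss is small when each embedding cycles back to its own position and grows when the cycle lands elsewhere.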