qinzzz / Multimodal-Alignment-Framework

Implementation for MAF: Multimodal Alignment Framework

Questions about the implementation #5

Closed by gorjanradevski 3 years ago

gorjanradevski commented 3 years ago

Hi! I'm reimplementing your EMNLP paper from scratch (to integrate it into my own codebase), while also following your code. There is one part of the code that isn't clear to me and also isn't mentioned in the paper (line 82, model.py):

p_emb = self.linear_p(p_emb) + eps * self.linear_mini(p_emb)

Can you please explain the purpose of this term, and how removing it would affect the final result? Using your data (image region features, bounding box predictions, etc.), I was able to reach up to 51.38% accuracy on the Flickr30k validation set (using only region features, no labels or attributes), but I can't get beyond that. Also, would you mind including some training/validation curves for easier debugging (if it's not too much to ask, of course)?

Thanks for the great paper and repo!

gorjanradevski commented 3 years ago

OK, I managed to fully reproduce the paper's results and found that removing self.linear_mini doesn't affect performance. I'm closing this issue.
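A possible explanation for that result, offered as a sketch rather than a statement about the repo: assuming linear_p and linear_mini are plain affine layers (e.g. nn.Linear) applied to the same p_emb with nothing nonlinear in between (not verified against model.py), the line folds into a single affine layer with weight W_p + eps * W_m and bias b_p + eps * b_m. The plain-Python check below illustrates this; the matrices, input, and eps = 0.1 are made-up values, and the helper names are hypothetical.

```python
# Hypothetical illustration: assumes linear_p and linear_mini are plain affine
# layers (y = W x + b) applied to the same input and that eps is a fixed scalar.
# The shapes, values, and eps below are invented, not taken from model.py.

def affine(W, b, x):
    """Compute W @ x + b for a weight matrix W (list of rows), bias b, input x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def combined(W_p, b_p, W_m, b_m, eps, x):
    """Mimic linear_p(x) + eps * linear_mini(x), computed term by term."""
    main = affine(W_p, b_p, x)
    mini = affine(W_m, b_m, x)
    return [p + eps * m for p, m in zip(main, mini)]


def folded(W_p, b_p, W_m, b_m, eps):
    """Return the equivalent single layer (W_p + eps * W_m, b_p + eps * b_m)."""
    W = [[wp + eps * wm for wp, wm in zip(rp, rm)] for rp, rm in zip(W_p, W_m)]
    b = [bp + eps * bm for bp, bm in zip(b_p, b_m)]
    return W, b


if __name__ == "__main__":
    W_p = [[1.0, 2.0], [0.5, -1.0]]
    b_p = [0.1, 0.2]
    W_m = [[3.0, -1.0], [2.0, 4.0]]
    b_m = [1.0, -1.0]
    eps = 0.1
    x = [0.7, -0.3]

    W, b = folded(W_p, b_p, W_m, b_m, eps)
    # Both lines print (up to float rounding) the same vector, about [0.54, 0.77].
    print(combined(W_p, b_p, W_m, b_m, eps, x))
    print(affine(W, b, x))
```

Under that assumption the extra branch adds no representational capacity; any effect it has would be on optimization (the effective weight receives gradient through both W_p and W_m), which would be consistent with removing it leaving accuracy unchanged.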