Official code and data for the EMNLP 2020 paper "Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings"
In your paper, the input to the word distribution over the vocabulary is the context-rich representation c_t = [u_t; s_t; c_text + c_fuse]. In your code, however, it seems that only s_t and c_text + c_fuse are concatenated. Is this a clerical error in the paper, or is there something I didn't notice?
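To make the discrepancy concrete, here is a minimal sketch of the two variants as I understand them. It is not taken from the repository: the vector sizes and values are made up, and plain Python lists stand in for tensors.

```python
# Hypothetical toy vectors; sizes and values are assumptions for illustration only.
u_t = [0.1, 0.2]          # u_t as named in the paper
s_t = [0.3, 0.4, 0.5]     # s_t as named in the paper
c_text = [1.0, 2.0, 3.0]  # text context vector
c_fuse = [0.5, 0.5, 0.5]  # fused context vector (same length as c_text)

# Element-wise sum c_text + c_fuse.
c_sum = [a + b for a, b in zip(c_text, c_fuse)]

# Paper: c_t = [u_t; s_t; c_text + c_fuse] (list "+" is concatenation here).
c_t_paper = u_t + s_t + c_sum

# What the code appears to do: c_t = [s_t; c_text + c_fuse], with u_t left out.
c_t_code = s_t + c_sum

print(len(c_t_paper), len(c_t_code))
```

With these made-up sizes, the two vectors differ in length by exactly len(u_t), so the layer that consumes c_t would need a different input dimension depending on which variant is intended.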