NileZhou opened this issue 5 years ago
According to the paper "Get To The Point" (Eq. 10), the coverage vector is the sum of the attention distributions over the encoder (source) positions from all previous decoder timesteps.
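For reference, here is a minimal LaTeX rendering of the relevant definitions, assuming the paper's standard notation where $a^{t'}$ is the attention distribution over source positions at decoder step $t'$:

```latex
% Coverage vector (Eq. 10): sum of the attention distributions a^{t'}
% (each over encoder/source positions) from all previous decoder steps t' < t.
c^{t} = \sum_{t'=0}^{t-1} a^{t'}

% Coverage loss (Eq. 12): penalizes re-attending to already-covered source tokens.
\text{covloss}_t = \sum_i \min\!\left(a_i^{t},\, c_i^{t}\right)
```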
Thanks, you're right. I want to ask more about the mechanism. When the amount of data is very small (I have 5,000 pairs of content and headline), the model in this repository still tends to repeat itself. My parameter settings (in params.py) for the coverage mechanism are: `enc_attn_cover = True`, `cover_func = 'max'`, `cover_loss: float = 1`, `show_cover_loss = False`.
I would appreciate it if you could help me!
Sorry, I have run some experiments with this repo but could not reproduce the results on CNN/DM, so there may still be some errors in it. Additionally, the pointer-generator paper uses "sum" as the cover_func.
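If it helps, here is a minimal sketch contrasting the two accumulation rules (the function name `update_coverage` and its arguments are hypothetical, not the repo's actual API): `'sum'` is the running sum from Eq. 10 of the paper, while `'max'` keeps an element-wise maximum instead.

```python
import torch

def update_coverage(coverage, attn, cover_func="sum"):
    """Accumulate the coverage vector from per-step encoder attention.

    coverage: (batch, src_len) coverage accumulated over previous decoder steps
    attn:     (batch, src_len) attention distribution at the current decoder step
    """
    if cover_func == "sum":
        # Eq. 10: c^t = sum of all previous attention distributions a^{t'}
        return coverage + attn
    elif cover_func == "max":
        # Alternative: keep the element-wise maximum attention seen so far
        return torch.max(coverage, attn)
    raise ValueError(f"unknown cover_func: {cover_func}")

# Hypothetical usage inside a decoding loop
batch, src_len = 2, 6
coverage = torch.zeros(batch, src_len)
for step_attn in torch.softmax(torch.randn(3, batch, src_len), dim=-1):
    coverage = update_coverage(coverage, step_attn, cover_func="sum")
```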
Thanks for your help.
The authors of the pointer-generator paper propose a method called the coverage mechanism. The coverage vector is the sum of the attention distributions over all previous decoder timesteps, but the coverage vector in this repository seems to sum the attention of the encoder! Please help me find the correct way to implement the mechanism, or tell me where my mistake is.