Thanks again for your attention model. I agree with @chungfu27: the paper describes "word-by-word attention based on all output vectors of the hypothesis (h7, h8 and h9)", but in the code, I believe the attention is computed only from the last output vector.
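For concreteness, here is a minimal numpy sketch of the difference. All names and weight matrices are illustrative, not the repo's actual code, and the paper's full word-by-word variant also feeds the previous representation r_{t-1} into each step, which is omitted here for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, k = 7, 3, 5                       # premise length, hypothesis length, hidden dim
Y = rng.standard_normal((L, k))         # premise output vectors y_1..y_L
H = rng.standard_normal((N, k))         # hypothesis output vectors (e.g. h7, h8, h9)
Wy = rng.standard_normal((k, k))
Wh = rng.standard_normal((k, k))
w = rng.standard_normal(k)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Attention conditioned only on the LAST hypothesis vector,
# which is what the code appears to do:
M = np.tanh(Y @ Wy + H[-1] @ Wh)        # h_N broadcast over all L premise positions
alpha_last = softmax(M @ w)             # a single attention vector, shape (L,)

# Word-by-word attention as described in the paper: one attention
# vector per hypothesis timestep, conditioned on each h_t in turn.
alphas = np.stack([softmax(np.tanh(Y @ Wy + H[t] @ Wh) @ w)
                   for t in range(N)])  # shape (N, L)
```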
Hi @shyamupa, thanks for your attention model!! I was able to extract the alpha values to visualize the model's attention for my task, but I noticed something strange about them. Below is the heatmap of the "flat_alpha" layer output, which looks fine:

[image: flat_alpha heatmap]

But when I exported the output of the "alpha" layer (after the softmax), I got the following result:

[image: alpha heatmap]

I know softmax sharpens and normalizes the values, but when I applied softmax to the flat_alpha data locally, the result (below) differs from the output of the "alpha" layer:

[image: locally computed softmax heatmap]

Each heatmap has shape (20, 200): 20 sentences, each of length 200. Do you have any suggestions?
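In case it helps to narrow this down, here is a minimal sketch of the comparison, assuming a Keras model with layers named "flat_alpha" and "alpha" (those names come from this thread; `model` and `batch` stand in for your own trained model and input batch). One common cause of exactly this kind of mismatch is taking the softmax over the wrong axis, so the local softmax below is applied row-wise, per sentence:

```python
import numpy as np
from keras.models import Model

def softmax_rows(x):
    # Numerically stable softmax over the last axis, i.e. over the
    # 200 timesteps of each sentence, not over the flattened array.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Tap both intermediate layers so they are compared on the same batch.
# `model` is your trained attention model, `batch` a (20, 200) input.
probe = Model(inputs=model.input,
              outputs=[model.get_layer("flat_alpha").output,
                       model.get_layer("alpha").output])
flat_alpha, alpha = probe.predict(batch)     # both (20, 200) here

local_alpha = softmax_rows(flat_alpha)
print(np.allclose(local_alpha, alpha, atol=1e-5))
```

If the row-wise result still disagrees with the "alpha" layer, the next thing I would check is whether masking or a reshape inside the model changes which axis its softmax runs over.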