yuvaraj91 opened this issue 2 years ago
You use make_pad_mask and make_no_peak_mask, but are they actually used during training? And why does PositionwiseFeedForward have extra layers?

Thank you for your reply :)
For (2), yes, you are right actually. I made a mistake when comparing the class PositionwiseFeedForward with these two implementations: https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb and http://nlp.seas.harvard.edu/2018/04/03/attention.html.
Now I see that yours is the same; you just coded it in a different format.
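For anyone comparing the three code bases: the block in question boils down to the same position-wise feed-forward network, two linear layers with a ReLU and dropout in between. A minimal sketch for reference (the layer sizes below are just the usual Transformer defaults, not values taken from this thread):

```python
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """FFN(x) = Linear(Dropout(ReLU(Linear(x)))), applied independently at every position."""

    def __init__(self, d_model=512, hidden=2048, drop_prob=0.1):
        super().__init__()
        self.linear1 = nn.Linear(d_model, hidden)
        self.linear2 = nn.Linear(hidden, d_model)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=drop_prob)

    def forward(self, x):
        # x: [batch_size, seq_len, d_model]
        x = self.linear1(x)   # project up to the hidden size
        x = self.relu(x)
        x = self.dropout(x)
        x = self.linear2(x)   # project back down to d_model
        return x
```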
I have another question, how could we visualise the attention heatmap at the decoder heads, similar to https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb?
And here, https://github.com/hyunwoongko/transformer/blob/1d2e33f675232956ef4bc3fbb1c3de2300a1f0a7/models/model/transformer.py#L45, you use the "*" symbol to combine the masks by multiplication? In the other implementations, a bitwise "&" operator was used. I am just wondering what the difference is here. Thanks!
@GJ98 could you explain about this? (note this mask implementation was not written by me)
I have another question, how could we visualise the attention heatmap at the decoder heads, similar to https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb?
I was planning to implement it, but I didn't do it because I didn't have enough time. I welcome PR!
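For anyone who wants to attempt that PR, here is a rough sketch rather than code from this repo. It assumes you modify the decoder's encoder-decoder attention to store (or return) its softmax scores, and that you slice out a single example so the tensor passed in has shape [n_head, trg_len, src_len]; the function name and layout below are made up for illustration.

```python
import matplotlib.pyplot as plt

def plot_decoder_attention(attention, src_tokens, trg_tokens, n_rows=2, n_cols=4):
    """Draw one heatmap per decoder attention head.

    attention: torch tensor of shape [n_head, trg_len, src_len] holding the
    softmax scores of one decoder layer for a single example.
    n_rows * n_cols must be >= the number of heads.
    """
    fig = plt.figure(figsize=(16, 8))
    for head in range(attention.shape[0]):
        ax = fig.add_subplot(n_rows, n_cols, head + 1)
        scores = attention[head].detach().cpu().numpy()
        ax.matshow(scores, cmap='bone')
        ax.set_xticks(range(len(src_tokens)))
        ax.set_yticks(range(len(trg_tokens)))
        ax.set_xticklabels(src_tokens, rotation=45)
        ax.set_yticklabels(trg_tokens)
        ax.set_title(f'head {head}')
    plt.tight_layout()
    plt.show()
```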
@Yuvaraj91 There is no difference between "*" and "&". I think "&" can be more clear than "*".
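To make the point concrete, a small standalone check (toy masks, not the repo's actual mask shapes): for 0/1 or boolean masks, elementwise multiplication and bitwise AND produce the same combined mask.

```python
import torch

# Toy pad mask (last position is padding) and "no peek" lower-triangular mask.
pad_mask = torch.tensor([1, 1, 1, 0]).unsqueeze(0)             # shape [1, 4]
no_peak_mask = torch.tril(torch.ones(4, 4, dtype=torch.long))  # shape [4, 4]

combined_mul = pad_mask * no_peak_mask                 # elementwise multiplication
combined_and = pad_mask.bool() & no_peak_mask.bool()   # bitwise AND on booleans

print(torch.equal(combined_mul.bool(), combined_and))  # True
```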
Ok thank you both @GJ98 @hyunwoongko !
Another question, where do you get the value of 256 from? https://github.com/hyunwoongko/transformer/blob/1d2e33f675232956ef4bc3fbb1c3de2300a1f0a7/conf.py#L13
What do you mean?
Hi, could you share all the requirements of this repo, like pytorch version etc. Thanks.
requirements.txt
@Qing-zhan the requirements are from pip freeze