hyunwoongko / transformer

Transformer: PyTorch Implementation of "Attention Is All You Need"

Questions regarding the implementation #3

Open yuvaraj91 opened 2 years ago

yuvaraj91 commented 2 years ago
  1. In this file, https://github.com/hyunwoongko/transformer/blob/master/models/model/transformer.py, you define the functions make_pad_mask and make_no_peak_mask, but are they actually used during training?
  2. In this file, https://github.com/hyunwoongko/transformer/blob/master/models/layers/position_wise_feed_forward.py, why does your PositionwiseFeedForward have extra layers?
hyunwoongko commented 2 years ago
  1. Yes, see https://github.com/hyunwoongko/transformer/blob/master/models/model/transformer.py#L40 (a rough sketch of those masks is included below).
  2. What extra layers do you mean?
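
For anyone reading along, this is roughly what those two masks look like in a typical PyTorch Transformer. This is only a sketch; the pad index, dtypes, and exact shapes in this repo may differ:

```python
import torch

def make_pad_mask(q, k, pad_idx=1):
    # q, k: (batch, seq_len) tensors of token ids
    # returns (batch, 1, q_len, k_len), True where attention is allowed
    q_mask = (q != pad_idx).unsqueeze(1).unsqueeze(3)  # (batch, 1, q_len, 1)
    k_mask = (k != pad_idx).unsqueeze(1).unsqueeze(2)  # (batch, 1, 1, k_len)
    return q_mask & k_mask

def make_no_peak_mask(q, k):
    # lower-triangular (causal) mask: position i may only attend to j <= i
    q_len, k_len = q.size(1), k.size(1)
    return torch.tril(torch.ones(q_len, k_len)).bool()

trg = torch.tensor([[2, 6, 8, 1]])  # 1 = <pad>
trg_mask = make_pad_mask(trg, trg) & make_no_peak_mask(trg, trg)
print(trg_mask.shape)  # torch.Size([1, 1, 4, 4])
```
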
yuvaraj91 commented 2 years ago

Thank you for your reply :)

For (2), yes, you are right actually. I made a mistake when comparing the class PositionwiseFeedForward with these two implementations: https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb and http://nlp.seas.harvard.edu/2018/04/03/attention.html. Now I see that yours is the same, just written in a different format.
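
For reference, the block in question is just two position-wise linear layers with a ReLU (and dropout) in between. A minimal sketch using the paper's default sizes, not necessarily this repo's exact code:

```python
import torch
from torch import nn

class PositionwiseFeedForward(nn.Module):
    """FFN(x) = linear2(dropout(relu(linear1(x)))), applied at every position."""

    def __init__(self, d_model=512, hidden=2048, drop_prob=0.1):
        super().__init__()
        self.linear1 = nn.Linear(d_model, hidden)
        self.linear2 = nn.Linear(hidden, d_model)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=drop_prob)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        x = self.relu(self.linear1(x))
        x = self.dropout(x)
        return self.linear2(x)

ffn = PositionwiseFeedForward()
out = ffn(torch.randn(2, 10, 512))  # -> (2, 10, 512)
```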

I have another question: how could we visualise the attention heatmaps at the decoder heads, similar to https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb?

yuvaraj91 commented 2 years ago

And here, https://github.com/hyunwoongko/transformer/blob/1d2e33f675232956ef4bc3fbb1c3de2300a1f0a7/models/model/transformer.py#L45, you use the "*" symbol to mean multiplication? In the other implementations, a bitwise "&" operator was used. I am just wondering what the difference is here. Thanks!

hyunwoongko commented 2 years ago

@GJ98 could you explain this? (Note: this mask implementation was not written by me.)

hyunwoongko commented 2 years ago

I have another question: how could we visualise the attention heatmaps at the decoder heads, similar to ...

I was planning to implement it, but I didn't do it because I didn't have enough time. PRs are welcome!
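
For anyone who wants to attempt it, here is a rough sketch of the plotting side. It assumes the decoder's cross-attention weights have already been captured somewhere, e.g. via a forward hook on the attention module or by returning them from forward() (neither change exists in the repo today), with shape (n_heads, trg_len, src_len):

```python
import matplotlib.pyplot as plt

def plot_attention(attn, src_tokens, trg_tokens, n_heads=8):
    # attn: tensor of shape (n_heads, trg_len, src_len) holding one decoder
    # layer's cross-attention weights, captured separately (e.g. via a hook)
    fig, axes = plt.subplots(2, n_heads // 2, figsize=(16, 8))
    for h, ax in enumerate(axes.flat):
        ax.matshow(attn[h].detach().cpu().numpy(), cmap='bone')
        ax.set_xticks(range(len(src_tokens)))
        ax.set_xticklabels(src_tokens, rotation=45)
        ax.set_yticks(range(len(trg_tokens)))
        ax.set_yticklabels(trg_tokens)
        ax.set_title(f'head {h}')
    plt.tight_layout()
    plt.show()
```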

GJ98 commented 2 years ago

@Yuvaraj91 There is no difference between "*" and "&" here, since the masks only contain 0s and 1s. I think "&" can be clearer than "*".
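
A quick sanity check of that claim, assuming the masks are 0/1 (or boolean) tensors:

```python
import torch

pad_mask = torch.tensor([[1, 1, 1, 0]])         # 1 = keep, 0 = padding
no_peak = torch.tril(torch.ones(4, 4)).long()   # causal / no-peak mask

# for 0/1 masks, elementwise product and bitwise AND agree everywhere
print(torch.equal(pad_mask * no_peak, pad_mask & no_peak))  # True
```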

yuvaraj91 commented 2 years ago

Ok thank you both @GJ98 @hyunwoongko !

Another question: where does the value of 256 come from? https://github.com/hyunwoongko/transformer/blob/1d2e33f675232956ef4bc3fbb1c3de2300a1f0a7/conf.py#L13

hyunwoongko commented 2 years ago

What do you mean?

Qing-zhan commented 1 year ago

Hi, could you share all the requirements of this repo, like the PyTorch version etc.? Thanks.

linghushaoxia commented 1 month ago

Hi, could you share all the requirements of this repo, like pytorch version etc.
@Qing-zhan See requirements.txt; the requirements were generated with pip freeze.