jadore801120 / attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License · 8.78k stars · 1.97k forks
Issues (sorted by: Newest)
#172 · "How to export the trained chkpt network to onnx?" · ZhangDongyuCN · closed · 3 years ago · 5 comments
#171 · "train big data(8G)" · JoeCoding · opened · 3 years ago · 0 comments
#170 · "Bump tensorflow from 1.14.0 to 2.4.0" · dependabot[bot] · closed · 3 years ago · 1 comment
#169 · "Can't find model 'en'" · manhph2211 · opened · 3 years ago · 2 comments
#168 · "Fix Two Potential Bugs, with Significant Accuracy Improvement" · huanghoujing · closed · 3 years ago · 2 comments
#167 · "Question About Attention Score Computation Process & Intuition" · rezhv · opened · 3 years ago · 0 comments
#166 · "why none pad mask is nedd" · helloworld729 · opened · 3 years ago · 1 comment
#165 · "what is meaning of trg_pad_idx in label smoothing loss?" · fakerhbj · opened · 3 years ago · 0 comments
#164 · "wrong with the code!!!!!" · chenrxi · closed · 3 years ago · 0 comments
#163 · "SyntaxError: invalid syntax" · junzew · closed · 3 years ago · 1 comment
#162 · "what does n_head, d_model, d_k, d_v stands for?" · seyeeet · closed · 3 years ago · 1 comment
#161 · "Update your codes" · thechvarun · closed · 3 years ago · 1 comment
#160 · "Why decoding is needed during inference ?" · rajeevbaalwan · opened · 4 years ago · 0 comments
#159 · "How does the gradients flow in cal_loss function in train.py?" · InhyeokYoo · closed · 4 years ago · 0 comments
#158 · "Resuming Training" · kaiyon07 · opened · 4 years ago · 5 comments
#157 · "Why the previous version train faster" · dwtenis · closed · 4 years ago · 1 comment
#156 · "raise ConnectionError(e, request=request)" · KrisLee512 · opened · 4 years ago · 1 comment
#155 · "To make position embedding be implemented by PyTorch, and to support …" · zipzou · closed · 1 week ago · 0 comments
#154 · "Surprising PPL on WMT 17" · luffycodes · opened · 4 years ago · 0 comments
#153 · "d_k not equal to d_k gives issues" · luffycodes · closed · 4 years ago · 0 comments
#152 · "PPL on wmt - 17" · luffycodes · opened · 4 years ago · 0 comments
#151 · "It seems that the layer norm and pos ffn are not consistent with the paper?" · zwlanpishu · closed · 4 years ago · 1 comment
#150 · "Fix LayerNorm." · tony2037 · closed · 4 years ago · 3 comments
#149 · "masking is not complete" · JianBingJuanDaCong · opened · 4 years ago · 1 comment
#148 · "the src_mask." · chenjun2hao · closed · 4 years ago · 1 comment
#147 · "Performance with default parameters looks completely off..." · JianBingJuanDaCong · opened · 4 years ago · 1 comment
#146 · "fix masking tensor" · MokkeMeguru · closed · 4 years ago · 1 comment
#145 · "Training on Custom Data" · kevaday · closed · 4 years ago · 1 comment
#144 · "n_position in positional encoding" · Tejaswini2612 · opened · 4 years ago · 1 comment
#143 · "Now the model depends on specific preprocessing method too much" · ylmeng · opened · 4 years ago · 1 comment
#142 · "About Layernorm" · BUCTwangkun · closed · 4 years ago · 2 comments
#141 · "slow and inaccurate" · xiaoshingshing · opened · 4 years ago · 2 comments
#140 · "TypeError: tuple indices must be integers or slices, not tuple when translating" · liperrino · opened · 4 years ago · 1 comment
#139 · "Preprocess error" · ZhichaoOuyang · opened · 4 years ago · 6 comments
#138 · "shared embedding factor bug" · kaituoxu · closed · 4 years ago · 2 comments
#137 · "why use matmul to instead of bmm?" · kaituoxu · closed · 4 years ago · 2 comments
#136 · "WMT14 en-de" · zhao1402072392 · opened · 4 years ago · 2 comments
#135 · "preprocess ERROR" · JingsenZhang · opened · 4 years ago · 4 comments
#134 · "About Position Embedding and mask" · Zessay · closed · 4 years ago · 4 comments
#133 · "Why bias=False in q, k, and v projection" · mertensu · opened · 4 years ago · 3 comments
#132 · "What I get from the default is very different from what you showed. Is it because of the code update?" · SmallSmallQiu · closed · 4 years ago · 6 comments
#131 · "Expected object of scalar type Bool but got scalar type Byte for argument #2 'other'" · SmallSmallQiu · closed · 4 years ago · 4 comments
#130 · "add performance result on IWSLT14 de-en dataset" · marvinzh · opened · 4 years ago · 0 comments
#129 · "update" · shaoxiaoyu · opened · 4 years ago · 0 comments
#128 · "AttributeError: 'Decoder' object has no attribute 'tgt_word_emb'" · HassanNaeemjutt · closed · 4 years ago · 1 comment
#127 · "new" · flwjt · closed · 5 years ago · 0 comments
#126 · "Error when training with -no_cuda" · wz337 · closed · 4 years ago · 1 comment
#125 · "Question about get_the_best_score_and_idx in Beam.py" · yudmoe · closed · 4 years ago · 1 comment
#124 · "RuntimeError: DataLoader worker (pid 26604) is killed by signal: Killed." · Ike-yang · closed · 4 years ago · 2 comments
#123 · "Where is the input to the decoder during training shifted by one?" · jonathanking · closed · 5 years ago · 1 comment
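Several of the issues above (#166, #149, #148, #146, #134) revolve around the two attention masks a Transformer combines: a pad mask over source/target tokens and a subsequent (look-ahead) mask for decoder self-attention. As a minimal sketch of that idea only — in pure Python, not the repository's tensor-based implementation, with an assumed padding index of 0:

```python
PAD_IDX = 0  # assumed padding index for this illustration


def get_pad_mask(seq, pad_idx=PAD_IDX):
    """True for real tokens, False for padding positions."""
    return [tok != pad_idx for tok in seq]


def get_subsequent_mask(seq_len):
    """Lower-triangular mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def combined_decoder_mask(seq, pad_idx=PAD_IDX):
    """Elementwise AND of pad mask and subsequent mask, the shape of mask
    used for decoder self-attention."""
    pad = get_pad_mask(seq, pad_idx)
    sub = get_subsequent_mask(len(seq))
    n = len(seq)
    return [[pad[j] and sub[i][j] for j in range(n)] for i in range(n)]
```

For example, `combined_decoder_mask([5, 7, 0])` lets position 0 attend only to itself and blanks out the padded third column in every row, which is why a pad mask is still needed even with the triangular mask present.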
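Issue #165 asks what `trg_pad_idx` means in a label-smoothed loss. The sketch below is illustrative only — it is not the repository's `cal_loss`, and it uses one common smoothing variant (spreading `eps` over the non-gold classes): smoothing replaces the one-hot target with a softened distribution, and positions whose gold label equals the pad index are skipped so padding contributes nothing to the loss.

```python
import math


def smoothed_distribution(gold, n_classes, eps=0.1):
    """Target distribution: 1 - eps on the gold class, eps shared over the rest."""
    off = eps / (n_classes - 1)
    return [1.0 - eps if c == gold else off for c in range(n_classes)]


def label_smoothing_loss(log_probs, gold_seq, trg_pad_idx, eps=0.1):
    """Mean cross-entropy against smoothed targets, skipping pad positions.

    log_probs: one list of per-class log-probabilities per target position.
    gold_seq:  gold class index per target position.
    """
    total, count = 0.0, 0
    for lp, gold in zip(log_probs, gold_seq):
        if gold == trg_pad_idx:  # this exclusion is what trg_pad_idx is for
            continue
        target = smoothed_distribution(gold, len(lp), eps)
        total += -sum(t * l for t, l in zip(target, lp))
        count += 1
    return total / count
```

With uniform log-probabilities over 3 classes and `gold_seq = [1, 0]` where 0 is the pad index, only the first position is counted and the loss reduces to `log(3)`.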