lsdefine / attention-is-all-you-need-keras

A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need

Maybe I found a point that should be changed #12

Open alphanlp opened 5 years ago

alphanlp commented 5 years ago

Change `self.target_layer = TimeDistributed(Dense(o_tokens.num(), use_bias=False))` to `self.target_layer = TimeDistributed(Dense(o_tokens.num(), activation='softmax', use_bias=False))`.
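For reference, a minimal sketch of the two output-layer variants being compared, assuming `o_tokens.num()` returns the target vocabulary size (the placeholder value below is illustrative only):

```python
from tensorflow.keras.layers import Dense, TimeDistributed

vocab_size = 8000  # placeholder for o_tokens.num()

# Current repo version: the layer emits raw logits, no activation.
target_layer_logits = TimeDistributed(Dense(vocab_size, use_bias=False))

# Proposed change: softmax applied inside the layer, so it emits probabilities.
target_layer_softmax = TimeDistributed(
    Dense(vocab_size, activation='softmax', use_bias=False))
```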

alphanlp commented 5 years ago

It's very interesting: when I use softmax as proposed in the paper, the loss does not go down.

lsdefine commented 5 years ago

The TF loss already contains a softmax. In fact, you would be applying softmax twice.
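A minimal sketch of why the double softmax stalls training, assuming the loss is computed with `tf.nn.sparse_softmax_cross_entropy_with_logits` (which applies softmax internally and therefore expects raw logits, not probabilities):

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])  # raw scores from the output Dense layer
labels = tf.constant([0])

# Correct usage: pass logits directly; the loss applies softmax exactly once.
loss_once = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

# Effect of the proposed change: softmax in the layer, then softmax again
# inside the loss. The probabilities get squashed toward uniform, so the
# loss barely responds to the scores and gradients shrink.
probs = tf.nn.softmax(logits)
loss_twice = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=probs)

print(loss_once.numpy(), loss_twice.numpy())  # loss_twice is larger and much flatter
```

This is why adding `activation='softmax'` to the Dense layer prevents the loss from decreasing: the softmax belongs either in the layer or in the loss, but not both.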