alphanlp opened this issue 5 years ago
Change:
self.target_layer = TimeDistributed(Dense(o_tokens.num(), use_bias=False))
to:
self.target_layer = TimeDistributed(Dense(o_tokens.num(), activation='softmax', use_bias=False))
It's very interesting: when I use softmax as proposed in the paper, the loss does not go down.
The TF loss already applies a softmax internally, so with this change you end up applying softmax twice.
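A minimal numeric sketch (plain Python, independent of the repo's code) of why the double softmax stalls training: softmax outputs lie in [0, 1], so a second softmax sees only tiny differences between its inputs and squashes the distribution toward uniform, which shrinks the gradients.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.1]   # raw scores from the Dense layer (no activation)
once = softmax(logits)      # what the loss's internal softmax produces
twice = softmax(once)       # what happens if the layer *also* applies softmax

print(once)   # roughly [0.659, 0.242, 0.099]
print(twice)  # roughly [0.448, 0.296, 0.256] -- much flatter
```

So the two consistent setups are: leave the Dense layer without an activation and let the loss apply softmax to the logits, or add activation='softmax' and switch to a loss that expects probabilities rather than logits; mixing the two gives the flat loss reported above.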