Open tridemax opened 5 years ago
@tridemax was there any improvement in performance after this correction?
TBH, I didn't try it in your code, but in my TF2.0 implementation I swapped them and it seems to work. =)
This is indeed a bug, but fortunately it does not affect the training process.
https://github.com/kimiyoung/transformer-xl/blob/44781ed21dbaec88b280f74d9ae2877f52b492a5/pytorch/mem_transformer.py#L733
Function signature is:
def _update_mems(self, hids, mems, qlen, mlen):
And the call is:
new_mems = self._update_mems(hids, mems, mlen, qlen)
mlen and qlen are probably misordered in the function call?
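To see why the swap is (usually) harmless, here is a sketch of the index arithmetic inside _update_mems as I read it from mem_transformer.py: the new memory is a slice [beg_idx:end_idx] of the concatenation of the old memory (mlen steps) and the new hidden states (qlen steps). The numeric values below are illustrative, not from the repo.

```python
def mem_slice_bounds(qlen, mlen, ext_len, mem_len):
    # Mirrors the slicing logic of _update_mems (a sketch, not the original code).
    end_idx = mlen + max(0, qlen - ext_len)
    beg_idx = max(0, end_idx - mem_len)
    return beg_idx, end_idx

# With the default ext_len = 0, end_idx = mlen + qlen is symmetric in its
# arguments, so the misordered call produces exactly the same slice:
assert mem_slice_bounds(36, 150, 0, 150) == mem_slice_bounds(150, 36, 0, 150)

# With a non-default ext_len, swapping the arguments can change the slice,
# e.g. when qlen < ext_len < mlen:
assert mem_slice_bounds(8, 150, 16, 150) != mem_slice_bounds(150, 8, 16, 150)
```

So with the default configuration (ext_len = 0) the swapped arguments cancel out, which would explain why training is unaffected; the bug would only bite when ext_len is nonzero.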