Hi, thanks for sharing the code. I tried to revise the model to match your paper:
1). Fixed a bug in the KL divergence calculation.
2). Added support for batch processing (input shape: [batch, N, d]).
3). Fixed a problem that occurred when the input dimension differs from the Transformer's model dimension (d != d_model).
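
To make the shapes in 1) to 3) concrete, here is a rough, dependency-free sketch. It is not the code in this change: the function names (`kl_divergence`, `batched_row_kl`, `project`), the `eps` smoothing term, and the assumption that KL is taken row-wise over [batch, N, N] distributions are all hypothetical and only for illustration. A real implementation would use tensor operations instead of nested lists.

```python
import math


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    eps is an assumed smoothing term that keeps log() away from zero.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def batched_row_kl(P, Q, eps=1e-8):
    """Row-wise KL for inputs shaped [batch, N, N]; returns shape [batch, N]."""
    return [
        [kl_divergence(p_row, q_row, eps) for p_row, q_row in zip(P_b, Q_b)]
        for P_b, Q_b in zip(P, Q)
    ]


def project(x, W):
    """Map x of shape [batch, N, d] to [batch, N, d_model].

    W is a hypothetical weight matrix of shape [d, d_model], so the input
    dimension d does not have to equal the model dimension d_model.
    """
    d_model = len(W[0])
    return [
        [
            [sum(tok[i] * W[i][j] for i in range(len(tok))) for j in range(d_model)]
            for tok in seq
        ]
        for seq in x
    ]


# Usage: batch=1, N=2, d=2, d_model=3
x = [[[1.0, 2.0], [3.0, 4.0]]]
W = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
h = project(x, W)

P = [[[1.0, 0.0], [0.5, 0.5]]]
Q = [[[0.5, 0.5], [0.5, 0.5]]]
kl = batched_row_kl(P, Q)
```

The projection step is the point of 3): once the input is mapped from d to d_model before entering the Transformer, the two dimensions no longer need to match.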