Hi, I am a little confused about the memory module.
The memory in your code has shape Batchsize x Memory_size x Slotsize (or vector_dim). So when training finishes, Batchsize separate memories have been generated, not just one.
That means that changing the batch size changes the memory, which does not seem to make sense.
It also seems that the batch size at test time has to be the same as the one used in training. So if I get a batch that contains N samples (N < Batchsize), which memory should each sample use? (When N = Batchsize, each sample uses one memory.)
The key question, I think, is: why should there be more than one memory?
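To make the question concrete, here is a minimal sketch of how I understand the shapes. It is plain Python with nested lists, not your actual code: the sizes and the helper names `make_memory` and `shape` are made up for illustration, and pairing samples with memories by position is only my assumption about what happens.

```python
# Hypothetical sizes, chosen only for illustration (not taken from the repository).
BATCH_SIZE, MEMORY_SIZE, SLOT_SIZE = 4, 8, 3


def make_memory(batch_size, memory_size, slot_size):
    """Build a zero-filled memory of shape batch_size x memory_size x slot_size."""
    return [[[0.0] * slot_size for _ in range(memory_size)]
            for _ in range(batch_size)]


def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# After training there is one Memory_size x Slotsize memory per batch position.
train_memory = make_memory(BATCH_SIZE, MEMORY_SIZE, SLOT_SIZE)
print(shape(train_memory))

# At test time a smaller batch arrives (N < Batchsize).
n = 2
test_batch = [[1.0] * SLOT_SIZE for _ in range(n)]

# Assumption: samples are paired with memories by position.
# zip stops at the shorter input, so only the first n memories are used.
pairs = list(zip(test_batch, train_memory))
unused = BATCH_SIZE - len(pairs)
print(len(pairs), unused)
```

With these made-up numbers, two of the four memories are never used, and I do not see what would justify picking the first two rather than any other two.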
Thank you.