Open chengm15 opened 7 years ago
@chengm15 It is the same thing. In the NPI paper, the output of the last hidden layer (NBxK) is multiplied with the transpose of the key memory (NxK), that is (NBxK) * (NxK)^T -> (NBxN); this is eq. (4) in the NPI paper. That operation is just a Dense layer without a bias, whose weight matrix is the key memory.
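A minimal NumPy sketch of that equivalence, not the code from this repository. It assumes NB is the batch size, N the number of programs, and K the key dimension; the sizes, the random data, and the helper `dense_no_bias` are all hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: batch size, number of programs, key dimension.
NB, N, K = 4, 6, 5

hidden = rng.standard_normal((NB, K))     # output of the last hidden layer (the produced keys)
key_memory = rng.standard_normal((N, K))  # one stored key per program

# Key-lookup view: dot product of every produced key with every stored key.
scores_lookup = hidden @ key_memory.T     # shape (NB, N)


def dense_no_bias(x, weights):
    """Bias-free dense layer: a plain matrix product x @ weights."""
    return x @ weights


# Dense-layer view: the weight matrix (K, N) is the transposed key memory.
scores_dense = dense_no_bias(hidden, key_memory.T)

# Both views give the same scores, so they pick the same program index.
program_ids = scores_lookup.argmax(axis=1)  # shape (NB,)
```

In a trained model the Dense weights are learned, so the key memory is simply the (transposed) weight matrix of that layer rather than a separate table.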
According to the paper, NPI produces a key and finds the program with the same index. However, in this code, NPI uses a fully connected layer to produce the program. Can you explain this? In response to your early reply.