If the batch size is 128, then q_embed holds the features of the 128 questions after word embedding, and d_embed holds the features of the 128 responses. After the concatenate call, the result is not each question joined to its corresponding response, so how does the first layer perform a one-dimensional convolution on this input? I don't understand this part well. Could you explain it?
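To make my confusion concrete, here is a minimal numpy sketch of what I understand the code to be doing. The toy shapes, the random kernel, and the `conv1d` helper are my own assumptions for illustration, not the repo's actual implementation:

```python
import numpy as np

# Toy shapes for illustration (the real model uses batch_size=128
# and learned embeddings; these numbers are made up).
batch, q_len, d_len, emb = 2, 4, 5, 8

rng = np.random.default_rng(0)
q_embed = rng.normal(size=(batch, q_len, emb))  # question word embeddings
d_embed = rng.normal(size=(batch, d_len, emb))  # response word embeddings

# Concatenate along the time axis: each sample becomes one joined sequence.
x = np.concatenate([q_embed, d_embed], axis=1)  # (batch, q_len + d_len, emb)

# A 1-D convolution with kernel width 3 slides over the time axis.
kernel = rng.normal(size=(3, emb, 16))          # (width, in_channels, out_channels)

def conv1d(x, w):
    """Valid-padding 1-D convolution over the time axis."""
    b, t, c = x.shape
    k, _, f = w.shape
    out = np.empty((b, t - k + 1, f))
    for i in range(t - k + 1):
        # window (b, k, c) contracted with kernel (k, c, f) -> (b, f)
        out[:, i, :] = np.einsum('bkc,kcf->bf', x[:, i:i + k, :], w)
    return out

y = conv1d(x, kernel)
print(y.shape)  # (2, 7, 16)
```

If this reading is right, only the few windows that straddle the boundary between the question and the response ever see words from both texts, which is why I don't see how the convolution makes them interact deeply.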
The paper says that q_embed and d_embed interact deeply with each other through one-dimensional convolution. However, the code does not seem to realize this idea. Could the author explain?