Closed: Aolin-MIR closed this issue 6 years ago.
@liaolin I wrote them based on my own understanding. If you have any suggestions, please let me know.
In the model layer I use a single encoder block to compute M0, M1, and M2 instead of three separate blocks; that's how I share weights across the three output encoders. In the embedding layer, however, the question and the paragraph have different lengths, so if we want to share weights there as well, I think we must either decompose the block or pass the weights into the block in its init method.
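The idea above can be sketched as follows. This is a minimal PyTorch illustration, not the code from this repo: `EncoderBlock` is a hypothetical stand-in (a real QANet block uses depthwise convolutions, self-attention, and a feed-forward layer), and the point is only that calling one module instance three times reuses one set of weights.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    # Hypothetical stand-in for a QANet encoder block.
    def __init__(self, d_model=8):
        super().__init__()
        self.ff = nn.Linear(d_model, d_model)

    def forward(self, x):
        return torch.relu(self.ff(x))

class ModelLayer(nn.Module):
    def __init__(self, d_model=8):
        super().__init__()
        # A single block instance: every call goes through the same weights.
        self.block = EncoderBlock(d_model)

    def forward(self, x):
        m0 = self.block(x)
        m1 = self.block(m0)
        m2 = self.block(m1)
        return m0, m1, m2

layer = ModelLayer()
m0, m1, m2 = layer(torch.randn(2, 5, 8))

# Only one parameter set exists, shared by all three outputs:
# Linear(8, 8) has 8*8 weights + 8 biases = 72 parameters.
n_params = sum(p.numel() for p in layer.parameters())
```

Registering three separate `EncoderBlock(d_model)` instances instead would triple `n_params` and break the sharing.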
@InitialBug You're right. The paper said, "We also share weights of the context and question encoder, and of the three output encoders." Thanks a lot!
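On sharing the context and question encoder despite the different lengths: the weights of a typical encoder depend only on the feature dimension, not on the sequence length, so one module instance can encode both inputs. A minimal sketch (again with a hypothetical `Linear` standing in for the real conv/attention block):

```python
import torch
import torch.nn as nn

# One shared embedding encoder; its parameters depend only on d_model (8 here),
# not on sequence length, so it can be applied to inputs of different lengths.
shared_encoder = nn.Sequential(nn.Linear(8, 8), nn.ReLU())

context = torch.randn(2, 100, 8)   # paragraph: batch 2, length 100, d_model 8
question = torch.randn(2, 20, 8)   # question: batch 2, length 20, d_model 8

# Same weights used for both passes.
c_enc = shared_encoder(context)
q_enc = shared_encoder(question)
```

So the block itself may not need to be decomposed; it just has to avoid any length-dependent parameters (e.g. fixed-size positional embeddings sized to one input).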
Is the parameter sharing implemented correctly?