The number of MLPs in ERNet

thunlp / GEAR

Source code for ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification"



Closed: iambabao closed this issue 4 years ago

iambabao commented 4 years ago

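The paper says that ERNet uses an MLP to compute attention. My reading of the paper is that each ERNet layer contains two parameters, $W_{0}^{t}$ and $W_{1}^{t}$, for the MLP. In the code, however, it looks like two parameters $W_{0}$ and $W_{1}$ are initialized for every node of every ERNet layer: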

# each SelfAttentionLayer contains two Linear layers
self.attentions = [SelfAttentionLayer(nhid=nhid * 2, nins=nins) for _ in range(nins)]

So the MLP parameters are not shared within a layer?

jayzzhou-thu commented 4 years ago

@iambabao Here each node should compute its own, separate attention. The description in the paper may have omitted this detail; sorry for the ambiguity.
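
To make the difference between the two readings concrete, here is a minimal sketch that only counts parameters. It is not code from the GEAR repository: the sizes `NINS`, `IN_DIM`, and `HIDDEN` and all function names are hypothetical, and the attention MLP is assumed to be two stacked linear layers with biases and a scalar output.

```python
# Hypothetical sizes, chosen only for illustration (not taken from the GEAR repository).
NINS = 5     # number of evidence nodes per claim
IN_DIM = 8   # input size of the attention MLP
HIDDEN = 4   # hidden size of the attention MLP


def mlp_param_count(in_dim: int, hidden: int) -> int:
    """Parameters of an assumed two-layer MLP: Linear(in_dim, hidden) then Linear(hidden, 1)."""
    w0 = in_dim * hidden + hidden  # weights + bias of the first linear layer (W0)
    w1 = hidden * 1 + 1            # weights + bias of the second linear layer (W1)
    return w0 + w1


def shared_per_layer(nins: int, in_dim: int, hidden: int) -> int:
    """Reading of the paper: one (W0, W1) pair per ERNet layer, reused by every node."""
    return mlp_param_count(in_dim, hidden)


def separate_per_node(nins: int, in_dim: int, hidden: int) -> int:
    """Reading of the code: one attention module, hence one (W0, W1) pair, per node."""
    return sum(mlp_param_count(in_dim, hidden) for _ in range(nins))


shared = shared_per_layer(NINS, IN_DIM, HIDDEN)
per_node = separate_per_node(NINS, IN_DIM, HIDDEN)
print(f"shared per layer: {shared}, separate per node: {per_node}")
```

Under these assumptions the per-node variant has `NINS` times as many attention parameters per layer as the shared variant, which is the practical consequence of the detail discussed in this thread.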