liuyaduo opened this issue 4 years ago

Hi! In the original paper, I found that they apply an activation operation and add a fully connected layer after the average operation to get a vector representation for each of the two target entities. Why can't I find the activation and fully connected layer in model.py?
Hi @liuyaduo,
Thanks for the detailed comparison. Indeed, this code does not include that additional fully connected layer and activation function. You can add them as follows:
def __init__(self, config):
    super(BertForSequenceClassification, self).__init__(config)
    self.num_labels = config.num_labels
    self.bert = BertModel(config)
    self.cls_dropout = nn.Dropout(0.1)  # dropout on the transformed [CLS] token embedding
    self.ent_dropout = nn.Dropout(0.5)  # dropout on the averaged entity embeddings
    self.ffn = nn.Linear(config.hidden_size, config.hidden_size)  # fully connected layer from the paper
    self.activation = nn.Tanh()
    self.classifier = nn.Linear(config.hidden_size * 3, self.num_labels)  # [CLS] + two entity vectors
    self.init_weights()
def forward(self, ......):
    ...
    # average the token embeddings within each entity span
    e1_h = self.ent_dropout(extract_entity(sequence_output, e1_mask))
    e2_h = self.ent_dropout(extract_entity(sequence_output, e2_mask))
    # apply the activation followed by the fully connected layer
    e1_h = self.ffn(self.activation(e1_h))
    e2_h = self.ffn(self.activation(e2_h))
    context = self.cls_dropout(pooled_output)
    pooled_output = torch.cat([context, e1_h, e2_h], dim=-1)
    ...
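For context, extract_entity is not shown above; a minimal sketch of such a masked-average-pooling helper could look like this (the actual implementation in model.py may differ):

import torch

def extract_entity(sequence_output, entity_mask):
    # sequence_output: (batch, seq_len, hidden_size) token embeddings from BERT
    # entity_mask: (batch, seq_len) with 1s over the entity's tokens, 0s elsewhere
    mask = entity_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (sequence_output * mask).sum(dim=1)   # sum of the entity token embeddings
    count = mask.sum(dim=1).clamp(min=1e-9)        # number of tokens in the entity
    return summed / count                          # average entity embedding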
I am not sure whether this will improve performance, but it is easy to try.
Hope this answers your question.
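If you want to sanity-check the shapes after this change, a quick standalone test along these lines should work (the dummy tensors and bert-base dimensions are assumptions here, and extract_entity is the averaging helper sketched above):

import torch
import torch.nn as nn

hidden_size, batch, seq_len = 768, 2, 16
sequence_output = torch.randn(batch, seq_len, hidden_size)  # stand-in for BERT token outputs
pooled_output = torch.randn(batch, hidden_size)             # stand-in for the transformed [CLS] output
e1_mask = torch.zeros(batch, seq_len)
e1_mask[:, 3:5] = 1   # pretend entity 1 spans tokens 3-4
e2_mask = torch.zeros(batch, seq_len)
e2_mask[:, 8:10] = 1  # pretend entity 2 spans tokens 8-9

ffn = nn.Linear(hidden_size, hidden_size)
activation = nn.Tanh()

e1_h = ffn(activation(extract_entity(sequence_output, e1_mask)))
e2_h = ffn(activation(extract_entity(sequence_output, e2_mask)))
combined = torch.cat([pooled_output, e1_h, e2_h], dim=-1)
print(combined.shape)  # expect torch.Size([2, 2304]), i.e. hidden_size * 3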