marksilver6 opened this issue 2 years ago
We are sorry that two hyper-parameters were reported misleadingly. You can try setting the learning rate to 0.0001 and the learning-rate decay to 1.0.
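For reference, here is a minimal sketch of how these values might be wired up, assuming a plain Adam optimizer and PyTorch's `ExponentialLR` scheduler; the optimizer choice, `num_epochs`, and `train_one_epoch` are assumptions for illustration, not taken from the actual training script:

```python
import torch

# `BertTaker` is the model class shown below.
model = BertTaker()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
# A decay of 1.0 keeps the learning rate constant: lr is multiplied
# by gamma=1.0 after every epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0)

num_epochs = 10                        # hypothetical value
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)  # placeholder for the real training loop
    scheduler.step()
```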
Code of the model:
```python
import torch
from transformers import BertModel

class BertTaker(torch.nn.Module):
    def __init__(self, in_dim=1024, out_dim=2):
        super(BertTaker, self).__init__()
        self.bert = BertModel.from_pretrained('bert-large-uncased')
        self.bert.train()
        self.dropout = torch.nn.Dropout(0.5)
        # Two-layer MLP probe on top of the [CLS] embedding.
        self.probe = torch.nn.Sequential(
            torch.nn.Linear(in_dim, int(in_dim / 4)),
            torch.nn.ReLU(),
            torch.nn.Linear(int(in_dim / 4), out_dim),
        )
        torch.nn.init.xavier_normal_(self.probe[0].weight)
        torch.nn.init.uniform_(self.probe[0].bias, -0.2, 0.2)
        torch.nn.init.xavier_normal_(self.probe[2].weight)
        torch.nn.init.uniform_(self.probe[2].bias, -0.2, 0.2)

    def forward(self, batch):
        # batch is ((input_ids, attention_mask, token_type_ids), _, _)
        (x, m, s), _, _ = batch
        x_bert = self.bert(input_ids=x, attention_mask=m)[0]  # last hidden states
        x_emb = x_bert[:, 0, :]   # [CLS] token representation
        x_emb = self.dropout(x_emb)
        pred = self.probe(x_emb)
        return pred
```
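For anyone trying to call this model, here is a minimal forward-pass sketch under the batch layout `forward` expects; the padding settings and the two trailing `None` slots are assumptions about the original data loader:

```python
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')

# Encode a (context, statement) pair; the tokenizer inserts [SEP] between them.
enc = tokenizer('facts rules', 'statement', return_tensors='pt',
                padding='max_length', truncation=True, max_length=128)

model = BertTaker()
# forward() unpacks batch as ((input_ids, attention_mask, token_type_ids), _, _);
# the last two slots are ignored, so None stands in for them here.
batch = ((enc['input_ids'], enc['attention_mask'], enc['token_type_ids']), None, None)
logits = model(batch)  # shape: (1, out_dim)
```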
Hi, I am trying to reproduce the results reported in the paper. I tried the BERT-base model as the backbone, using the BertForSequenceClassification class provided by HF with num_labels=4, and I feed "facts rules [SEP] statement_i" to the model as input. However, the accuracy is always 25% (the model outputs a constant label for all examples). Would it be possible to provide the code to reproduce the paper's results?
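For concreteness, my setup looks roughly like this (the placeholder label and padding settings are just for illustration):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=4)

# "facts rules [SEP] statement_i": passing a text pair makes the
# tokenizer insert the [SEP] token between the two segments.
enc = tokenizer('facts rules', 'statement_i', return_tensors='pt',
                padding=True, truncation=True)
labels = torch.tensor([0])  # placeholder label

out = model(**enc, labels=labels)
print(out.loss, out.logits)  # logits over the 4 classes
```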