saravananpsg opened this issue 2 years ago
We are sorry that two hyper-parameters were misleadingly provided. You can try setting the learning rate to 0.0001 and the decay of the learning rate to 1.0.
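For reference, a minimal optimizer setup with those values might look like the following (a sketch only; the placeholder model, the choice of Adam, and the epoch count are assumptions, not taken from the repo):

```python
import torch

# Sketch only: `model` stands in for the fine-tuned LM + probe defined below.
model = torch.nn.Linear(1024, 2)            # placeholder for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# A learning-rate decay of 1.0 means the rate is effectively constant:
# multiplying by gamma = 1.0 each epoch leaves it at 1e-4.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0)

for epoch in range(3):                      # illustrative epoch count
    # ... run one training epoch here ...
    scheduler.step()
```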
Code of the model:
```python
import torch
from transformers import BertModel


class BertTaker(torch.nn.Module):
    def __init__(self, in_dim=1024, out_dim=2):
        super(BertTaker, self).__init__()
        self.bert = BertModel.from_pretrained('bert-large-uncased')
        self.bert.train()
        self.dropout = torch.nn.Dropout(0.5)
        # Two-layer perceptron probe on top of the [CLS] embedding
        self.probe = torch.nn.Sequential(
            torch.nn.Linear(in_dim, int(in_dim / 4)),
            torch.nn.ReLU(),
            torch.nn.Linear(int(in_dim / 4), out_dim))
        torch.nn.init.xavier_normal_(self.probe[0].weight)
        torch.nn.init.uniform_(self.probe[0].bias, -0.2, 0.2)
        torch.nn.init.xavier_normal_(self.probe[2].weight)
        torch.nn.init.uniform_(self.probe[2].bias, -0.2, 0.2)

    def forward(self, batch):
        (x, m, s), _, _ = batch
        x_bert = self.bert(input_ids=x, attention_mask=m)[0]
        x_emb = x_bert[:, 0, :]          # [CLS] token embedding
        x_emb = self.dropout(x_emb)
        pred = self.probe(x_emb)
        return pred
```
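As a quick sanity check of the class above, one could run something like this (a sketch; the dummy tensors and batch structure simply mirror the `(x, m, s), _, _` unpacking in `forward` and are not part of the original code):

```python
import torch

model = BertTaker()
model.eval()

# Dummy batch: batch size 2, sequence length 16 (illustrative values).
input_ids = torch.randint(0, 30000, (2, 16))
attention_mask = torch.ones(2, 16, dtype=torch.long)
segment_ids = torch.zeros(2, 16, dtype=torch.long)   # unused by forward()

batch = ((input_ids, attention_mask, segment_ids), None, None)
with torch.no_grad():
    logits = model(batch)
print(logits.shape)   # expected: torch.Size([2, 2])
```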
I have tried to replicate the model as described in the paper: "fine-tune the large versions of LMs with the same hidden size (1024) and adopt a two-layer perceptron to predict the logical relation", with inputs of the form "[CLS] facts rules [SEP] statement [SEP]" for BERT and RoBERTa.
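For comparison, this is roughly how such an input can be built with the Hugging Face tokenizer (a sketch; the example facts/rules/statement text and the length settings are illustrative, not the exact pipeline):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')

facts_rules = "The cat is red. All red things are big."   # example context
statement = "The cat is big."                              # example statement

# Passing the two texts as a pair yields "[CLS] facts rules [SEP] statement [SEP]".
enc = tokenizer(facts_rules, statement,
                padding='max_length', truncation=True,
                max_length=128, return_tensors='pt')
print(tokenizer.decode(enc['input_ids'][0][:20]))
```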
The BERT model shows a very low accuracy of 0.25, and the RoBERTa base version 0.52. Also, when I added two hidden layers, the model gives only 25% accuracy. May I know why there is such a huge variation in accuracy? Would the large version of RoBERTa give better results?
Please share your model code!
Thank you