Closed prajjwal1 closed 4 years ago
Might be of interest to @joeddav :)
I've solved this problem. Thanks a lot @joeddav for even showing interest. You guys are very supportive. I'll post what I did so that anyone who gets stuck can refer to it.
In every `SequenceClassification` model there's a linear classification head, so we can straightaway add the losses from both heads. You can decide how to combine the logits, e.g. `[u+v, u-v, u*v]`, where `u` and `v` are the respective output vectors/logits. In my case it was not a good idea to work directly with the raw hidden states from `BertModel`. I'm closing this now.
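A minimal PyTorch sketch of the `[u+v, u-v, u*v]` combination scheme described above; the class name, sizes, and the final linear head here are illustrative, not the exact code from this thread:

```python
import torch
import torch.nn as nn

class PairClassifierHead(nn.Module):
    """Combines two pooled encoder outputs (u, v) into one feature
    vector before a linear classifier -- the [u+v, u-v, u*v] scheme
    mentioned above. Names and dimensions are illustrative."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # Three combined views of (u, v), concatenated along the feature dim.
        self.classifier = nn.Linear(3 * hidden_size, num_labels)

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        features = torch.cat([u + v, u - v, u * v], dim=-1)
        return self.classifier(features)

head = PairClassifierHead(hidden_size=768, num_labels=3)
u = torch.randn(4, 768)  # pooled output for the hypothesis batch
v = torch.randn(4, 768)  # pooled output for the premise batch
logits = head(u, v)      # shape: (4, 3)
```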
@prajjwal1 Glad you figured it out! FYI we launched a discussion forum this week (after you opened this issue I think). Questions like this would be well-suited to that forum if you have more to ask or want to help out other people in the community! 😇
@joeddav Yeah, I have already answered a couple of questions there. I was the one who commented requesting a forum and posted a link in the ACL chat. The forum came into being the next day; maybe someone inside the team was working on it while other team members didn't know. But it's really good to have.
@prajjwal1 Ahhh yess, sorry I didn't make the connection :) Yes, we had been having some discussions about having our own forum and I knew we had some people working on it, but none of us on rocket chat realized it would be released the next day haha
❓ Questions & Help
I'm adding it here since I didn't receive any reply on SO. I think this query might be relevant for people working in the few-shot and contrastive/metric-learning space.
I'm trying to implement a Siamese-style transformer architecture; similar work was done in the Sentence-BERT paper. I'm facing an issue. To separate the hypothesis and premise, I modify this line from `_glue_convert_examples_to_features`: instead of encoding `(text_a, text_b)` pairs, I encode `examples.text_a` on its own, and do the same for `examples.text_b` to obtain `batch_encoding_b`. Then I modify `GlueDataset`, mainly by changing this line, since it now returns two sets of features (segregated "hypothesis" and "premise"). `__getitem__` is modified accordingly to return `self.features_a[i], self.features_b[i]`. That's the gist of how I obtain the segregated "hypothesis" and "premise". These are then passed to two BERTs (or one BERT if its weights are kept frozen).
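The modified dataset described above could be sketched roughly like this; the name `PairedGlueDataset` is made up for illustration, and plain tensors stand in for the real batch encodings:

```python
import torch
from torch.utils.data import Dataset

class PairedGlueDataset(Dataset):
    """Sketch of the modified GlueDataset described above: sentence a
    and sentence b are encoded separately, and __getitem__ returns one
    feature pair (plus label) per example. `features_a` / `features_b`
    stand in for the two batch encodings."""
    def __init__(self, features_a, features_b, labels):
        assert len(features_a) == len(features_b) == len(labels)
        self.features_a = features_a
        self.features_b = features_b
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return self.features_a[i], self.features_b[i], self.labels[i]

# Toy data: three examples, five "token" features each.
ds = PairedGlueDataset([torch.ones(5)] * 3, [torch.zeros(5)] * 3, [0, 1, 2])
a, b, y = ds[1]
```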
This is how I've defined the `collate_fn`:
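The original `collate_fn` snippet isn't shown above; a hypothetical version, assuming the paired dataset yields `(features_a, features_b, label)` tuples of tensors, might look like:

```python
import torch

def pair_collate_fn(batch):
    """Hypothetical collate_fn for the paired dataset: stacks the
    hypothesis and premise tensors (and labels) into separate batches
    so each side can be fed through its own BERT forward pass."""
    feats_a, feats_b, labels = zip(*batch)
    return (torch.stack(feats_a),
            torch.stack(feats_b),
            torch.tensor(labels))

# Two toy examples, five features each.
batch = pair_collate_fn([(torch.ones(5), torch.zeros(5), 0),
                         (torch.ones(5), torch.zeros(5), 1)])
```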
Then the `dataloader` is created in the usual way, and we iterate over it as usual. I had to modify `_training_step` and `evaluate` accordingly in the `Trainer` class.

Now the problem is that the model doesn't learn at all (`bert-base-uncased`). I tried using my `model` and modified `Trainer` with the standard `GlueDataset`, and it works. This leads me to conclude that something is off with the data: the model should learn something, even if it's not being fed the concatenated "hypothesis" and "premise". The model basically has one BERT and one linear layer; the logits come from the linear layer and are used to compute the loss function (a typical Siamese-like architecture).
Can you tell whether there's an issue in how the `Dataset` is being created here, or propose another way to segregate "hypothesis" and "premise" so they can be fed to BERT separately?

Link to Stack Overflow question