huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Separating premise and hypothesis in MNLI #5575

Closed prajjwal1 closed 4 years ago

prajjwal1 commented 4 years ago

❓ Questions & Help

I'm adding it here since I didn't receive any reply on SO. I think this query might be relevant for people working in the few-shot and contrastive/metric learning space.

I'm trying to implement a Siamese-like transformer architecture. Similar work has been done in the SentenceBERT paper. I'm facing an issue. To separate the hypothesis and premise, I modify this line from _glue_convert_examples_to_features. Instead I do

batch_encoding_a = tokenizer(
    [example.text_a for example in examples],
    max_length=max_length,
    padding="max_length",
    truncation=True,
)

I did the same thing for example.text_b to obtain batch_encoding_b. Then I modify GlueDataset, mainly by changing this line, since it will now return two items (segregated "hypothesis" and "premise"). __getitem__ is modified accordingly to return self.features_a[i], self.features_b[i].

That's the gist of how I'm obtaining segregated "hypothesis" and "premise". These are then passed to two BERTs (or one BERT if its weights are kept frozen).
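A minimal, framework-free sketch of that paired dataset (the class name SiameseGlueDataset is a hypothetical stand-in for the modified GlueDataset; features_a/features_b are assumed to be the pre-tokenized premise and hypothesis encodings):

```python
# Sketch of a paired dataset: each item carries the premise encoding
# under key "a" and the hypothesis encoding under key "b".
class SiameseGlueDataset:
    def __init__(self, features_a, features_b):
        assert len(features_a) == len(features_b)
        self.features_a = features_a  # tokenized premises
        self.features_b = features_b  # tokenized hypotheses

    def __len__(self):
        return len(self.features_a)

    def __getitem__(self, i):
        # Return both sides together so one DataLoader drives both encoders.
        return {"a": self.features_a[i], "b": self.features_b[i]}
```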

This is how I've defined the collate_fn

def siamese_data_collator(batch):
    # Each dataset item is a dict with keys "a" (premise) and "b" (hypothesis);
    # split them back into two lists and collate each side independently.
    features_a, features_b = [], []
    for item in batch:
        for k, v in item.items():
            if k == "a":
                features_a.append(v)
            else:
                features_b.append(v)
    return {
        "a": default_data_collator(features_a),
        "b": default_data_collator(features_b),
    }
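The splitting logic can be sanity-checked without pulling in transformers by swapping in a trivial stand-in collator (list_collator below is hypothetical; the real code uses default_data_collator):

```python
# Stand-in for default_data_collator: gathers each field into a list.
def list_collator(features):
    return {k: [f[k] for f in features] for k in features[0]}

def siamese_data_collator(batch):
    # Same pattern as above: split each paired item back into its
    # "a" and "b" halves, then collate the two halves independently.
    features_a, features_b = [], []
    for item in batch:
        for k, v in item.items():
            if k == "a":
                features_a.append(v)
            else:
                features_b.append(v)
    return {"a": list_collator(features_a), "b": list_collator(features_b)}

batch = [
    {"a": {"input_ids": [1, 2]}, "b": {"input_ids": [3, 4]}},
    {"a": {"input_ids": [5, 6]}, "b": {"input_ids": [7, 8]}},
]
collated = siamese_data_collator(batch)
```

The collated result has the same two-sided shape as the batch items, so downstream code can address the premise batch as collated["a"] and the hypothesis batch as collated["b"].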

Then the dataloader is created in the usual way. So when we iterate like this:

def _training_step(...):
    model.train()
    for k, v in inputs["a"].items():
        if isinstance(v, torch.Tensor):
            inputs["a"][k] = v.to(self.args.device)

    # we get inputs['a'] and inputs['b'], which are passed to the model

I had to modify _training_step and evaluate accordingly in the Trainer class.
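Since the same device-moving loop is needed for both inputs["a"] and inputs["b"], it can be factored into a small recursive helper (move_to_device is a hypothetical name, not part of the Trainer API):

```python
import torch

def move_to_device(batch, device):
    # Recursively move every tensor in a (possibly nested) batch dict,
    # so both inputs["a"] and inputs["b"] are handled with one call.
    return {
        k: move_to_device(v, device) if isinstance(v, dict)
        else (v.to(device) if isinstance(v, torch.Tensor) else v)
        for k, v in batch.items()
    }
```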

Now the problem is that the model doesn't learn at all (bert-base-uncased). I tried using my model and modified Trainer with the standard GlueDataset, and it works. This leads to the conclusion that something is off with the data. The model should learn something (even if it is not being fed concatenated "hypothesis" and "premise").

The model basically has one BERT and one linear layer. The logits come from the linear layer and are then used to compute the loss function (a typical Siamese-like architecture).

Can you suggest whether there's an issue in how the Dataset is being created here, or propose another way to segregate "hypothesis" and "premise" so that they can be fed separately to BERT?

Link to Stack Overflow question

LysandreJik commented 4 years ago

Might be of interest to @joeddav :)

prajjwal1 commented 4 years ago

I've solved this problem. Thanks a lot @joeddav for showing interest. You guys are very supportive. I'll post what I did so that anyone who is stuck can refer to it. Every SequenceClassification model has a linear layer, so we can straightaway add the loss from both heads. You can choose how to process the logits, e.g. [u+v, u-v, u*v], where u and v are the respective output vectors/logits. Directly dealing with the raw hidden states from BertModel was not a good idea in my case. I'm closing this now.
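A small numpy sketch of that SentenceBERT-style combination (the example vectors and the function name combine_logits are illustrative assumptions, not the exact code from the issue):

```python
import numpy as np

def combine_logits(u, v):
    # Build the interaction features [u+v, u-v, u*v] from the
    # two heads' output vectors before the final classifier.
    return np.concatenate([u + v, u - v, u * v], axis=-1)

u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])
features = combine_logits(u, v)  # shape (6,): three combinations of 2-d vectors
```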

joeddav commented 4 years ago

@prajjwal1 Glad you figured it out! FYI we launched a discussion forum this week (after you opened this issue I think). Questions like this would be well-suited to that forum if you have more to ask or want to help out other people in the community! 😇

prajjwal1 commented 4 years ago

@joeddav Yeah, I have already answered a couple of questions there. I was the one who commented requesting a forum and posted a link in the ACL chat. The forum came into being the next day. Maybe someone inside the team was working on it while the other team members didn't know. But it's really good to have.

joeddav commented 4 years ago

@prajjwal1 Ahhh yes, sorry I didn't make the connection :) Yes, we had been discussing having our own forum and I knew some people were working on it, but none of us on rocket chat realized it would be released the next day haha