Bert for passage reranking

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

https://huggingface.co/transformers

Apache License 2.0

129.86k stars 25.8k forks source link

Bert for passage reranking #580

Closed oisin-dolphin closed 5 years ago

oisin-dolphin commented 5 years ago

Hi I am currently trying to implement bert for passage reranking in pytorch. Here is the paper and github repo. https://arxiv.org/abs/1901.04085 https://github.com/nyu-dl/dl4marco-bert

I've downloaded their bert large model checkpoint and bert config for the task the convert_tf_checkpoint_to_pytorch function seems to successfully extract the weights from tensorflow.

Then while initialising the pytorch model

Initialize PyTorch weight ['bert', 'pooler', 'dense', 'kernel']
Skipping bert/pooler/dense/kernel/adam_m
Skipping bert/pooler/dense/kernel/adam_v
Skipping global_step

     35 
     36     # Load weights from tf checkpoint
---> 37     load_tf_weights_in_bert(model, tf_checkpoint_path)
     38 
     39     # Save pytorch-model

~/anaconda3/envs/new_fast_ai/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py in load_tf_weights_in_bert(model, tf_checkpoint_path)
     88                 pointer = getattr(pointer, 'weight')
     89             elif l[0] == 'output_bias' or l[0] == 'beta':
---> 90                 pointer = getattr(pointer, 'bias')
     91             elif l[0] == 'output_weights':
     92                 pointer = getattr(pointer, 'weight')

~/anaconda3/envs/new_fast_ai/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    533                 return modules[name]
    534         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 535             type(self).__name__, name))
    536 
    537     def __setattr__(self, name, value):

AttributeError: 'BertForPreTraining' object has no attribute 'bias'

I assume it is issues with the final layer What is the best way for me to go about resolving this?

thanks in advance!

thomwolf commented 5 years ago

The convert_tf_checkpoint_to_pytorch script is made to convert the Google pre-trained weights in BertForPretraining model, you have to modify it to convert another type model.

In your case, you want to load the passage re-ranking model in a BertForSequenceClassification model which has the same structure (BERT + a classifier on top of the pooled output) as the NYU model.

here is a quick way to do that:

install pytorch-pretrained-bert from source so you can modify it
change https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 to initialize a BertForSequenceClassification model instead of the BertForPreTraining model in the conversion script.
the structure is not exactly identical so you need to ADD a line that say pointer = getattr(pointer, 'cls') in the TWO if-conditions related to output_weights and output_bias (between L89 and L90 and between L91 and L92 in modeling.py here: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L90 and https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L92).
this should let you convert the tensorflow model in a pytorch one using the scripts.

oisin-dolphin commented 5 years ago

Thanks so much! Your comment saved me a lot of time. However there was a small issue I got around by just changing the tf variable names.

For anyone else out there the solution was

https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 CHANGE model = BertForSequenceClassification(config, 2)
https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L70 ADD
```
if name in ['output_weights' , 'output_bias']:
    name = 'classifier/' + name
```

search4mahesh commented 5 years ago

Hello @oisin-dolphin and @thomwolf I followed above suggestions but getting following error. tensorflow.python.framework.errors_impl.NotFoundError: Key classifier/output_bias not found in checkpoint

Also what is significance of following line of code pointer = getattr(pointer, 'cls')

Please suggest.

Thanks Mahesh

chikubee commented 4 years ago

The convert_tf_checkpoint_to_pytorch script is made to convert the Google pre-trained weights in BertForPretraining model, you have to modify it to convert another type model.

In your case, you want to load the passage re-ranking model in a BertForSequenceClassification model which has the same structure (BERT + a classifier on top of the pooled output) as the NYU model.

here is a quick way to do that:

install pytorch-pretrained-bert from source so you can modify it

change https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 to initialize a BertForSequenceClassification model instead of the BertForPreTraining model in the conversion script.

the structure is not exactly identical so you need to ADD a line that say pointer = getattr(pointer, 'cls') in the TWO if-conditions related to output_weights and output_bias (between L89 and L90 and between L91 and L92 in modeling.py here: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L90 and https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L92).

this should let you convert the tensorflow model in a pytorch one using the scripts.

I followed these instructions for the SequenceClassification model but I still end up getting the same error for 'BertForSequenceClassification' object has no attribute 'bias'.

pertschuk commented 4 years ago

Update for latest transformers, add modeling_bert.py:78:

    for name, array in zip(names, arrays):
        if name in ['output_weights', 'output_bias']:
            name = 'classifier/' + name

and convert_bert_original_tf_checkpoint_to_pytorch.py

config.num_labels = 2
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = BertForSequenceClassification(config)

Soonhwan-Kwon commented 4 years ago

you are my lifesaver @pertschuk Thank you for the instructions

pertschuk commented 4 years ago

glad they helped @Soonhwan-Kwon.

I used a similar reranking model as part of a project I just released which hooks in to Elasticsearch and reranks search results out of the box, check it out if this sounds like it would be useful! repo: https://github.com/koursaros-ai/nboost

fran-martinez commented 4 years ago

You can create a subclass of BertForSequenceClassification and add self.weight and self.bias to the__init__ method. Then instantiate your new class and it is ready to use it:

class BertForPassageRanking(BertForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        self.weight = torch.autograd.Variable(torch.ones(2, config.hidden_size),
                                              requires_grad=True)
        self.bias = torch.autograd.Variable(torch.ones(2), requires_grad=True)

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

BERT_PASSAGE_RANKING_PATH is the path where your tf checkpoints files and config json file are stored. You will need to rename the files as follows:

config.json
model.ckpt.index
model.ckpt.meta

Another option if you do not want to change the file names is to load the json config file with BertConfig.from_json_file() and then pass to BertForPassageRanking.from_pretained() the path + ckpt file name and the configuration that you have already loaded with BertConfig.from_json_file() .

pertschuk commented 4 years ago

I added passage pytorch msmarco reranking models to the huggingface / transformers bucket, no need for subclassing / modifications.

https://huggingface.co/nboost

fran-martinez commented 4 years ago

I added passage pytorch msmarco reranking models to the huggingface / transformers bucket, no need for subclassing / modifications.

https://huggingface.co/nboost

Hi, I have a question regarding the output of your models. In transformers library, the bert_base model (transformers.BertModel class) has as output a tuple, where the first element is the last hidden state and the 2nd element is the pooler output. The last hidden state is a tensor of size (batch_size, sequence_length, hidden_dim). For example for a batch size of 64 and 512 tokens we obtain for BERT an output of size (64x512x768). The pooler output has size (batch_size, hidden_size). This output is obtained training a linear layer with tanh activation function which had as input the CLS token hidden state (last layer hidden-state of the first oken of the sequence). Those weights have been trained from the next sentence prediction.

Your model follows similar structure, at least nboost/pt-biobert-base-msmarco. However, a passage re-ranking model is a sequence classification model. Basically, the passage re-ranking model proposed by https://github.com/nyu-dl/dl4marco-bert is the BERT model fine-tuned with a dense layer on top to learn to classify a sequence as relevant or not relevant. Their first element of the tuple output is a tensor of size (batch_size, num_classes), where num_classes is two (whether the sequence to classify is a relevant document).

How should we use your model for passage re-ranking? Thanks a lot

fran-martinez commented 4 years ago

I added passage pytorch msmarco reranking models to the huggingface / transformers bucket, no need for subclassing / modifications. https://huggingface.co/nboost

Hi, I have a question regarding the output of your models. In transformers library, the bert_base model (transformers.BertModel class) has as output a tuple, where the first element is the last hidden state and the 2nd element is the pooler output. The last hidden state is a tensor of size (batch_size, sequence_length, hidden_dim). For example for a batch size of 64 and 512 tokens we obtain for BERT an output of size (64x512x768). The pooler output has size (batch_size, hidden_size). This output is obtained training a linear layer with tanh activation function which had as input the CLS token hidden state (last layer hidden-state of the first oken of the sequence). Those weights have been trained from the next sentence prediction.

Your model follows similar structure, at least nboost/pt-biobert-base-msmarco. However, a passage re-ranking model is a sequence classification model. Basically, the passage re-ranking model proposed by https://github.com/nyu-dl/dl4marco-bert is the BERT model fine-tuned with a dense layer on top to learn to classify a sequence as relevant or not relevant. Their first element of the tuple output is a tensor of size (batch_size, num_classes), where num_classes is two (whether the sequence to classify is a relevant document).

How should we use your model for passage re-ranking? Thanks a lot

I found where was the problem. As pointed in the model's page (https://huggingface.co/nboost/pt-biobert-base-msmarco#) to load the model you have to do the following:

model = AutoModel.from_pretrained("nboost/pt-biobert-base-msmarco") This creates as output a tuple where the first element is a tensor of size (64x512x768).

However, we should do the following, since our problem is a sequence classification:

model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-biobert-base-msmarco") This creates the correct output, a tuple where the first element is a tensor of size (batch_size, num_classes)

I suggest to the authors to change the model info and model card in https://huggingface.co/nboost/pt-biobert-base-msmarco#, since it is little bit confusing

ds7711 commented 4 years ago

You can create a subclass of BertForSequenceClassification and add self.weight and self.bias to the__init__ method. Then instantiate your new class and it is ready to use it:
class BertForPassageRanking(BertForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        self.weight = torch.autograd.Variable(torch.ones(2, config.hidden_size),
                                              requires_grad=True)
        self.bias = torch.autograd.Variable(torch.ones(2), requires_grad=True)

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)
BERT_PASSAGE_RANKING_PATH is the path where your tf checkpoints files and config json file are stored. You will need to rename the files as follows:
config.json
model.ckpt.index
model.ckpt.meta
Another option if you do not want to change the file names is to load the json config file with BertConfig.from_json_file() and then pass to BertForPassageRanking.from_pretained() the path + ckpt file name and the configuration that you have already loaded with BertConfig.from_json_file() .

Thanks a lot. I was having the same question about 'nboost' and was trying this method. However, the output seems to change when I run the same code multiple times, even though i am in the eval mode. Do you have any hint about what I am doing wrong here?

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

dummy_query = [
    'Rutgers is a good university. I like my experience there.', 
    "Hello, my dog is cute. My cute dog is amazing.",
    'Florida is a nice place but tiger king may be better',
]

dummy_passage = [
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
]
bert_ranking.eval()
with torch.no_grad():
    for idx in range(len(dummy_query)):
        input_ids = torch.tensor(tokenizer.encode(text=dummy_query[idx], \
            text_pair=dummy_passage[idx], add_special_tokens=True)).unsqueeze(0)
        outputs = bert_ranking(input_ids)
        print(outputs)

fran-martinez commented 4 years ago

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

dummy_query = [
    'Rutgers is a good university. I like my experience there.', 
    "Hello, my dog is cute. My cute dog is amazing.",
    'Florida is a nice place but tiger king may be better',
]

dummy_passage = [
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
]
bert_ranking.eval()
with torch.no_grad():
    for idx in range(len(dummy_query)):
        input_ids = torch.tensor(tokenizer.encode(text=dummy_query[idx], \
            text_pair=dummy_passage[idx], add_special_tokens=True)).unsqueeze(0)
        outputs = bert_ranking(input_ids)
        print(outputs)

Sorry, I have no idea. Finally I am not using this approximation. I did not achieve good results for my purpose. Intead, I am using the model provided by nboost (https://huggingface.co/nboost/pt-tinybert-msmarco) and it works fine for me. Remember to load the model as follows:

model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-tinybert-msmarco")

I am using tinybert-msmarco, however you can use one of the following models:

nboost/pt-bert-base-uncased-msmarco
nboost/pt-bert-large-msmarco
nboost/pt-biobert-base-msmarco
nboost/pt-tinybert-msmarco

iglimanaj commented 4 years ago

Hi, I have fine tuned a multilingual model, taken from hugging face, on the passage reranking task. Now I am facing difficulties with converting the tensorflow checkpoint to a pytorch model, so that I can use the model using BertForSequenceClassification. I am using the following conversion function, but I get this error

File "<ipython-input-50-1e24e5635ec9>", line 1, in <module>
    convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path)

  File "<ipython-input-49-22827240b095>", line 63, in convert_tf_checkpoint_to_pytorch
    assert pointer.shape == array.shape

  File "/home/igli/anaconda3/envs/search-boost/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(

AttributeError: 'LayerNorm' object has no attribute 'shape'

The conversion method:

def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path):
        config_path = os.path.abspath(bert_config_file)
        tf_path = os.path.abspath(tf_checkpoint_path)
        print("Converting TensorFlow checkpoint from {} with config at {}".format(tf_path, config_path))
        # Load weights from TF model
        init_vars = tf.train.list_variables(tf_path)
        names = []
        arrays = []
        for name, shape in init_vars:
            print("Loading TF weight {} with shape {}".format(name, shape))
            array = tf.train.load_variable(tf_path, name)
            names.append(name)
            arrays.append(array)

        # Initialise PyTorch model
        config = BertConfig.from_json_file(bert_config_file)
        config.num_labels = 2

        print("Building PyTorch model from configuration: {}".format(str(config)))
        model = BertForSequenceClassification()(config=config)

        for name, array in zip(names, arrays):
            if name in ['output_weights' , 'output_bias']:
                    name = 'classifier/' + name
            name = name.split('/')
            # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculated m and v
            # which are not required for using pretrained model
            if name[-1] in ["adam_v", "adam_m"]:
                print("Skipping {}".format("/".join(name)))
                continue
            pointer = model

            for m_name in name:  

                if re.fullmatch(r'[A-Za-z]+_\d+', m_name):
                    l = re.split(r'_(\d+)', m_name)
                else:
                    l = [m_name]
                if l[0] == 'kernel':
                    pointer = getattr(pointer, 'weight')
                elif l[0] == 'output_bias':
                    pointer = getattr(pointer, 'bias')
                    pointer = getattr(pointer, 'cls')
                elif l[0] == 'output_weights':
                    pointer = getattr(pointer, 'weight')
                    pointer = getattr(pointer, 'cls')       
                else:
                    try:
                        pointer = getattr(pointer, l[0])
                    except:
                        pass

                if len(l) >= 2:
                    num = int(l[1])
                    pointer = pointer[num]
            if m_name[-11:] == '_embeddings':
                pointer = getattr(pointer, 'weight')
            elif m_name == 'kernel':
                array = np.transpose(array)
            try:
                assert pointer.shape == array.shape
            except AssertionError as e:
                e.args += (pointer.shape, array.shape)
                raise
                #pass

            print("Initialize PyTorch weight {}".format(name))
            array = np.array(array)
            print(array)
            print(type(array))
            pointer.data = torch.from_numpy(array)

        # Save pytorch-model
        print("Save PyTorch model to {}".format(pytorch_dump_path))
        torch.save(model.state_dict(), pytorch_dump_path)

I have currently no clue, where the problem might be. Thanks in advanvce!

NigelC15 commented 3 years ago

Update for latest transformers, add modeling_bert.py:78:

    for name, array in zip(names, arrays):
        if name in ['output_weights', 'output_bias']:
            name = 'classifier/' + name

and convert_bert_original_tf_checkpoint_to_pytorch.py

config.num_labels = 2
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = BertForSequenceClassification(config)

As of 26/Mar/2021, modeling_bert.py:78 is now around modeling_bert.py:118 convert_bert_original_tf_checkpoint_to_pytorch.py is now around convert_bert_original_tf_checkpoint_to_pytorch.py:33. BTW, don't forget from transformers import BertForSequenceClassification