Closed: oisin-dolphin closed this issue 5 years ago.
The convert_tf_checkpoint_to_pytorch script is made to convert the Google pre-trained weights into a BertForPreTraining model; you have to modify it to convert another type of model. In your case, you want to load the passage re-ranking model into a BertForSequenceClassification model, which has the same structure (BERT + a classifier on top of the pooled output) as the NYU model. Here is a quick way to do that:
- install pytorch-pretrained-bert from source so you can modify it
- change https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 to initialize a BertForSequenceClassification model instead of the BertForPreTraining model in the conversion script
- the structure is not exactly identical, so you need to ADD a line that says pointer = getattr(pointer, 'cls') in the TWO if-conditions related to output_weights and output_bias (between L89 and L90 and between L91 and L92 in modeling.py here: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L90 and https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L92)
- this should let you convert the tensorflow model into a pytorch one using the scripts
Thanks so much! Your comment saved me a lot of time. However, there was a small issue I got around by just changing the tf variable names.
For anyone else out there, the solution was: at https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 CHANGE model = BertForSequenceClassification(config, 2), and rename the tf variables:
if name in ['output_weights', 'output_bias']:
    name = 'classifier/' + name
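Putting those two changes together, the edited pieces of the conversion script would look roughly like this (a sketch against the pytorch-pretrained-BERT layout of that era; build_reranker_model and remap_variable_name are hypothetical helper names, not part of the library):

from pytorch_pretrained_bert.modeling import BertConfig, BertForSequenceClassification

def build_reranker_model(bert_config_file):
    # Load the NYU model's BERT config and attach a 2-way classification
    # head (relevant / not relevant) instead of the default BertForPreTraining.
    config = BertConfig.from_json_file(bert_config_file)
    return BertForSequenceClassification(config, 2)

def remap_variable_name(name):
    # The NYU checkpoint stores the classifier as top-level 'output_weights'
    # and 'output_bias'; prefixing them with 'classifier/' lets the script's
    # name-walking loop map them onto model.classifier.weight / .bias.
    if name in ['output_weights', 'output_bias']:
        return 'classifier/' + name
    return name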
Hello @oisin-dolphin and @thomwolf, I followed the above suggestions but I am getting the following error: tensorflow.python.framework.errors_impl.NotFoundError: Key classifier/output_bias not found in checkpoint
Also, what is the significance of the following line of code: pointer = getattr(pointer, 'cls')
Please suggest.
Thanks, Mahesh
I followed these instructions for the SequenceClassification model, but I still end up getting the same error: 'BertForSequenceClassification' object has no attribute 'bias'.
Update for the latest transformers: add at modeling_bert.py:78:
for name, array in zip(names, arrays):
    if name in ['output_weights', 'output_bias']:
        name = 'classifier/' + name
and in convert_bert_original_tf_checkpoint_to_pytorch.py:
config.num_labels = 2
print("Building PyTorch model from configuration: {}".format(str(config)))
model = BertForSequenceClassification(config)
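With those edits in place, the conversion itself can be driven from Python roughly like this (the paths are placeholders, and depending on your transformers version the script may live at a slightly different module path):

from transformers.convert_bert_original_tf_checkpoint_to_pytorch import (
    convert_tf_checkpoint_to_pytorch,
)

convert_tf_checkpoint_to_pytorch(
    tf_checkpoint_path="./msmarco/model.ckpt",        # TF checkpoint prefix
    bert_config_file="./msmarco/bert_config.json",    # NYU model's config
    pytorch_dump_path="./msmarco/pytorch_model.bin",  # converted output
)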
You are my lifesaver, @pertschuk. Thank you for the instructions.
Glad they helped, @Soonhwan-Kwon.
I used a similar reranking model as part of a project I just released which hooks into Elasticsearch and reranks search results out of the box; check it out if this sounds like it would be useful! repo: https://github.com/koursaros-ai/nboost
You can create a subclass of BertForSequenceClassification and add self.weight and self.bias to the __init__ method. Then instantiate your new class and it is ready to use:

class BertForPassageRanking(BertForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        self.weight = torch.autograd.Variable(torch.ones(2, config.hidden_size),
                                              requires_grad=True)
        self.bias = torch.autograd.Variable(torch.ones(2), requires_grad=True)

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)
BERT_PASSAGE_RANKING_PATH is the path where your tf checkpoint files and config json file are stored. You will need to rename the files as follows:
config.json
model.ckpt.index
model.ckpt.meta
Another option, if you do not want to change the file names, is to load the json config file with BertConfig.from_json_file() and then pass to BertForPassageRanking.from_pretrained() the path + ckpt file name and the configuration that you have already loaded with BertConfig.from_json_file().
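That second option would look something like this (a minimal sketch; the file paths are just examples):

from transformers import BertConfig

config = BertConfig.from_json_file("/path/to/bert_config.json")
# Point from_pretrained at the .ckpt.index file and hand it the config
# explicitly, so no files need to be renamed.
bert_ranking = BertForPassageRanking.from_pretrained("/path/to/model.ckpt.index",
                                                     config=config,
                                                     from_tf=True)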
I added passage pytorch msmarco reranking models to the huggingface / transformers bucket, no need for subclassing / modifications. https://huggingface.co/nboost
Hi, I have a question regarding the output of your models. In the transformers library, the bert_base model (transformers.BertModel class) has as output a tuple, where the first element is the last hidden state and the second element is the pooler output. The last hidden state is a tensor of size (batch_size, sequence_length, hidden_dim). For example, for a batch size of 64 and 512 tokens we obtain for BERT an output of size (64x512x768). The pooler output has size (batch_size, hidden_size). This output is obtained by training a linear layer with a tanh activation function which had as input the CLS token hidden state (last-layer hidden state of the first token of the sequence). Those weights have been trained on the next sentence prediction task.
Your model follows a similar structure, at least nboost/pt-biobert-base-msmarco. However, a passage re-ranking model is a sequence classification model. Basically, the passage re-ranking model proposed by https://github.com/nyu-dl/dl4marco-bert is the BERT model fine-tuned with a dense layer on top to learn to classify a sequence as relevant or not relevant. The first element of their tuple output is a tensor of size (batch_size, num_classes), where num_classes is two (whether or not the sequence is a relevant document).
How should we use your model for passage re-ranking? Thanks a lot.
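(For concreteness, the two outputs described above can be inspected like this; bert-base-uncased stands in for any BERT encoder, and the tuple indexing assumes the transformers API of that era:)

from transformers import BertModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

input_ids = tokenizer.encode("a query and a passage", return_tensors="pt")
with torch.no_grad():
    outputs = model(input_ids)

print(outputs[0].shape)  # last hidden state: (batch_size, sequence_length, hidden_dim)
print(outputs[1].shape)  # pooler output: (batch_size, hidden_size)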
I found where the problem was. As pointed out on the model's page (https://huggingface.co/nboost/pt-biobert-base-msmarco#), to load the model you have to do the following:
model = AutoModel.from_pretrained("nboost/pt-biobert-base-msmarco")
This creates as output a tuple where the first element is a tensor of size (64x512x768).
However, we should do the following, since our problem is sequence classification:
model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-biobert-base-msmarco")
This creates the correct output, a tuple where the first element is a tensor of size (batch_size, num_classes).
I suggest that the authors change the model info and model card at https://huggingface.co/nboost/pt-biobert-base-msmarco#, since it is a little bit confusing.
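For example, scoring a single query/passage pair then looks roughly like this (the texts are made up, and treating index 1 of the logits as the "relevant" class follows the nyu-dl convention; check the model card to confirm):

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nboost/pt-biobert-base-msmarco")
model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-biobert-base-msmarco")
model.eval()

inputs = tokenizer.encode_plus("what causes influenza?",
                               "Influenza is caused by infection with influenza viruses.",
                               return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]                  # (batch_size, num_classes)
relevance = torch.softmax(logits, dim=-1)[:, 1]  # probability of the "relevant" class
print(relevance.item())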
Thanks a lot. I was having the same question about nboost and was trying the BertForPassageRanking subclassing method above. However, the output seems to change when I run the same code multiple times, even though I am in eval mode. Do you have any hint about what I am doing wrong here?
bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)
dummy_query = [
    'Rutgers is a good university. I like my experience there.',
    "Hello, my dog is cute. My cute dog is amazing.",
    'Florida is a nice place but tiger king may be better',
]
dummy_passage = [
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
]
bert_ranking.eval()
with torch.no_grad():
    for idx in range(len(dummy_query)):
        input_ids = torch.tensor(tokenizer.encode(text=dummy_query[idx],
                                                  text_pair=dummy_passage[idx],
                                                  add_special_tokens=True)).unsqueeze(0)
        outputs = bert_ranking(input_ids)
        print(outputs)
Sorry, I have no idea. In the end I am not using this approach, as I did not achieve good results for my purpose. Instead, I am using the model provided by nboost (https://huggingface.co/nboost/pt-tinybert-msmarco) and it works fine for me. Remember to load the model as follows:
model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-tinybert-msmarco")
I am using tinybert-msmarco; however, you can use one of the following models:
nboost/pt-bert-base-uncased-msmarco
nboost/pt-bert-large-msmarco
nboost/pt-biobert-base-msmarco
nboost/pt-tinybert-msmarco
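As a usage sketch, re-ranking then amounts to scoring each (query, passage) pair and sorting by the relevance probability (the passages below are dummies, and the class-1-is-relevant assumption from above applies):

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nboost/pt-tinybert-msmarco")
model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-tinybert-msmarco")
model.eval()

query = "what is a cat?"
passages = ["Cats are small domesticated mammals.",
            "Paris is the capital of France.",
            "The domestic cat is a carnivorous mammal."]

scores = []
with torch.no_grad():
    for passage in passages:
        inputs = tokenizer.encode_plus(query, passage, return_tensors="pt")
        logits = model(**inputs)[0]
        scores.append(torch.softmax(logits, dim=-1)[0, 1].item())

# Most relevant passage first.
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(round(score, 3), passage)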
Hi, I have fine-tuned a multilingual model from Hugging Face on the passage re-ranking task. Now I am facing difficulties with converting the tensorflow checkpoint to a pytorch model, so that I can use the model with BertForSequenceClassification.
I am using the following conversion function, but I get this error:
File "<ipython-input-50-1e24e5635ec9>", line 1, in <module>
convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path)
File "<ipython-input-49-22827240b095>", line 63, in convert_tf_checkpoint_to_pytorch
assert pointer.shape == array.shape
File "/home/igli/anaconda3/envs/search-boost/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LayerNorm' object has no attribute 'shape'
The conversion method:
import os
import re

import numpy as np
import tensorflow as tf
import torch
from transformers import BertConfig, BertForSequenceClassification

def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path):
    config_path = os.path.abspath(bert_config_file)
    tf_path = os.path.abspath(tf_checkpoint_path)
    print("Converting TensorFlow checkpoint from {} with config at {}".format(tf_path, config_path))
    # Load weights from TF model
    init_vars = tf.train.list_variables(tf_path)
    names = []
    arrays = []
    for name, shape in init_vars:
        print("Loading TF weight {} with shape {}".format(name, shape))
        array = tf.train.load_variable(tf_path, name)
        names.append(name)
        arrays.append(array)

    # Initialise PyTorch model
    config = BertConfig.from_json_file(bert_config_file)
    config.num_labels = 2
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = BertForSequenceClassification(config)

    for name, array in zip(names, arrays):
        if name in ['output_weights', 'output_bias']:
            name = 'classifier/' + name
        name = name.split('/')
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculate m and v,
        # which are not required for using the pretrained model
        if name[-1] in ["adam_v", "adam_m"]:
            print("Skipping {}".format("/".join(name)))
            continue
        pointer = model
        for m_name in name:
            if re.fullmatch(r'[A-Za-z]+_\d+', m_name):
                l = re.split(r'_(\d+)', m_name)
            else:
                l = [m_name]
            if l[0] == 'kernel':
                pointer = getattr(pointer, 'weight')
            elif l[0] == 'output_bias':
                pointer = getattr(pointer, 'bias')
                pointer = getattr(pointer, 'cls')
            elif l[0] == 'output_weights':
                pointer = getattr(pointer, 'weight')
                pointer = getattr(pointer, 'cls')
            else:
                try:
                    pointer = getattr(pointer, l[0])
                except AttributeError:
                    pass
            if len(l) >= 2:
                num = int(l[1])
                pointer = pointer[num]
        if m_name[-11:] == '_embeddings':
            pointer = getattr(pointer, 'weight')
        elif m_name == 'kernel':
            array = np.transpose(array)
        try:
            assert pointer.shape == array.shape
        except AssertionError as e:
            e.args += (pointer.shape, array.shape)
            raise
        print("Initialize PyTorch weight {}".format(name))
        array = np.array(array)
        print(array)
        print(type(array))
        pointer.data = torch.from_numpy(array)

    # Save pytorch-model
    print("Save PyTorch model to {}".format(pytorch_dump_path))
    torch.save(model.state_dict(), pytorch_dump_path)
I currently have no clue where the problem might be. Thanks in advance!
As of 26/Mar/2021, modeling_bert.py:78 is now around modeling_bert.py:118, and the change in convert_bert_original_tf_checkpoint_to_pytorch.py is now around convert_bert_original_tf_checkpoint_to_pytorch.py:33. BTW, don't forget from transformers import BertForSequenceClassification.
Hi, I am currently trying to implement BERT for passage re-ranking in pytorch. Here are the paper and github repo: https://arxiv.org/abs/1901.04085 https://github.com/nyu-dl/dl4marco-bert
I've downloaded their bert large model checkpoint and bert config for the task. The convert_tf_checkpoint_to_pytorch function seems to successfully extract the weights from tensorflow. Then, while initialising the pytorch model, I get an error; I assume it is an issue with the final layer. What is the best way for me to go about resolving this?
Thanks in advance!