huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Output of BertModel does not match the last hidden layer from fixed feature vectors #819

Closed · sasaadi closed this issue 5 years ago

sasaadi commented 5 years ago

Based on the BERT documentation (https://github.com/google-research/bert#using-bert-to-extract-fixed-feature-vectors-like-elmo), we can extract the contextualized token embeddings of each hidden layer separately. However, when I extract the last hidden layer (layer -1), it does not match outputs[0] from pytorch_transformers.BertModel() as described here: https://huggingface.co/pytorch-transformers/model_doc/bert.html#bertmodel

Just a reminder: I am using the same pre-trained model (e.g. bert-base-uncased) and the same input (e.g. 'here is an example .') for both.

thomwolf commented 5 years ago

What is your exact command to extract the last hidden layer (layer -1)? And what is your exact command to get the outputs[0] from pytorch_transformers.BertModel()?

sasaadi commented 5 years ago

To extract the last hidden layer (layer -1) from BERT, I run extract_features.py as follows:

python extract_features.py \
  --input_file=tmp/input.txt \
  --output_file=tmp/output.json \
  --vocab_file=cased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=cased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=cased_L-12_H-768_A-12/bert_model.ckpt \
  --layers=-1 \
  --max_seq_length=128 \
  --batch_size=1

where the input_file contains only one line, e.g. 'here is an example .'. The output gives me the layer -1 hidden state of each token separately.
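
A minimal sketch of reading that output back, assuming the JSON layout extract_features.py writes (one JSON record per input line, with a "features" list holding per-token "layers" entries):

import json

# extract_features.py writes one JSON record per input line
with open('tmp/output.json') as f:
    record = json.loads(f.readline())

# each token carries the requested layers; only layer -1 was requested above
for feature in record['features']:
    token = feature['token']
    vector = feature['layers'][0]['values']  # the layer with index -1
    print(token, len(vector))  # 768 values per token for BERT-base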

To get the embeddings from outputs[0]:

import torch
from pytorch_transformers import BertConfig, BertModel, BertTokenizer

config = BertConfig.from_pretrained('bert-base-cased')
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertModel(config)
input_ids = torch.tensor(tokenizer.encode("here is an example .")).unsqueeze(0)  # batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]

where last_hidden_states gives me a sequence of embeddings, presumably one for each token in the sentence, in the same order the tokens appear.

Thanks

hungph-dev-ict commented 5 years ago

Help me, I am having the same problem: how do I extract features from a fine-tuned .bin file? BERT's original docs only use an init_checkpoint.

thomwolf commented 5 years ago

@sasaadi, you should load the pretrained model with model = BertModel.from_pretrained('bert-base-cased'). In your example only the config (a dict of hyper-parameters) is loaded from the pretrained model, not the weights.
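
For reference, a minimal sketch of the corrected comparison; it assumes pytorch_transformers forwards extra from_pretrained kwargs (here output_hidden_states) to the config, and adds model.eval() plus torch.no_grad() so dropout is off and the outputs are deterministic:

import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
# from_pretrained loads the weights, not just the hyper-parameters
model = BertModel.from_pretrained('bert-base-cased', output_hidden_states=True)
model.eval()  # disable dropout for reproducible outputs

input_ids = torch.tensor(tokenizer.encode("here is an example .")).unsqueeze(0)
with torch.no_grad():
    outputs = model(input_ids)

last_hidden_states = outputs[0]  # shape (1, sequence_length, 768)
all_hidden_states = outputs[2]   # embedding output plus one tensor per layer
assert torch.equal(all_hidden_states[-1], last_hidden_states)

When comparing against the extract_features.py output, note that the TensorFlow script also wraps the sentence in [CLS] ... [SEP], so the token positions need to line up as well.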

hungph-dev-ict commented 5 years ago

@thomwolf pytorch_transformers.BertModel.from_pretrained('bert-base-multilingual-cased', state_dict=model_state_dict) Is this the solution when loading from a fine-tuned model?

LysandreJik commented 5 years ago

@hungph-dev-ict to load from a fine-tuned checkpoint you reference it directly: BertModel.from_pretrained('/path/to/finetuned/model').
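
A minimal sketch of that round trip; the directory path is the placeholder from above, and the initial from_pretrained call stands in for a model you have already fine-tuned:

import os
from pytorch_transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained('bert-base-cased')  # stand-in for your fine-tuned model
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

# save weights, config, and vocab into one directory
os.makedirs('/path/to/finetuned/model', exist_ok=True)
model.save_pretrained('/path/to/finetuned/model')
tokenizer.save_pretrained('/path/to/finetuned/model')

# later, load everything back from that directory
model = BertModel.from_pretrained('/path/to/finetuned/model')
tokenizer = BertTokenizer.from_pretrained('/path/to/finetuned/model')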

thomwolf commented 5 years ago

The doc for the from_pretrained method referenced by @LysandreJik is here

hungph-dev-ict commented 5 years ago

@LysandreJik @thomwolf thank you very much. This library has just added RoBERTa; I want to fine-tune it on my corpus. Do you have any solution?
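
One possible direction, sketched here rather than taken from the thread: masked-LM fine-tuning with RobertaForMaskedLM. The corpus file name, masking rate, and bare training loop are illustrative assumptions, not the library's official recipe:

import torch
from pytorch_transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForMaskedLM.from_pretrained('roberta-base')
model.train()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
mask_id = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

for line in open('corpus.txt'):  # hypothetical corpus, one sentence per line
    if not line.strip():
        continue
    input_ids = torch.tensor(tokenizer.encode(line.strip())).unsqueeze(0)
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < 0.15  # mask roughly 15% of tokens
    labels[~mask] = -1  # positions labelled -1 are ignored by the masked-LM loss
    input_ids[mask] = mask_id
    loss = model(input_ids, masked_lm_labels=labels)[0]
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()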

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.