Hi,
Would you mind elaborating on the approximate range of the bias? Also, is the hidden states output of BertForPreTraining deterministic?
If the output is deterministic but slightly different from TensorFlow's output (differences smaller than roughly 1e-5), this is probably normal behavior due to differing BLAS implementations across platforms, frameworks, etc.
See also: https://github.com/pytorch/pytorch/issues/9146#issuecomment-409331986
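For example, to judge whether a gap falls within that tolerance, the two outputs can be compared with a floating-point tolerance rather than for exact equality. A minimal sketch (the values are illustrative, taken from the reply below):

```python
import numpy as np

# Illustrative values: first few hidden-state entries from each framework
tf_out = np.array([0.41173, 0.086385, 0.705549, 0.224586, 0.751009])
pt_out = np.array([0.411758, 0.0876196, 0.705667, 0.224652, 0.75167])

# Agreement within atol=1e-5 suggests platform/BLAS noise; anything larger
# usually points to a genuine modeling difference.
print(np.allclose(tf_out, pt_out, atol=1e-5))  # False: these gaps exceed BLAS noise
```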
@qqaatw thank you for answering. Yes, the hidden states outputs of both BertForPreTraining and google-research BERT are deterministic. Sometimes the bias is small, around 1e-3 to 1e-4, when using chinese_L-12_H-768_A-12, as shown below:

```
google-research:    [0.41173, 0.086385, 0.705549, 0.224586, 0.751009, -1.071174, -0.455632, -0.390582, -0.523216, 0.520333, ...]
bertforpretraining: [0.411758, 0.0876196, 0.705667, 0.224652, 0.75167, -1., -0.45543, -0.391009, -0.524803, 0.518317, ...]
```

Sometimes the bias is big, around 1e-2 to 1e-1, when using a model fine-tuned from chinese_L-12_H-768_A-12, as shown below:

```
google-research:    [0.000858, 0.355273, -0.711266, 0.258692, 1.342211, -0.072978, -0.238096, 0.288613, -0.121792, -0.37079, ...]
bertforpretraining: [0.017701, 0.348385, -0.742679, 0.240423, 1.337542, -0.0840113, -0.23040, 0.281977, -0.1528175, -0.3525075, ...]
```
@qqaatw PS: the hidden states from BertForPreTraining were fetched as follows:

```python
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

config = BertConfig.from_json_file('d:/workspace/bert-google/chinese_L-12_H-768_A-12/bert_config.json')
model = BertForPreTraining.from_pretrained(
    'd:/workspace/bert-google/chinese_L-12_H-768_A-12/bert_model.ckpt',
    from_tf=True,
    config=config,
)
tokenizer = BertTokenizerFast.from_pretrained('d:/workspace/bert-google/chinese_L-12_H-768_A-12/')
inputs = tokenizer('ηθ§ζη«ιΎζδΊε', return_tensors='pt')  # Chinese test sentence
outputs = model(**inputs, output_hidden_states=True)
# First 10 dimensions of the first token in the last hidden layer
print(outputs.hidden_states[-1][0, 0, :10].tolist())
```
The features of each output layer on the TensorFlow side were extracted with extract_features.py from the google-research/bert repo.
Hey @bengshaoye,

> `config = BertConfig.from_json_file('d:/workspace/bert-google/chinese_L-12_H-768_A-12/bert_config.json')`

Could you change the `hidden_act` from `gelu` to `gelu_new` in `bert_config.json` and try again?
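For context, the activation named gelu in the original google-research BERT is the tanh approximation of GELU, which transformers calls `gelu_new`; `gelu` in transformers is the exact erf-based form. A minimal sketch contrasting the two (assuming PyTorch is installed):

```python
import math
import torch

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation, as used in google-research/bert ("gelu_new" in transformers).
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

x = torch.linspace(-4.0, 4.0, steps=1000)
# Per-activation differences are on the order of 1e-4; across 12 layers they
# can compound into the 1e-3 to 1e-1 gaps reported above.
print((gelu_exact(x) - gelu_tanh(x)).abs().max().item())
```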
Thanks a lot. `gelu_new` works fine for BertForPreTraining: the outputs of the original BERT (with its `gelu`) and the transformers BERT (with `gelu_new`) now look the same.
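For reference, the same fix can be applied without editing the JSON file by overriding the attribute on the loaded config; a minimal sketch reusing the paths from the snippet above:

```python
from transformers import BertConfig, BertForPreTraining

config = BertConfig.from_json_file('d:/workspace/bert-google/chinese_L-12_H-768_A-12/bert_config.json')
config.hidden_act = 'gelu_new'  # match the tanh GELU used by google-research/bert
model = BertForPreTraining.from_pretrained(
    'd:/workspace/bert-google/chinese_L-12_H-768_A-12/bert_model.ckpt',
    from_tf=True,
    config=config,
)
```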
Environment info
transformers version:
Who can help
@LysandreJik
Information
Model I am using (Bert, XLNet ...):
The problem arises when using:
The task I am working on is:
To reproduce
Steps to reproduce the behavior:
Expected behavior
A PyTorch BERT loaded from a Google TensorFlow checkpoint should produce the same outputs as the original TensorFlow BERT.