Closed Souravroych closed 3 years ago
It seems that you have no internet access.
Thank you. We also found that the cluster doesn't have internet access. I can manually download the model and put it in a cache folder, if that is possible. Can you please suggest where to put this cache folder so that it can be accessed from there?
You could put it in any folder and point to that folder instead! The `from_pretrained` method takes either an identifier pointing to the S3 bucket, or a local path containing the required files. The files must be named correctly, however (`pytorch_model.bin` for the PT model, `tf_model.h5` for the TF model, and `config.json` for the configuration).
I guess the easiest for you would be to do something like the following:
# Create the model cache
mkdir model_cache
cd model_cache
python
# Download and save the models to the cache (here are two examples with BERT and RoBERTa)
# When doing this, make sure the architectures you use contain all the trained
# layers you will need for your task. Using the architectures with which the
# models were pre-trained guarantees that all of these layers are kept.
from transformers import BertForPreTraining, BertTokenizer, RobertaForMaskedLM, RobertaTokenizer
BertForPreTraining.from_pretrained("bert-base-cased").save_pretrained("bert-cache")
BertTokenizer.from_pretrained("bert-base-cased").save_pretrained("bert-cache")
RobertaForMaskedLM.from_pretrained("roberta-base").save_pretrained("roberta-cache")
RobertaTokenizer.from_pretrained("roberta-base").save_pretrained("roberta-cache")
You can check that the folder now contains all the appropriate files:
ls -LR
# Outputs the following
./bert-cache:
config.json pytorch_model.bin special_tokens_map.json tokenizer_config.json vocab.txt
./roberta-cache:
config.json merges.txt pytorch_model.bin special_tokens_map.json tokenizer_config.json vocab.json
You can then move your `model_cache` folder to your machine which has no internet access. Hope that helps.
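Before copying, a quick sanity check can confirm the folder has everything `from_pretrained` needs. This is a minimal sketch; the file list assumes the PyTorch BERT cache from the listing above, and the `missing_files` helper is hypothetical, not part of transformers:

```python
import os

# Files the local PyTorch BERT cache directory must contain before
# from_pretrained can load it (names taken from the listing above).
REQUIRED_BERT_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")

def missing_files(cache_dir, required=REQUIRED_BERT_FILES):
    """Return the required files that are absent from cache_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(cache_dir, name))]

# On the offline machine, loading then looks like:
#   from transformers import BertForPreTraining, BertTokenizer
#   model = BertForPreTraining.from_pretrained("bert-cache")
#   tokenizer = BertTokenizer.from_pretrained("bert-cache")
```

If `missing_files("bert-cache")` returns an empty list, the folder should be self-sufficient for offline loading.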
Thanks a lot for the detailed explanation. I followed your steps and saved the checkpoints in `model_cache` and `uncased_l12` (with the same contents). However, it is showing a KeyError when it references the `model_cache` folder:
INFO:tensorflow:Extracting pretrained word embeddings weights from BERT
2020-10-30 14:37:43.909781: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Some layers from the model checkpoint at /users/sroychou/uncased_l12/ were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
Is there something I am doing wrong? I've been stuck on this for some time.
Hmm, well it seems that this is an issue with `bert_score`? I don't know what `BERT_text_summarisation` is, I don't know what the `metrics` script is, and I don't know what the `bert_score` package is.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Traceback (most recent call last):
  File "/users/sroychou/BERT_text_summarisation/scripts/train_bert_summarizer.py", line 12, in <module>
    from metrics import optimizer, loss_function, label_smoothing, get_loss_and_accuracy, tf_write_summary, monitor_run
  File "/users/sroychou/BERT_text_summarisation/scripts/metrics.py", line 16, in <module>
    _, _, _ = b_score(["I'm Batman"], ["I'm Spiderman"], lang='en', model_type='bert-base-uncased')
  File "/users/sroychou/.local/lib/python3.7/site-packages/bert_score/score.py", line 105, in score
    tokenizer = AutoTokenizer.from_pretrained(model_type)
  File "/users/sroychou/.local/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 298, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/users/sroychou/.local/lib/python3.7/site-packages/transformers/configuration_auto.py", line 330, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/users/sroychou/.local/lib/python3.7/site-packages/transformers/configuration_utils.py", line 382, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'bert-base-uncased'. Make sure that:
- 'bert-base-uncased' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'bert-base-uncased' is the correct path to a directory containing a config.json file
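Since the traceback shows `bert_score` passing `model_type` straight through to `AutoTokenizer.from_pretrained`, pointing it at a local cache directory may sidestep the network lookup. A hedged sketch of that idea, with a hypothetical `resolve_model_type` helper (not part of `bert_score` or transformers):

```python
import os

def resolve_model_type(model_type, local_cache):
    """Hypothetical helper: prefer a local cache directory over a hub
    identifier when the directory actually contains a config.json."""
    if os.path.isfile(os.path.join(local_cache, "config.json")):
        return local_cache
    return model_type

# Usage sketch, mirroring the call in metrics.py:
#   _, _, _ = b_score(["I'm Batman"], ["I'm Spiderman"], lang='en',
#                     model_type=resolve_model_type('bert-base-uncased',
#                                                   '/users/sroychou/uncased_l12'))
```

If the local directory has the right files, `from_pretrained` receives a path it can load offline; otherwise the original identifier is kept and the behavior is unchanged.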