Open pssnew2pro opened 4 years ago
I am getting a similar error message:
02/13/2020 16:16:53 - INFO - root - model path used /opt/ml/code/pretrained_models/roberta-base
02/13/2020 16:16:53 - INFO - root - finetuned model not available - loading standard pretrained model
02/13/2020 16:16:53 - INFO - transformers.tokenization_utils - Model name '/opt/ml/code/pretrained_models/roberta-base' not found in model shortcut name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). Assuming '/opt/ml/code/pretrained_models/roberta-base' is a path, a model identifier, or url to a directory containing tokenizer files.
Exception during training: 'PosixPath' object has no attribute 'decode'
Traceback (most recent call last):
  File "./train", line 138, in train
    PRETRAINED_PATH, do_lower_case=bool(training_config["do_lower_case"])
  File "/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 339, in _from_pretrained
    if os.path.isfile(pretrained_model_name_or_path) or is_remote_url(pretrained_model_name_or_path):
  File "/opt/conda/lib/python3.7/site-packages/transformers/file_utils.py", line 143, in is_remote_url
    parsed = urlparse(url_or_filename)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'PosixPath' object has no attribute 'decode'
Any help would be appreciated.
I managed to run the example by changing line 114 of the train file in container/bert from:
PRETRAINED_PATH = Path(pretrained_model_path) / training_config["model_name"]
to:
PRETRAINED_PATH = pretrained_model_path + '/' + str(training_config["model_name"])
It seems that urlparse cannot handle a PosixPath but works with a plain string: it treats any non-str argument as bytes and calls .decode() on it, which a PosixPath does not have.
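For anyone who wants to check this outside the container, here is a minimal sketch of the failure and of a slightly tidier variant of the same workaround. It only uses the standard library; the helper name to_str_path is made up for illustration and is not part of the train script or of transformers, and PurePosixPath stands in for the PosixPath from the traceback so the snippet behaves the same on any OS.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse


def to_str_path(base_dir, model_name):
    """Hypothetical helper: join with pathlib, then hand back a plain str."""
    return os.fspath(PurePosixPath(base_dir) / model_name)


# A pathlib object, like the one the original train script builds.
path_obj = PurePosixPath("/opt/ml/code/pretrained_models") / "roberta-base"

# urlparse assumes a non-str argument is bytes and tries to decode it.
# The traceback above shows AttributeError on Python 3.7; TypeError is
# caught as well in case another Python version rejects the input differently.
try:
    urlparse(path_obj)
    path_object_failed = False
except (AttributeError, TypeError):
    path_object_failed = True

# Converting to str first avoids the problem.
pretrained_path = to_str_path("/opt/ml/code/pretrained_models", "roberta-base")
parsed = urlparse(pretrained_path)

print(path_object_failed)
print(pretrained_path)
print(repr(parsed.scheme), parsed.path)
```

In the train script the equivalent change would presumably be to wrap the existing expression in str(...), which keeps the pathlib join and still passes a string to from_pretrained. I have not tested that variant inside the container.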
I tried following this tutorial: https://medium.com/@kaushaltrivedi/train-and-deploy-mighty-transformer-nlp-models-using-fastbert-and-aws-sagemaker-cc4303c51cf3
The training fails. The full stack trace is below:
2020-02-11 01:09:23 Starting - Starting the training job...
2020-02-11 01:09:25 Starting - Launching requested ML instances......
2020-02-11 01:10:27 Starting - Preparing the instances for training......
2020-02-11 01:11:41 Downloading - Downloading input data...
2020-02-11 01:12:12 Training - Downloading the training image...............
2020-02-11 01:14:53 Training - Training image download completed. Training in progress...
Starting the training.
/opt/ml/input/data/training/config/training_config.json
{'run_text': 'ALL_CMNT_CLEAN', 'finetuned_model': None, 'do_lower_case': 'True', 'train_file': 'train.csv', 'val_file': 'test.csv', 'label_file': 'labels.csv', 'text_col': 'ALL_CMNT_CLEAN', 'label_col': 'RPR_ACTN_DS', 'multi_label': 'True', 'grad_accumulation_steps': '1', 'fp16_opt_level': 'O1', 'fp16': 'True', 'model_type': 'roberta', 'model_name': 'roberta-base', 'logging_steps': '300'}
{'train_batch_size': '16', 'warmup_steps': '1000', 'lr': '8e-05', 'max_seq_length': '512', 'optimizer_type': 'adamw', 'lr_schedule': 'warmup_cosine', 'epochs': '10'}
02/11/2020 01:14:56 - INFO - root - model path used /opt/ml/code/pretrained_models/roberta-base
02/11/2020 01:14:56 - INFO - root - finetuned model not available - loading standard pretrained model
02/11/2020 01:14:56 - INFO - transformers.tokenization_utils - Model name '/opt/ml/code/pretrained_models/roberta-base' not found in model shortcut name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). Assuming '/opt/ml/code/pretrained_models/roberta-base' is a path, a model identifier, or url to a directory containing tokenizer files.
Exception during training: 'PosixPath' object has no attribute 'decode'
Traceback (most recent call last):
  File "/opt/ml/code/train", line 138, in train
    PRETRAINED_PATH, do_lower_case=bool(training_config["do_lower_case"])
  File "/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 339, in _from_pretrained
    if os.path.isfile(pretrained_model_name_or_path) or is_remote_url(pretrained_model_name_or_path):
  File "/opt/conda/lib/python3.7/site-packages/transformers/file_utils.py", line 143, in is_remote_url
    parsed = urlparse(url_or_filename)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/opt/conda/lib/python3.7/urllib/parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'PosixPath' object has no attribute 'decode'
2020-02-11 01:15:05 Uploading - Uploading generated training model
2020-02-11 01:15:05 Failed - Training job failed
UnexpectedStatusException Traceback (most recent call last)