training_args = TrainingArguments(
    output_dir=model_name,                   # Directory to save model checkpoints and outputs
    log_level="error",                       # Logging level
    num_train_epochs=num_epochs,             # Number of training epochs
    per_device_train_batch_size=batch_size,  # Batch size per device for training
    per_device_eval_batch_size=batch_size,   # Batch size per device for evaluation
    evaluation_strategy="epoch",             # Evaluate on the validation set at the end of each epoch
    save_steps=1e6,                          # Save a checkpoint every 1,000,000 steps (effectively disables checkpointing to speed up training)
    weight_decay=0.01,                       # Weight decay for the optimizer
    disable_tqdm=False,                      # False keeps the progress bar visible during training
    logging_steps=logging_steps,             # Number of steps between logging messages
    push_to_hub=True,                        # Push the model to the Hugging Face Hub
)
5. Log in to the Hugging Face Hub.
6. Define the `Trainer` as described in the notebook:
````python
# hide_output
from transformers import Trainer
trainer = Trainer(model_init=model_init,  # A function that instantiates the model to be used
                  args=training_args,     # Arguments to tweak for training
                  data_collator=data_collator,
                  compute_metrics=compute_metrics,
                  train_dataset=panx_de_encoded["train"],
                  eval_dataset=panx_de_encoded["validation"],
                  tokenizer=xlmr_tokenizer)
````
But I get the following error:
Cloning https://huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de into local empty directory.
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
691 self.local_dir,
--> 692 env=env,
693 )
/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_subprocess.py in run_subprocess(command, folder, check, **kwargs)
68 cwd=folder or os.getcwd(),
---> 69 **kwargs,
70 )
/opt/conda/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
511 raise CalledProcessError(retcode, process.args,
--> 512 output=stdout, stderr=stderr)
513 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command '['git', 'lfs', 'clone', 'https://user:<REDACTED_TOKEN>@huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de', '.']' returned non-zero exit status 2.
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/tmp/ipykernel_23/987298996.py in <module>
8 train_dataset=panx_de_encoded["train"],
9 eval_dataset=panx_de_encoded["validation"],
---> 10 tokenizer=xlmr_tokenizer)
/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers)
401 # Create clone of distant repo and output directory if needed
402 if self.args.push_to_hub:
--> 403 self.init_git_repo()
404 # In case of pull, we need to make sure every process has the latest.
405 if is_torch_tpu_available():
/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in init_git_repo(self)
2551 self.args.output_dir,
2552 clone_from=repo_name,
-> 2553 use_auth_token=use_auth_token,
2554 )
2555 except EnvironmentError:
/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
122 )
123
--> 124 return fn(*args, **kwargs)
125
126 return _inner_fn # type: ignore
/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in __init__(self, local_dir, clone_from, repo_type, token, git_user, git_email, revision, skip_lfs_files, client)
516
517 if clone_from is not None:
--> 518 self.clone_from(repo_url=clone_from)
519 else:
520 if is_git_repo(self.local_dir):
/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
122 )
123
--> 124 return fn(*args, **kwargs)
125
126 return _inner_fn # type: ignore
/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
731
732 except subprocess.CalledProcessError as exc:
--> 733 raise EnvironmentError(exc.stderr)
734
735 def git_config_username_and_email(
OSError: WARNING: 'git lfs clone' is deprecated and will not be updated
with new flags from 'git clone'
'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into '.'...
remote: Repository not found
fatal: repository 'https://huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de/' not found
Error(s) during clone:
git clone failed: exit status 128
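As a sanity check on the login, something like the following could confirm which account the stored token belongs to and which repository id the clone URL above should resolve to. This is only a sketch: `expected_repo_id` and `check_hub_access` are hypothetical helpers, and it assumes the installed `huggingface_hub` exposes `whoami()` returning a dict with a `name` key.

```python
def expected_repo_id(username: str, model_name: str) -> str:
    """Build the `<user>/<model>` id that the clone URL in the traceback points at."""
    return f"{username}/{model_name}"


def check_hub_access(model_name: str) -> str:
    """Return the repo id the Trainer would try to clone for the logged-in user.

    Hypothetical helper; needs network access and a stored token.
    """
    # Imported lazily so the pure helper above works without the library installed.
    from huggingface_hub import whoami

    user = whoami()["name"]  # expected to raise if no valid token is available
    return expected_repo_id(user, model_name)


# Example (not executed here because it needs a token and network access):
# print(check_hub_access("xlm-roberta-base-finetuned-panx-de"))
```

If the printed id differs from the one in the failing URL, or `whoami()` raises, the login did not take effect in the current kernel.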
Expected behavior
Initialize the trainer, run the training loop, and push the final model to the Hub.
Information
The problem arises in chapter:
Describe the bug
I'm not able to push the model to the Hugging Face Hub, although I've already logged in to the Hub with a `Write` token.
To Reproduce
Steps to reproduce the behavior:
Set the number of epochs, batch size, and logging steps:
`num_epochs = 3`
`batch_size = 24`
`logging_steps = len(panx_de_encoded["train"]) // batch_size`
Define the model name:
`model_name = f"{xlmr_model_name}-finetuned-panx-de"`
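To make the `logging_steps` expression above concrete, here is a minimal sketch of the arithmetic. The dataset size is a made-up placeholder, not the real length of `panx_de_encoded["train"]`, and the "once per epoch" reading assumes a single device and no gradient accumulation.

```python
# Hypothetical dataset size, chosen only for illustration.
num_train_examples = 12_000
batch_size = 24  # same value as in the steps above

# Integer division gives the number of full batches per epoch,
# so logging every `logging_steps` steps logs roughly once per epoch.
logging_steps = num_train_examples // batch_size
print(logging_steps)  # 500
```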