nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

How to push the model to huggingface model hub? #91

Closed ahmad-alismail closed 1 year ago

ahmad-alismail commented 1 year ago

Information

The problem arises in chapter:

Describe the bug

I'm not able to push the model to the Hugging Face Hub, although I've already logged in to the Hub with a write token.

To Reproduce

Steps to reproduce the behavior:

  1. Open the notebook in Kaggle
  2. Follow the steps provided in the book
  3. Define the training arguments as follows:
    
    from transformers import TrainingArguments

    # Set the number of epochs, batch size, and logging steps
    num_epochs = 3
    batch_size = 24
    logging_steps = len(panx_de_encoded["train"]) // batch_size

    # Define the model name
    model_name = f"{xlmr_model_name}-finetuned-panx-de"

    # Define the training arguments for the model
    training_args = TrainingArguments(
        output_dir=model_name,                   # Directory to save model checkpoints and outputs
        log_level="error",                       # Logging level
        num_train_epochs=num_epochs,             # Number of training epochs
        per_device_train_batch_size=batch_size,  # Batch size per device for training
        per_device_eval_batch_size=batch_size,   # Batch size per device for evaluation
        evaluation_strategy="epoch",             # Evaluate on the validation set at the end of each epoch
        save_steps=1e6,                          # Save checkpoint every 1000000 steps (i.e., disable checkpointing to speed up training)
        weight_decay=0.01,                       # Weight decay for optimizer
        disable_tqdm=False,                      # Whether to show progress bar during training
        logging_steps=logging_steps,             # Number of steps between each logging message
        push_to_hub=True,                        # Whether to push the model to the Hugging Face model hub
    )

5. Log in to the Hugging Face Hub
6. Define the `Trainer` as described in the notebook:
````python
# hide_output
from transformers import Trainer

trainer = Trainer(model_init=model_init,       # A function that instantiates the model to be used
                  args=training_args,          # Arguments to tweak for training
                  data_collator=data_collator, 
                  compute_metrics=compute_metrics,
                  train_dataset=panx_de_encoded["train"],
                  eval_dataset=panx_de_encoded["validation"], 
                  tokenizer=xlmr_tokenizer)
````

But I get the following error:

Cloning https://huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de into local empty directory.
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
    691                         self.local_dir,
--> 692                         env=env,
    693                     )

/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_subprocess.py in run_subprocess(command, folder, check, **kwargs)
     68         cwd=folder or os.getcwd(),
---> 69         **kwargs,
     70     )

/opt/conda/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    511             raise CalledProcessError(retcode, process.args,
--> 512                                      output=stdout, stderr=stderr)
    513     return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['git', 'lfs', 'clone', 'https://user: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@huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de', '.']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/tmp/ipykernel_23/987298996.py in <module>
      8                   train_dataset=panx_de_encoded["train"],
      9                   eval_dataset=panx_de_encoded["validation"],
---> 10                   tokenizer=xlmr_tokenizer)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers)
    401         # Create clone of distant repo and output directory if needed
    402         if self.args.push_to_hub:
--> 403             self.init_git_repo()
    404             # In case of pull, we need to make sure every process has the latest.
    405             if is_torch_tpu_available():

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in init_git_repo(self)
   2551                 self.args.output_dir,
   2552                 clone_from=repo_name,
-> 2553                 use_auth_token=use_auth_token,
   2554             )
   2555         except EnvironmentError:

/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
    122             )
    123 
--> 124         return fn(*args, **kwargs)
    125 
    126     return _inner_fn  # type: ignore

/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in __init__(self, local_dir, clone_from, repo_type, token, git_user, git_email, revision, skip_lfs_files, client)
    516 
    517         if clone_from is not None:
--> 518             self.clone_from(repo_url=clone_from)
    519         else:
    520             if is_git_repo(self.local_dir):

/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
    122             )
    123 
--> 124         return fn(*args, **kwargs)
    125 
    126     return _inner_fn  # type: ignore

/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
    731 
    732         except subprocess.CalledProcessError as exc:
--> 733             raise EnvironmentError(exc.stderr)
    734 
    735     def git_config_username_and_email(

OSError: WARNING: 'git lfs clone' is deprecated and will not be updated
          with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into '.'...
remote: Repository not found
fatal: repository 'https://huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de/' not found
Error(s) during clone:
git clone failed: exit status 128

Expected behavior

Initialize the trainer, run the training loop, and push the final model to the Hub.
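
Possible workaround (unconfirmed)

The clone step in the traceback fails with "Repository not found", which suggests the target repo did not exist on the Hub when `Trainer` tried to clone it. This has not been confirmed as the cause. A minimal sketch of one thing to try is to create the repo explicitly before building the `Trainer`, using `create_repo` from `huggingface_hub`. The helper names `build_repo_id` and `ensure_repo_exists` and the `HF_TOKEN` environment variable are assumptions made for illustration; they are not part of the notebook.

```python
from typing import Optional


def build_repo_id(username: str, model_name: str) -> str:
    """Return the `<user>/<model>` identifier used for Hub repositories."""
    return f"{username}/{model_name}"


def ensure_repo_exists(repo_id: str, token: Optional[str] = None):
    """Create the model repo on the Hub if it is missing (needs network access)."""
    # Imported lazily so build_repo_id works without huggingface_hub installed.
    from huggingface_hub import create_repo

    # exist_ok=True avoids an error when the repo is already there.
    return create_repo(repo_id, token=token, exist_ok=True)


# Hypothetical usage, run before constructing the Trainer:
#   import os
#   repo_id = build_repo_id("ahmad1289", "xlm-roberta-base-finetuned-panx-de")
#   ensure_repo_exists(repo_id, token=os.environ.get("HF_TOKEN"))
```

If the repo already exists and the clone still fails, the cause is probably elsewhere, for example a token without write access or a version mismatch between `transformers` and `huggingface_hub`.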