huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How can I create a repository automatically when defining the `Trainer`? #22134

Closed ahmad-alismail closed 1 year ago

ahmad-alismail commented 1 year ago

Describe the bug

I'm trying to fine-tune the XLM-RoBERTa model on a German corpus for an NER task. To handle the training loop I'm using the 🤗 Transformers Trainer, so first I need to define the training attributes using the TrainingArguments class:

from transformers import TrainingArguments

# Set the number of epochs, batch size, and logging steps
num_epochs = 3
batch_size = 24
logging_steps = len(panx_de_encoded["train"]) // batch_size

# Define the model name
model_name = f"{xlmr_model_name}-finetuned-panx-de"

# Define the training arguments for the model
training_args = TrainingArguments(
    output_dir=model_name,                   # Directory to save model checkpoints and outputs
    log_level="error",                       # Logging level
    num_train_epochs=num_epochs,             # Number of training epochs
    per_device_train_batch_size=batch_size,  # Batch size per device for training
    per_device_eval_batch_size=batch_size,   # Batch size per device for evaluation
    evaluation_strategy="epoch",             # Evaluate model's prediction on the validation set at the end of each epoch
    save_steps=1e6,                          # Save checkpoint every 1000000 steps (i.e., disable checkpointing to speed up training)
    weight_decay=0.01,                       # Weight decay for optimizer
    disable_tqdm=False,                      # Whether to show progress bar during training
    logging_steps=logging_steps,             # Determines the number of steps between each logging message
    push_to_hub=True                         # Whether to push the model to the Hugging Face model hub
)

from transformers import Trainer

trainer = Trainer(model_init=model_init,                      # A function that instantiates the model to be used
                  args=training_args,                         # Arguments to tweak for training
                  data_collator=data_collator,
                  compute_metrics=compute_metrics,
                  train_dataset=panx_de_encoded["train"],
                  eval_dataset=panx_de_encoded["validation"],
                  tokenizer=xlmr_tokenizer)

Unfortunately, I get the following error:
````python
Cloning https://huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de into local empty directory.
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
    691                         self.local_dir,
--> 692                         env=env,
    693                     )

/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_subprocess.py in run_subprocess(command, folder, check, **kwargs)
     68         cwd=folder or os.getcwd(),
---> 69         **kwargs,
     70     )

/opt/conda/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    511             raise CalledProcessError(retcode, process.args,
--> 512                                      output=stdout, stderr=stderr)
    513     return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['git', 'lfs', 'clone', 'https://user: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@huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de', '.']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/tmp/ipykernel_23/987298996.py in <module>
      8                   train_dataset=panx_de_encoded["train"],
      9                   eval_dataset=panx_de_encoded["validation"],
---> 10                   tokenizer=xlmr_tokenizer)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers)
    401         # Create clone of distant repo and output directory if needed
    402         if self.args.push_to_hub:
--> 403             self.init_git_repo()
    404             # In case of pull, we need to make sure every process has the latest.
    405             if is_torch_tpu_available():

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in init_git_repo(self)
   2551                 self.args.output_dir,
   2552                 clone_from=repo_name,
-> 2553                 use_auth_token=use_auth_token,
   2554             )
   2555         except EnvironmentError:

/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
    122             )
    123 
--> 124         return fn(*args, **kwargs)
    125 
    126     return _inner_fn  # type: ignore

/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in __init__(self, local_dir, clone_from, repo_type, token, git_user, git_email, revision, skip_lfs_files, client)
    516 
    517         if clone_from is not None:
--> 518             self.clone_from(repo_url=clone_from)
    519         else:
    520             if is_git_repo(self.local_dir):

/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/_validators.py in _inner_fn(*args, **kwargs)
    122             )
    123 
--> 124         return fn(*args, **kwargs)
    125 
    126     return _inner_fn  # type: ignore

/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
    731 
    732         except subprocess.CalledProcessError as exc:
--> 733             raise EnvironmentError(exc.stderr)
    734 
    735     def git_config_username_and_email(

OSError: WARNING: 'git lfs clone' is deprecated and will not be updated
          with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into '.'...
remote: Repository not found
fatal: repository 'https://huggingface.co/ahmad1289/xlm-roberta-base-finetuned-panx-de/' not found
Error(s) during clone:
git clone failed: exit status 128
````

It appears that the model repository with the name xlm-roberta-base-finetuned-panx-de does not currently exist. However, as described in the Hugging Face course, the push_to_hub() function (which should be used later in the notebook) handles both the creation of the repository and the push of the model and tokenizer files to that repository.

Is there anything else that I might be missing?

System info

- huggingface_hub version: 0.12.1
- Platform: Linux-5.15.89+-x86_64-with-debian-bullseye-sid
- Python version: 3.7.12
- Running in iPython ?: Yes
- iPython shell: ZMQInteractiveShell
- Running in notebook ?: Yes
- Running in Google Colab ?: No
- Token path ?: /root/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: 2.7.11
- Tensorflow: 2.11.0
- Torch: 1.13.0
- Jinja2: 3.1.2
- Graphviz: 0.8.4
- Pydot: 1.4.2
- Pillow: 9.3.0
- hf_transfer: N/A
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /root/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /root/.cache/huggingface/assets
- HF_HUB_OFFLINE: False
- HF_TOKEN_PATH: /root/.cache/huggingface/token
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False

Wauplin commented 1 year ago

Hi @ahmad-alismail , thanks for reporting this.

~If you don't mind, I'll transfer this issue to the transformers repo and rename it. A breaking change was introduced in huggingface_hub==0.12.0: since then, Repository no longer creates the repo if it does not exist on the Hub.~

~It seems that the Trainer push_to_hub method does not create the repo before calling Repository, which now fails. This has to be fixed here. In the meantime, you need to create the repo manually before using Trainer.push_to_hub, or downgrade to huggingface_hub==0.11.1.~

~@sgugger @ydshieh I'll open a PR today to fix this.~

EDIT: I cannot transfer the issue to transformers (most likely because I'm not a maintainer there) so if someone can do it :pray:

EDIT 2: it seems that repo creation is already handled in the Trainer class. @sgugger @ydshieh any idea why create_repo was not called?
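
For anyone pinned to an old transformers release, the manual workaround mentioned above (creating the repository before constructing the Trainer) might look roughly like the sketch below. `create_repo` with `exist_ok=True` is part of `huggingface_hub`; the helper name `ensure_repo` and its injectable `create_fn` parameter are assumptions made for illustration, and the real call needs network access and a token with write permission.

```python
from typing import Any, Callable, Optional


def ensure_repo(repo_id: str, create_fn: Optional[Callable[..., Any]] = None) -> Any:
    """Create `repo_id` on the Hub if it does not exist yet.

    `create_fn` is a hypothetical hook so the helper can be exercised offline;
    when omitted, it falls back to `huggingface_hub.create_repo`.
    """
    if create_fn is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import create_repo as create_fn
    # exist_ok=True turns the call into a no-op when the repository already exists.
    return create_fn(repo_id, exist_ok=True)


# Usage (needs network access and a saved token), before building the Trainer:
# ensure_repo("ahmad1289/xlm-roberta-base-finetuned-panx-de")
```

Passing `exist_ok=True` keeps the call safe to repeat, so re-running the notebook cell should not fail once the repository is there.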

Wauplin commented 1 year ago

@ahmad-alismail which version of transformers do you have?

ydshieh commented 1 year ago

Yeah, it looks like the line numbers in the traceback in the issue description differ from the current source by more than 1000. It would help to know which transformers version is used here.
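
One way to check this kind of mismatch locally is to ask Python where a method is defined in the installed package and compare that with the line numbers in the traceback (around 2551-2555 inside `init_git_repo`). This is a minimal sketch using only the standard-library `inspect` module; the helper name `definition_site` is made up for illustration.

```python
import inspect
from typing import Optional, Tuple


def definition_site(obj) -> Tuple[Optional[str], int]:
    """Return (source file, first line number) for a function, method, or class."""
    source_file = inspect.getsourcefile(obj)
    # getsourcelines returns (list of source lines, starting line number).
    _, first_line = inspect.getsourcelines(obj)
    return source_file, first_line


# Usage against the method named in the traceback. This assumes the method
# exists under that name in the installed transformers version:
# from transformers import Trainer
# print(definition_site(Trainer.init_git_repo))
```

A large gap between the reported line and the one in the current source is a strong hint that the installed package is an older release.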

ahmad-alismail commented 1 year ago

Hi @Wauplin @ydshieh, thanks for your reply! The version of transformers is 4.11.3

Wauplin commented 1 year ago

@ahmad-alismail Could you try updating the transformers package to the latest release (4.26.1) and re-running your script? Version 4.11.3 was released in September 2021 and is therefore outdated.
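
A quick way to confirm whether an environment is affected is to compare the installed version with the release suggested here. The sketch below is an assumption-laden illustration, not part of transformers: the helper names are hypothetical, and the parser only handles plain numeric releases such as `4.11.3` (a suffix like `.dev0` would raise a `ValueError`; `packaging.version.parse` is the more robust choice in real code).

```python
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain release string such as '4.26.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed: str, minimum: str) -> bool:
    """True when `installed` is an older release than `minimum`."""
    return parse_release(installed) < parse_release(minimum)


def installed_version(package: str) -> Optional[str]:
    """Look up the installed version of `package`, or None if it is unavailable."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(is_older("4.11.3", "4.26.1"))  # prints True
```

Comparing integer tuples rather than raw strings matters here: as strings, `"4.9.0"` would sort after `"4.26.1"`, while the tuple comparison orders them correctly.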

ahmad-alismail commented 1 year ago

@Wauplin It's working perfectly! I truly appreciate your help – thank you so much!