huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

Trainer exits with PermissionError if cwd is not writeable #562

Open bluestealth opened 1 month ago

bluestealth commented 1 month ago

v1.1.0

This is similar to #559. I am running setfit in a container, and the process starts in a directory that is not writeable by the current user. This results in a PermissionError at runtime.

I am able to replicate this locally with the example, even when output_dir is set in TrainingArguments, by chowning the directory the script is executed from to another user.

bad_dir % python3 example.py
Using the latest cached version of the dataset since sst2 couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /Users/bluestealth/.cache/huggingface/datasets/sst2/default/0.0.0/8d51e7e4887a4caaa95b3fbebbf53c0490b58bbb (last modified on Tue Oct  1 18:57:42 2024).
/Users/bluestealth/testing-setfit/.env/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
Applying column mapping to the training dataset
Applying column mapping to the evaluation dataset
Traceback (most recent call last):
  File "/Users/bluestealth/testing-setfit/bad_dir/example.py", line 27, in <module>
    trainer = Trainer(
              ^^^^^^^^
  File "/Users/bluestealth/testing-setfit/.env/lib/python3.12/site-packages/setfit/trainer.py", line 328, in __init__
    self.st_trainer = BCSentenceTransformersTrainer(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bluestealth/testing-setfit/.env/lib/python3.12/site-packages/setfit/trainer.py", line 48, in __init__
    super().__init__(model=setfit_model.model_body, **kwargs)
  File "/Users/bluestealth/testing-setfit/.env/lib/python3.12/site-packages/sentence_transformers/trainer.py", line 201, in __init__
    super().__init__(
  File "/Users/bluestealth/testing-setfit/.env/lib/python3.12/site-packages/transformers/trainer.py", line 611, in __init__
    os.makedirs(self.args.output_dir, exist_ok=True)
  File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: 'tmp_trainer'

This happens because super().__init__() is called before the passed-in arguments are applied. Since no training arguments are passed to it, output_dir defaults to "tmp_trainer" in the sentence transformers trainer. Then, when sentence transformers calls super().__init__(), the transformers trainer tries to create output_dir in the current working directory, causing the error above.
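
To make the ordering problem concrete, here is a minimal sketch of the pattern using only the standard library. `Args`, `BaseTrainer`, `WrapperTrainer`, and `dirs_created_in_cwd` are hypothetical stand-ins written for this illustration, not the actual setfit, sentence transformers, or transformers classes; the only detail taken from the traceback is that the base initializer calls `os.makedirs(self.args.output_dir, exist_ok=True)` and that the default directory is "tmp_trainer".

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Args:
    # Hypothetical stand-in for the training arguments; the default mirrors
    # the "tmp_trainer" name seen in the traceback.
    output_dir: str = "tmp_trainer"


class BaseTrainer:
    # Hypothetical stand-in for the base trainer: it creates output_dir
    # eagerly inside __init__.
    def __init__(self, args: Optional[Args] = None) -> None:
        self.args = args if args is not None else Args()
        os.makedirs(self.args.output_dir, exist_ok=True)


class WrapperTrainer(BaseTrainer):
    # Hypothetical stand-in for the wrapping trainer: the base initializer
    # runs before the user's arguments are applied.
    def __init__(self, args: Args) -> None:
        super().__init__()  # args are not forwarded, so the default is used
        self.args = args    # too late: "tmp_trainer" was already created


def dirs_created_in_cwd(output_dir: str) -> List[str]:
    """Build a WrapperTrainer from a fresh cwd and list what appeared there."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as cwd:
        os.chdir(cwd)
        try:
            WrapperTrainer(Args(output_dir=output_dir))
            return sorted(os.listdir(cwd))
        finally:
            # Leave the temporary directory before it is cleaned up.
            os.chdir(previous)
```

In this sketch, calling `dirs_created_in_cwd` with an output directory located elsewhere still leaves a `tmp_trainer` directory in the current working directory, and the requested directory is never created by the initializer. If the cwd is not writeable, that same `os.makedirs` call is where a PermissionError would surface.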