huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[Push to Hub fails for local data paths] #648

Closed haywoodsloan closed 1 month ago

haywoodsloan commented 1 month ago

Prerequisites

Backend

Local

Interface Used

CLI

CLI Command

autotrain --config /var/hf/config/google/vit-large-patch16-224.yml

UI Screenshots & Parameters

task: image-classification
base_model: google/vit-large-patch16-224
project_name: autotrain-ai-image-detect
log: tensorboard
backend: local

data:
  path: /var/hf/images
  train_split: train
  valid_split: test
  column_mapping:
    image_column: image
    target_column: label

params:
  lr: 0.00005
  epochs: 10
  batch_size: 8
  warmup_ratio: 0.1
  gradient_accumulation: 1
  optimizer: adamw_torch
  scheduler: linear
  weight_decay: 0
  max_grad_norm: 1
  seed: 42
  logging_steps: -1
  auto_find_batch_size: false
  mixed_precision: fp16
  save_total_limit: 1
  evaluation_strategy: epoch
  early_stopping_patience: 5
  early_stopping_threshold: 0.01

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true

Error Logs

train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/app/env/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/validate-yaml

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3761, in create_commit
    hf_raise_for_status(response)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: Root=1-664a4963-5f30158c220e06ea4643c70e;3e0063dc-b80f-44aa-911a-0836a334f510)

Bad request:
"datasets[0]" with value "/var/hf/images/" is not valid. If possible, use a dataset id from https://hf.co/datasets.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/image_classification/__main__.py", line 208, in train
    api.upload_folder(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1230, in _inner
    return fn(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 4807, in upload_folder
    commit_info = self.create_commit(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1230, in _inner
    return fn(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3765, in create_commit
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "datasets[0]" with value "/var/hf/images/" is not valid. If possible, use a dataset id from https://hf.co/datasets.

Additional Information

The dataset name used in the new model's README.md is just the data_path, so a local directory will cause an invalid dataset name error when uploading the model.

For example, the model card's dataset name for image classification training is set here: https://github.com/huggingface/autotrain-advanced/blob/9c2c7b56eb2704ac16f4923d723b89b7c5364238/src/autotrain/trainers/image_classification/utils.py#L133-L136

It would be nice if this value were omitted for local data paths. An even better solution could be to add a config param that specifies the Hugging Face dataset name separately from the data path. This would enable scenarios where the dataset is checked out locally, but the model still links to the correct dataset when published.
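The two suggestions above could be sketched roughly as follows. This is a minimal illustration, not AutoTrain's actual code: the function name `dataset_tag_for_card`, the `hub_dataset_id` parameter, and the regular expression are all hypothetical, and the pattern is only an approximation of what the Hub accepts as a dataset id.

```python
import os
import re
from typing import Optional

# Rough approximation of a Hub repo id ("name" or "namespace/name").
# Assumption: the Hub's real validation rules are stricter than this.
_HUB_ID_RE = re.compile(r"^[A-Za-z0-9][\w.\-]*(/[A-Za-z0-9][\w.\-]*)?$")


def dataset_tag_for_card(data_path: str, hub_dataset_id: Optional[str] = None) -> Optional[str]:
    """Return the value to put under `datasets:` in the model card, or None to omit it."""
    # Hypothetical extra config param: an explicit Hub dataset id always wins.
    if hub_dataset_id:
        return hub_dataset_id
    # Absolute paths and anything that exists on disk are treated as local data.
    if os.path.isabs(data_path) or os.path.exists(data_path):
        return None
    # Anything that does not look like a Hub id is omitted as well.
    if not _HUB_ID_RE.match(data_path):
        return None
    return data_path
```

With the config from this report, `dataset_tag_for_card("/var/hf/images")` returns `None`, so the `datasets:` entry would be left out of README.md, while passing `hub_dataset_id` would keep the link to the published dataset.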

Thank you!

abhishekkrthakur commented 1 month ago

Unfortunately, I was not able to reproduce it the way you described. However, I've added a check so that the dataset tag is not pushed when using a local dataset. Please upgrade to 0.7.98+ and let me know if you still face this issue.

haywoodsloan commented 1 month ago

@abhishekkrthakur Unfortunately, this issue is still present in version 0.7.101. I'm still receiving the same error using the same config and CLI command. I think the issue occurs when the data path is an absolute path like /var/hf/images.

abhishekkrthakur commented 1 month ago

I tried the same with an absolute path too and didn't receive any error. From the code, your problem appears to be resolved. Could you please confirm? Also, can you provide full logs?

haywoodsloan commented 1 month ago

Yes, I just received the following error when using the previously provided config and CLI command. I confirmed I was using version 0.7.101 with autotrain --version.

train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/app/env/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/validate-yaml

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3761, in create_commit
    hf_raise_for_status(response)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: Root=1-664a4963-5f30158c220e06ea4643c70e;3e0063dc-b80f-44aa-911a-0836a334f510)

Bad request:
"datasets[0]" with value "/var/hf/images/" is not valid. If possible, use a dataset id from https://hf.co/datasets.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/image_classification/__main__.py", line 208, in train
    api.upload_folder(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1230, in _inner
    return fn(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 4807, in upload_folder
    commit_info = self.create_commit(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1230, in _inner
    return fn(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3765, in create_commit
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "datasets[0]" with value "/var/hf/images/" is not valid. If possible, use a dataset id from https://hf.co/datasets.

abhishekkrthakur commented 1 month ago

In the output folder, there should be a README.md. Could you please copy and paste its contents?

haywoodsloan commented 1 month ago

Here's the README.md from the model output folder:

---
tags:
- autotrain
- image-classification
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
  example_title: Teapot
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg
  example_title: Palace
datasets:
- /var/hf/images
---

# Model Trained Using AutoTrain

- Problem type: Image Classification

## Validation Metrics
loss: 0.7285889387130737

f1: 0.5

precision: 0.6666666666666666

recall: 0.4

auc: 0.52

accuracy: 0.6

I've also attached the full log files. May22_17-11-20_c2798dce683b.zip
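For anyone who wants to inspect a generated README.md before the push is attempted, a small check along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, the parsing is a simple line scan rather than a real YAML parser, and the "looks like a local path" heuristic is a guess, not the Hub's validation logic.

```python
from typing import List


def front_matter_datasets(readme_text: str) -> List[str]:
    """Collect the list items under `datasets:` in the leading `---` block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    entries = []
    in_datasets = False
    for line in lines[1:]:
        if line.strip() == "---":  # end of the front matter
            break
        if line.startswith("datasets:"):
            in_datasets = True
            continue
        if in_datasets:
            stripped = line.strip()
            if stripped.startswith("- "):
                entries.append(stripped[2:].strip())
            elif stripped and not line.startswith(" "):
                in_datasets = False  # a new top-level key begins
    return entries


def looks_like_local_path(entry: str) -> bool:
    """Heuristic only: Hub ids have at most one slash and no leading '/', '.', or '~'."""
    return entry.startswith(("/", "./", "../", "~")) or entry.count("/") > 1 or "\\" in entry


def suspicious_dataset_entries(readme_text: str) -> List[str]:
    return [e for e in front_matter_datasets(readme_text) if looks_like_local_path(e)]
```

Applied to the README.md above, `suspicious_dataset_entries` would return `["/var/hf/images"]`, which is the value the Hub rejects in the error log.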

abhishekkrthakur commented 1 month ago

I just ran this config:

task: image-classification
base_model: google/vit-base-patch16-224
project_name: autotrain-ai-image-detect
log: tensorboard
backend: local

data:
  path: /Users/abhishek/Downloads/Datasets/image_classification/flowers
  train_split: train
  valid_split: null
  column_mapping:
    image_column: image
    target_column: label

params:
  lr: 0.00005
  epochs: 1
  batch_size: 8
  warmup_ratio: 0.1
  gradient_accumulation: 1
  optimizer: adamw_torch
  scheduler: linear
  weight_decay: 0
  max_grad_norm: 1
  seed: 42
  logging_steps: -1
  auto_find_batch_size: false
  mixed_precision: none
  save_total_limit: 1
  evaluation_strategy: epoch
  early_stopping_patience: 5
  early_stopping_threshold: 0.01

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true

with the command:

autotrain --config /Users/abhishek/Downloads/Datasets/config.yml

from /Users/abhishek

and it worked successfully and my model was pushed to hub.

The README contents didn't contain a dataset tag:

---
tags:
- autotrain
- image-classification
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
  example_title: Teapot
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg
  example_title: Palace
---

# Model Trained Using AutoTrain

- Problem type: Image Classification

## Validation Metrics
loss: 0.046192716807127

f1_macro: 0.9831967159545663

f1_micro: 0.9833948339483395

f1_weighted: 0.9833459803821667

precision_macro: 0.9842701698279861

precision_micro: 0.9833948339483395

precision_weighted: 0.9835024125781294

recall_macro: 0.9823230808554145

recall_micro: 0.9833948339483395

recall_weighted: 0.9833948339483395

accuracy: 0.9833948339483395

It seems like you have a version conflict. Do you mind installing autotrain in a new environment and trying again?

haywoodsloan commented 1 month ago

I just ran that same config and got this error:

train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/app/env/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/validate-yaml

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3668, in create_commit
    hf_raise_for_status(response)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: Root=1-664e3758-2faaa1f35dcca87b0c0b2c90;11bae101-997f-41e3-95e9-46b4d229763b)

Bad request:
"datasets[0]" with value "/Users/abhishek/Downloads/Datasets/image_classification/flowers" is not valid. If possible, use a dataset id from https://hf.co/datasets.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/image_classification/__main__.py", line 226, in train
    api.upload_folder(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1286, in _inner
    return fn(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 4724, in upload_folder
    commit_info = self.create_commit(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1286, in _inner
    return fn(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3672, in create_commit
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "datasets[0]" with value "/Users/abhishek/Downloads/Datasets/image_classification/flowers" is not valid. If possible, use a dataset id from https://hf.co/datasets.

The README.md content for the output model is:

---
tags:
- autotrain
- image-classification
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
  example_title: Teapot
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg
  example_title: Palace
datasets:
- /Users/abhishek/Downloads/Datasets/image_classification/flowers
---

# Model Trained Using AutoTrain

- Problem type: Image Classification

## Validation Metrics
No validation metrics available

I'm running autotrain in a docker container built from this Dockerfile:

FROM huggingface/autotrain-advanced:latest
RUN pip uninstall -y autotrain-advanced
RUN pip install -U autotrain-advanced
CMD export HF_USERNAME=$(cat $HF_USER_FILE) && \
  export HF_TOKEN=$(cat $HF_TOKEN_FILE) && \
  bash

When I run which autotrain I get this: /app/env/bin/autotrain. And the current version is now 0.7.104.

I've deleted and rebuilt the container but get the same error.

abhishekkrthakur commented 1 month ago

Thanks. Hopefully fixed in 0.7.106+ by adding one more check around the dataset tag. The latest image is currently building: https://github.com/huggingface/autotrain-advanced/actions/runs/9196666390/job/25295280759
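Since the thread turns on which version is actually installed inside the container, a quick programmatic check can complement `autotrain --version`. The sketch below assumes the package is installed under the distribution name `autotrain-advanced` (as in the Dockerfile above) and takes 0.7.106 as the minimum, per the maintainer's comment. The helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.7.104' into (0, 7, 104); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: str, minimum: str = "0.7.106") -> bool:
    """Compare numerically; plain string comparison would rank '0.7.98' above '0.7.101'."""
    return parse_version(installed) >= parse_version(minimum)


def installed_autotrain_version() -> Optional[str]:
    """Return the installed version of autotrain-advanced, or None if it is not installed."""
    try:
        return metadata.version("autotrain-advanced")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_autotrain_version()
    if version is None:
        print("autotrain-advanced is not installed in this environment")
    else:
        print(version, "includes the fix" if has_fix(version) else "predates the fix")
```

Running this inside the container shows whether the interpreter behind /app/env/bin/autotrain sees a version at or above the one said to contain the fix.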