huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] Config name is missing for Datasets with no default config #644

Closed Aryan-401 closed 1 month ago

Aryan-401 commented 1 month ago

Backend

Other cloud providers

Interface Used

UI

CLI Command

No response

Error Logs

ERROR | 2024-05-16 09:52:25 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/autotrain/trainers/object_detection/main.py", line 49, in train
    train_data = load_dataset(
  File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 2587, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 2296, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 374, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 584, in _create_builder_config
    raise ValueError(
ValueError: Config name is missing.
Please pick one among the available configs: ['full', 'mini']
Example of usage:
    load_dataset('license-plate-object-detection', 'full')

ERROR | 2024-05-16 09:52:25 | autotrain.trainers.common:wrapper:121 - Config name is missing. Please pick one among the available configs: ['full', 'mini'] Example of usage: load_dataset('license-plate-object-detection', 'full')
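
For anyone hitting the same `ValueError` outside AutoTrain: it is raised by the `datasets` library when a dataset has several configs and none is marked as the default. The sketch below shows one way to deal with it when calling `datasets` directly. `pick_config` and `load_with_config` are hypothetical helper names, not part of any library, and `dataset_id` stands for whatever Hub ID you are loading.

```python
from typing import Optional, Sequence


def pick_config(available: Sequence[str], requested: Optional[str] = None) -> Optional[str]:
    """Choose a config name explicitly when a dataset has no default config."""
    if requested is not None:
        if requested not in available:
            raise ValueError(f"Unknown config {requested!r}; available: {list(available)}")
        return requested
    if len(available) <= 1:
        # Zero or one config: nothing to disambiguate.
        return available[0] if available else None
    # Several configs and no explicit choice: same situation as in the log above.
    raise ValueError(f"Config name is missing. Pick one of: {list(available)}")


def load_with_config(dataset_id: str, requested: Optional[str] = None, split: str = "train"):
    # Imported lazily so pick_config can be used without `datasets` installed.
    from datasets import get_dataset_config_names, load_dataset

    config_name = pick_config(get_dataset_config_names(dataset_id), requested)
    return load_dataset(dataset_id, config_name, split=split)
```

With the dataset from this issue, `pick_config(["full", "mini"], "full")` returns `"full"`, while leaving `requested` unset raises, which mirrors the error in the log.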

Additional Information

The problem seems to be that the dataset I'm trying to use has several configs and no default one. Next logical question: is this by design? If so, how can I use a dataset without a default config in AutoTrain?

Links to support Theory:

abhishekkrthakur commented 1 month ago

It was by design, but I've added a workaround. For the dataset you mention, you can use config_name:split_name as the split.

For example: full:train in the train split, and full:test or full:validation in the valid split.

Make sure you are on version 0.7.99 or above.
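
To illustrate what a `config_name:split_name` value means, here is a minimal sketch of how such a string could be taken apart and passed to `datasets.load_dataset`. This is not AutoTrain's actual code; `parse_split_spec` and `load_split` are hypothetical names used only for illustration, and `dataset_id` is a placeholder for the dataset's Hub ID.

```python
from typing import Optional, Tuple


def parse_split_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'config_name:split_name' into its parts; a bare 'split_name' has no config."""
    if ":" in spec:
        config_name, split_name = spec.split(":", 1)
        return config_name, split_name
    return None, spec


def load_split(dataset_id: str, spec: str):
    # Imported lazily so parse_split_spec can be used without `datasets` installed.
    from datasets import load_dataset

    config_name, split_name = parse_split_spec(spec)
    # name=None lets `datasets` fall back to the default config, if there is one.
    return load_dataset(dataset_id, name=config_name, split=split_name)
```

Under these assumptions, `full:train` resolves to config `full` and split `train`, and a plain `train` keeps the old behaviour of relying on the default config.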