huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] Autotrain Object Detection Error: KeyError: 'autotrain_label' #655

Closed: rileybolen closed this 1 month ago

rileybolen commented 1 month ago

Prerequisites

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

[Screenshot 2024-05-22 at 8 01 39 AM]

Error Logs

Downloading data:   0%|          | 0/802 [00:00<?, ?files/s]
Downloading data: 100%|██████████| 802/802 [00:00<00:00, 17761.12files/s]

Downloading data:   0%|          | 0/203 [00:00<?, ?files/s]
Downloading data: 100%|██████████| 203/203 [00:00<00:00, 21729.93files/s]

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 629 examples [00:00, 6266.56 examples/s]
Generating train split: 799 examples [00:00, 6202.62 examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]
Generating validation split: 200 examples [00:00, 5944.98 examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/799 [00:00<?, ? examples/s]
Saving the dataset (0/1 shards): 100%|██████████| 799/799 [00:00<00:00, 5077.60 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 799/799 [00:00<00:00, 5077.60 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 799/799 [00:00<00:00, 5050.56 examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/200 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 200/200 [00:00<00:00, 5317.32 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 200/200 [00:00<00:00, 5287.99 examples/s]
INFO | 2024-05-22 18:49:36 | autotrain.backends.local:create:8 - Starting local training...
INFO | 2024-05-22 18:49:36 | autotrain.commands:launch_command:372 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.object_detection', '--training_config', 'autotrain-717ma-3oxi0/training_params.json']
INFO | 2024-05-22 18:49:36 | autotrain.commands:launch_command:373 - {'data_path': 'autotrain-717ma-3oxi0/autotrain-data', 'model': 'facebook/detr-resnet-101', 'username': 'rileybol', 'lr': 5e-05, 'epochs': 3, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'logging_steps': -1, 'project_name': 'autotrain-717ma-3oxi0', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'token': '**', 'push_to_hub': True, 'evaluation_strategy': 'epoch', 'image_column': 'autotrain_image', 'objects_column': 'autotrain_label', 'log': 'tensorboard', 'image_square_size': 600, 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}
INFO | 2024-05-22 18:49:36 | autotrain.backends.local:create:13 - Training PID: 154
INFO: 10.16.2.201:24345 - "POST /ui/create_project HTTP/1.1" 200 OK
INFO: 10.16.41.118:23391 - "GET /ui/is_model_training HTTP/1.1" 200 OK
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
INFO: 10.16.41.118:34820 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.41.118:24196 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO:matplotlib.font_manager:generated new fontManager
INFO: 10.16.15.199:26769 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO | 2024-05-22 18:49:45 | __main__:train:83 - Train data: Dataset({
    features: ['autotrain_image', 'autotrain_objects'],
    num_rows: 799
})
INFO | 2024-05-22 18:49:45 | __main__:train:84 - Valid data: Dataset({
    features: ['autotrain_image', 'autotrain_objects'],
    num_rows: 200
})
ERROR | 2024-05-22 18:49:45 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/__main__.py", line 86, in train
    categories = train_data.features[config.objects_column].feature["category"].names
KeyError: 'autotrain_label'

ERROR | 2024-05-22 18:49:45 | autotrain.trainers.common:wrapper:121 - 'autotrain_label'

Additional Information

It seems that my training run now gets past the previous error, but I am running into this new one.
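
The first log points at the cause of the KeyError: the training parameters set objects_column to 'autotrain_label', while the dataset the trainer loaded only has the columns 'autotrain_image' and 'autotrain_objects'. A minimal pre-flight check along these lines would surface the mismatch before training starts. check_columns is a hypothetical helper for illustration, not part of AutoTrain:

```python
def check_columns(column_names, image_column, objects_column):
    """Hypothetical pre-flight check: fail early if a configured column is missing."""
    missing = [c for c in (image_column, objects_column) if c not in column_names]
    if missing:
        raise ValueError(
            f"configured column(s) {missing} not found; dataset has {sorted(column_names)}"
        )


# Column names and config values copied from the first log above.
dataset_columns = ["autotrain_image", "autotrain_objects"]
try:
    check_columns(dataset_columns, "autotrain_image", "autotrain_label")
except ValueError as err:
    print(err)  # names the missing 'autotrain_label' column
```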

abhishekkrthakur commented 1 month ago

I checked the metadata from the previous issue. It also seems to be missing "area", so it will fail again at some point.

rileybolen commented 1 month ago

@abhishekkrthakur is this a bug or an issue with my data?

abhishekkrthakur commented 1 month ago

This is indeed, yet again :(, an issue that I'm looking into at the moment. But your metadata.jsonl also seems to be missing "area" inside the objects column. We need: area, bbox and category. An example dataset is here: https://huggingface.co/datasets/keremberke/license-plate-object-detection?row=1

rileybolen commented 1 month ago

@abhishekkrthakur Okay, thanks! I will add that to my data. Should the area just be the bounding box width*height? Also, I noticed that in this example dataset the category field is called category, but in the documentation I am looking at it is called categories. Which one is correct? https://huggingface.co/docs/autotrain/v0.7.104/object_detection

abhishekkrthakur commented 1 month ago

Please follow the format in the link I provided. Your category names are fine; they should be strings. Area needs to be calculated from the bboxes, and the format should be COCO. I'll update the docs.

Object detection was recently added, and I apologize that you faced so many issues. I've also fixed the latest one now.
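
In COCO format a box is [x_min, y_min, width, height], so for box-only annotations the area is simply width * height, which answers the question above. A minimal sketch, assuming box-only annotations and the metadata.jsonl layout used in this thread; bbox_area and add_area are hypothetical helpers, and the record values are illustrative:

```python
import json


def bbox_area(bbox):
    """Area of a COCO-style box [x_min, y_min, width, height]."""
    _, _, width, height = bbox
    return width * height


def add_area(record):
    """Return a copy of a metadata.jsonl record with objects["area"] computed from its boxes."""
    objects = dict(record["objects"])
    objects["area"] = [bbox_area(box) for box in objects["bbox"]]
    return {**record, "objects": objects}


# Hypothetical record: file name and box values are illustrative only.
line = '{"file_name": "0001.jpg", "objects": {"bbox": [[10.0, 20.0, 30.0, 40.0]], "category": ["Face"]}}'
print(json.dumps(add_area(json.loads(line))))  # the record gains "area": [1200.0]
```

Only the width and height matter here; the x/y offsets do not change the area.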

rileybolen commented 1 month ago

@abhishekkrthakur No problem! I realized I am missing the id as well. Should that just be a unique integer for each bounding box?

abhishekkrthakur commented 1 month ago

id is not used; you can skip it :)

rileybolen commented 1 month ago

@abhishekkrthakur I noticed a new error, do you think this will be fixed when you merge your new changes or should I open a new issue for this?

Downloading data:   0%|          | 0/802 [00:00<?, ?files/s]
Downloading data: 100%|██████████| 802/802 [00:00<00:00, 438341.39files/s]

Downloading data:   0%|          | 0/203 [00:00<?, ?files/s]
Downloading data: 100%|██████████| 203/203 [00:00<00:00, 24914.96files/s]

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 574 examples [00:00, 5724.92 examples/s]
Generating train split: 799 examples [00:00, 5898.87 examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]
Generating validation split: 200 examples [00:00, 6323.82 examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/799 [00:00<?, ? examples/s]
Saving the dataset (0/1 shards): 100%|██████████| 799/799 [00:00<00:00, 5256.74 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 799/799 [00:00<00:00, 5256.74 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 799/799 [00:00<00:00, 5231.23 examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/200 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 200/200 [00:00<00:00, 5590.54 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 200/200 [00:00<00:00, 5556.03 examples/s]
INFO     | 2024-05-22 19:41:53 | autotrain.backends.local:create:8 - Starting local training...
INFO     | 2024-05-22 19:41:53 | autotrain.commands:launch_command:372 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.object_detection', '--training_config', 'autotrain-zny6w-3b288/training_params.json']
INFO     | 2024-05-22 19:41:53 | autotrain.commands:launch_command:373 - {'data_path': 'autotrain-zny6w-3b288/autotrain-data', 'model': 'facebook/detr-resnet-101', 'username': 'rileybol', 'lr': 5e-05, 'epochs': 3, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'logging_steps': -1, 'project_name': 'autotrain-zny6w-3b288', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'token': '*****', 'push_to_hub': True, 'evaluation_strategy': 'epoch', 'image_column': 'autotrain_image', 'objects_column': 'autotrain_objects', 'log': 'tensorboard', 'image_square_size': 600, 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}
INFO     | 2024-05-22 19:41:53 | autotrain.backends.local:create:13 - Training PID: 151
INFO:     10.16.41.118:34539 - "POST /ui/create_project HTTP/1.1" 200 OK
INFO:     10.16.41.118:2974 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO:     10.16.15.199:19091 - "GET /ui/accelerators HTTP/1.1" 200 OK
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
INFO:     10.16.15.199:13209 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO:matplotlib.font_manager:generated new fontManager
INFO     | 2024-05-22 19:42:02 | __main__:train:83 - Train data: Dataset({
    features: ['autotrain_image', 'autotrain_objects'],
    num_rows: 799
})
INFO     | 2024-05-22 19:42:02 | __main__:train:84 - Valid data: Dataset({
    features: ['autotrain_image', 'autotrain_objects'],
    num_rows: 200
})
ERROR    | 2024-05-22 19:42:02 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/object_detection/__main__.py", line 86, in train
    categories = train_data.features[config.objects_column].feature["category"].names
AttributeError: 'dict' object has no attribute 'feature'

ERROR    | 2024-05-22 19:42:02 | autotrain.trainers.common:wrapper:121 - 'dict' object has no attribute 'feature'
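
The second traceback shows that train_data.features["autotrain_objects"] is a plain dict, so it has no .feature attribute, and the trainer's lookup of ["category"].names (a ClassLabel-style list of names) cannot succeed. One workaround that does not depend on how the features were inferred is to collect the category names from the metadata itself. A minimal sketch with hypothetical helper names; it is not the change that was merged into AutoTrain:

```python
import json


def category_names(jsonl_lines):
    """Collect the sorted, de-duplicated category strings from metadata.jsonl lines."""
    names = set()
    for line in jsonl_lines:
        if line.strip():  # skip blank lines
            names.update(json.loads(line)["objects"]["category"])
    return sorted(names)


def label_maps(names):
    """Build id2label / label2id dicts from an ordered list of category names."""
    id2label = dict(enumerate(names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


# Illustrative lines in the same shape as the metadata shared below.
lines = [
    '{"file_name": "a.jpg", "objects": {"bbox": [[0, 0, 5, 5]], "category": ["Face"]}}',
    '{"file_name": "b.jpg", "objects": {"bbox": [[1, 1, 2, 2], [3, 3, 4, 4]], "category": ["Face", "Face"]}}',
]
id2label, label2id = label_maps(category_names(lines))
print(id2label)  # {0: 'Face'}
```

Sorting the names keeps the id assignment deterministic across runs.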

abhishekkrthakur commented 1 month ago

Can you share a few lines from the new metadata?

rileybolen commented 1 month ago

@abhishekkrthakur

{"file_name": "S05E01_1185266.jpg", "objects": {"bbox": [[275.7393, 75.5485, 183.911, 180.0954]], "category": ["Face"], "area": [33121.525109400005]}}
{"file_name": "S06E23_1050098.jpg", "objects": {"bbox": [[188.744, 88.5215, 219.7774, 189.2528]], "category": ["Face"], "area": [41593.48832672]}}
{"file_name": "S06E23_315748.jpg", "objects": {"bbox": [[214.69, 237.3291, 225.8824, 181.6216], [472.6232, 281.5898, 161.7806, 123.6248], [57.4881, 194.5946, 136.5978, 154.1494]], "category": ["Face", "Face", "Face"], "area": [41025.12289984, 20000.094318879997, 21056.468911320004]}}
{"file_name": "S06E01_403252.jpg", "objects": {"bbox": [[316.9475, 45.0238, 135.0715, 186.9634], [110.9062, 49.6025, 144.9921, 180.0954]], "category": ["Face", "Face"], "area": [25253.426883099997, 26112.41024634]}}

abhishekkrthakur commented 1 month ago

All the issues are now resolved. You only need what's in the updated docs; area is not needed. I've tested it locally too.

abhishekkrthakur commented 1 month ago

Please make sure you are on the latest version.

rileybolen commented 1 month ago

@abhishekkrthakur Thanks! It looks like this issue is fixed, but I have now run into a different one during the training run, I will open a new issue for it.