ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.91k stars, 16.39k forks

KeyError occur when start training #958

Closed Zzh-tju closed 3 years ago

Zzh-tju commented 4 years ago

❔Question

@glenn-jocher I am currently working on face detection. I use the following command to train:

python train.py --img 640 --batch 16 --epochs 5 --data ./data/face.yaml --cfg ./models/yolov5s.yaml --weights yolov5s.pt

All the training images are

../face/images/train/6000.jpg
../face/images/train/6001.jpg
../face/images/train/6002.jpg
......

And their corresponding labels are

../face/labels/train/6000.txt
../face/labels/train/6001.txt
../face/labels/train/6002.txt
......

But I have an error:

WARNING: /media/zzh/face/images/train/9994.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9995.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9996.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9997.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9998.jpg: setting an array element with a sequence.
WARNING: /media/zzh/face/images/train/9999.jpg: setting an array element with a sequence.
Scanning images: 100%|██████████| 6120/6120 [00:01<00:00, 4498.79it/s]
Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 169, in train
    world_size=opt.world_size, workers=opt.workers)
  File "/media/zzh/yolov5/utils/datasets.py", line 61, in create_dataloader
    rank=rank)
  File "/media/zzh/yolov5/utils/datasets.py", line 380, in __init__
    labels, shapes = zip(*[cache[x] for x in self.label_files])
  File "/media/zzh/yolov5/utils/datasets.py", line 380, in <listcomp>
    labels, shapes = zip(*[cache[x] for x in self.label_files])
KeyError: '/media/zzh/face/labels/train/10000.txt'

And face/labels/train/10000.txt contains:

0 0.6062500000000001 0.6017543859649123 0.3775 0.5719298245614035

I don't know how to solve this problem.

github-actions[bot] commented 4 years ago

Hello @Zzh-tju, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

For more information please visit https://www.ultralytics.com.

Zzh-tju commented 4 years ago

When I changed

labels, shapes = zip(*[cache[x] for x in self.label_files])

back to

labels, shapes = zip(*[cache[x] for x in self.img_files])

in utils/datasets.py, the error changed too:

 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]
 24      [17, 20, 23]  1     18879  models.yolo.Detect                      [2, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.25779e+06 parameters, 7.25779e+06 gradients

Transferred 362/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
TypeError: float() argument must be a string or a number, not 'tuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 169, in train
    world_size=opt.world_size, workers=opt.workers)
  File "/media/zzh/yolov5/utils/datasets.py", line 61, in create_dataloader
    rank=rank)
  File "/media/zzh/yolov5/utils/datasets.py", line 383, in __init__
    self.shapes = np.array(shapes, dtype=np.float64)
ValueError: setting an array element with a sequence.

karen-gishyan commented 4 years ago

Same issue here, @Zzh-tju. It is new, though: I had no problems until yesterday.

glenn-jocher commented 4 years ago

@Zzh-tju @karen-gishyan Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

karen-gishyan commented 4 years ago

Hello @glenn-jocher, and thanks for your reply. I could see that the issue was with the way the labels were being read. I looked at your commit history in utils/datasets.py and went back to your previous version, which solved the problem:

self.label_files = [x.replace('images', 'labels').replace(os.path.splitext(x)[-1], '.txt') for x in self.img_files]

I think the new change may well be the source of the issue. Thanks.

sophiatmu commented 4 years ago

@karen-gishyan I got the same problem as you, but before that it printed "WARNING: /home/TrafficLight/JPEGImages/10141_0_1.jpg: image size <10 pixels". Which previous version did you use? Thanks for your reply.

lolpa1n commented 4 years ago

@karen-gishyan Same problem. How did you solve it? The fixes in datasets.py didn't solve the problem for me.

Optimizer groups: 86 .bias, 94 conv.weight, 83 other
TypeError: float() argument must be a string or a number, not 'tuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 169, in train
    rank=rank, world_size=opt.world_size, workers=opt.workers)
  File "/content/yolov5/utils/datasets.py", line 61, in create_dataloader
    rank=rank)
  File "/content/yolov5/utils/datasets.py", line 379, in __init__
    self.shapes = np.array(shapes, dtype=np.float64)
ValueError: setting an array element with a sequence.

karen-gishyan commented 4 years ago

@sophiatmu I know this is a temporary solution until the authors take a look at it, but in utils/datasets.py I changed the code in lines 366 and 367 to the following, which is from the previous commit, and the model worked again.

        self.label_files = [x.replace('images', 'labels').replace(os.path.splitext(x)[-1], '.txt') for x in
                            self.img_files]

glenn-jocher commented 4 years ago

@Zzh-tju @karen-gishyan @lolpa1n @sophiatmu I've pushed a fix which should restore similar functionality to before. https://github.com/ultralytics/yolov5/blob/806e75f2b1166a4a789e0ea70b0e48064005f5c9/utils/datasets.py#L365-L368

Note that label paths are defined as the image paths with a .replace() statement that will replace the last instance of /images/ with /labels/ in your image paths.
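The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked commit: the function name `img_to_label_path` is hypothetical, and forward-slash paths are assumed.

```python
import os


def img_to_label_path(img_path: str) -> str:
    """Map an image path to a label path (hypothetical helper, '/' separators assumed)."""
    # Split on the LAST occurrence of /images/ so earlier matches stay untouched.
    head, sep, tail = img_path.rpartition("/images/")
    if sep:
        img_path = head + "/labels/" + tail
    # Swap the image extension for .txt.
    return os.path.splitext(img_path)[0] + ".txt"


print(img_to_label_path("../face/images/train/6000.jpg"))
# ../face/labels/train/6000.txt
```

Replacing only the last `/images/` matters when a parent directory also contains that segment: a plain `str.replace('images', 'labels')` would rewrite every occurrence and point at a directory that does not exist.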

glenn-jocher commented 4 years ago

The dataset structure example provided by @Zzh-tju should work with no issues:

../face/images/train/6000.jpg
../face/images/train/6001.jpg
../face/images/train/6002.jpg
......

And their corresponding labels are

../face/labels/train/6000.txt
../face/labels/train/6001.txt
../face/labels/train/6002.txt
......

CI tests on fix https://github.com/ultralytics/yolov5/commit/806e75f2b1166a4a789e0ea70b0e48064005f5c9 are all green. https://github.com/ultralytics/yolov5/actions/runs/252688252

Zzh-tju commented 4 years ago

OK, fixed.

21143 commented 4 years ago

Hi. I am facing the same issue when training on my custom dataset on Ubuntu 18.04. However, this issue does not come up with the coco128 dataset. On Windows 10, I do not face this issue at all with either my custom dataset or coco128. Any thoughts on why this could be happening and where I should be looking to fix this?

glenn-jocher commented 4 years ago

@21143 it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

21143 commented 4 years ago

Update: Fixed my issue by deleting the train.cache and val.cache files in the labels folder and re-running the training. I'm able to run training code now. Thanks !

Radleye commented 3 years ago

Training on my Windows machine works fine, but when I upload to a GPU server to train, the error occurs. I fixed it by deleting the label.cache in the data folder.
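The cache cleanup reported in the two comments above could be scripted along these lines. This is only a sketch: the function name `remove_stale_caches` is hypothetical, and it assumes the cache files carry a `.cache` suffix (as in train.cache, val.cache and label.cache) somewhere under the dataset root.

```python
from pathlib import Path


def remove_stale_caches(dataset_root, dry_run=True):
    """List (and optionally delete) *.cache files under dataset_root.

    Hypothetical helper; assumes label caches use a .cache suffix.
    """
    caches = sorted(p for p in Path(dataset_root).rglob("*.cache") if p.is_file())
    if not dry_run:
        for path in caches:
            path.unlink()  # the cache should be rebuilt on the next training run
    return caches
```

Calling it with the default `dry_run=True` only reports what would be deleted, which is a safer first step on a shared server; pass `dry_run=False` once the list looks right.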

xqneko commented 3 years ago

I just found another thing that can cause this error: blank lines in the label file. So when I made labels for yolov5 training, I printed the labels in a wrong way, which made unintended blank lines between each object in a label file. Then I removed the blank lines of the label files, and the training works normally again.

It makes sense because this error happened while caching labels not images.
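A quick scan for such label files might look like the sketch below. It assumes detection labels with exactly five whitespace-separated fields per row (class, x, y, width, height), as in the 10000.txt example earlier in the thread; the function name `find_bad_label_lines` is hypothetical.

```python
from pathlib import Path


def find_bad_label_lines(label_dir):
    """Return {relative file path: [1-based line numbers]} for rows without 5 fields."""
    problems = {}
    for path in sorted(Path(label_dir).rglob("*.txt")):
        bad = [
            number
            for number, line in enumerate(path.read_text().splitlines(), start=1)
            if len(line.split()) != 5  # a blank line splits into 0 fields
        ]
        if bad:
            problems[path.relative_to(label_dir).as_posix()] = bad
    return problems
```

An empty result means every row has the expected number of fields. Rows of different lengths are also what would typically produce the "setting an array element with a sequence" message when the labels are converted to a NumPy array.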

Farhad2590 commented 1 year ago

Screenshot 2023-07-28 190021

How can I solve this problem?

glenn-jocher commented 1 year ago

@Farhad2590 hi there! It seems like you are running into an issue with the YOLOv5 training process. To help you out, could you please provide some more details about the specific problem you are facing? Specifically, any error messages or stack traces that you are encountering would be helpful in diagnosing the issue.

Additionally, please share the command or code that you are using for training, as well as any relevant information about your dataset and environment. With this information, we can better understand the problem and provide you with an appropriate solution.

Looking forward to assisting you further!

Farhad2590 commented 1 year ago

I am trying to train a YOLOv7 model. Here is the code:

%cd /content/drive/MyDrive/yolov7
!python train.py --workers 1 --device 0 --batch-size 4 --epochs 10 --img 640 640 --hyp data/hyp.scratch.custom.yaml --name yolov7-custom --weights yolov7.pt

I couldn't find any error message; it just shows these lines:

Transferred 554/560 items from yolov7.pt
Traceback (most recent call last):
  File "/content/drive/MyDrive/yolov7/train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "/content/drive/MyDrive/yolov7/train.py", line 98, in train
    train_path = data_dict['train']
KeyError: 'train'

I am trying to run this in Google Colab using the free GPU.

glenn-jocher commented 1 year ago

@Farhad2590 make sure you have a data.yaml file specified and it is correctly formatted. This error typically occurs when the train key is missing or misnamed in the data.yaml file.

For example, the data.yaml file should look similar to this:

train: path/to/train.txt
val: path/to/val.txt
test: path/to/test.txt

nc: 80
names: ['class1', 'class2', ..., 'classN']

Ensure that you have correctly defined the path to your training dataset in the train field of the data.yaml file. Double-check the spelling and make sure the path to your training set is correct.

If this issue persists, please provide the content of your data.yaml file and any other relevant details that could help us further investigate the problem.
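The checks described in this comment can be sketched as a small validator. It is a minimal sketch, not part of YOLOv5 or YOLOv7: the function name `check_data_config` is hypothetical, and it expects the data.yaml contents already parsed into a dict (for example by a YAML loader).

```python
def check_data_config(data):
    """Return a list of human-readable problems found in a parsed data.yaml dict."""
    problems = []
    # The training script looks these keys up directly, so a missing one raises KeyError.
    for key in ("train", "val"):
        if not data.get(key):
            problems.append(f"missing or empty key: '{key}'")
    names, nc = data.get("names"), data.get("nc")
    if names is None:
        problems.append("missing key: 'names'")
    elif nc is not None and len(names) != nc:
        problems.append(f"nc is {nc} but names has {len(names)} entries")
    return problems


print(check_data_config({"val": "path/to/val.txt", "nc": 2, "names": ["a", "b"]}))
# ["missing or empty key: 'train'"]
```

An empty list means the keys named above are present and consistent; it does not confirm that the paths themselves exist.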

sanjayjackson commented 1 year ago

detect: weights=['last.pt'], source=test, data=data\coco128.yaml, imgsz=[416, 416], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=True, save_csv=False, save_conf=True, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 2023-9-14 Python-3.8.0 torch-2.0.1+cpu CPU

Fusing layers...
Model summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
Traceback (most recent call last):
  File "detect.py", line 285, in <module>
    main(opt)
  File "detect.py", line 280, in main
    run(**vars(opt))
  File "C:\Users\sanjay\anaconda3\envs\pytorch-gpu\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 101, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "H:\detection\yolov5-master\models\common.py", line 513, in __init__
    if names[0] == 'n01440764' and len(names) == 1000:  # ImageNet
KeyError: 0

glenn-jocher commented 1 year ago

@sanjayjackson this error typically occurs when there is an issue with your class labels in the provided data.yaml file. The error message suggests that there is a problem with the class label indexing.

To resolve this issue, please ensure the following:

  1. Verify that your data.yaml file is correctly formatted and contains the necessary information. Specifically, check that the names field is a list of class names and that it is not empty.

  2. Double-check the indices of your class labels. The error message indicates that there might be an issue with the indexing of the class labels. Ensure that the first class label in your names list has an index of 0. If you are using an external script to generate the data.yaml file, make sure it is correctly generating the class labels and their corresponding indices.

  3. If you are using a pre-trained model for detection, ensure that the data.yaml file provided matches the configuration of the pre-trained model.

By verifying the above points, you should be able to resolve the KeyError: 0 issue. If the problem persists, please provide more details about your setup and the specific steps you followed to encounter this error.
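One detail worth noting: in Python, `names[0]` raises KeyError (rather than IndexError) only when `names` is a mapping with no key `0`. Whether that is what happened here is not confirmed in the thread, but the indexing checks listed above can be sketched as follows; the function name `check_class_names` is hypothetical.

```python
def check_class_names(names):
    """Return a list of reasons why names[0] could fail (hypothetical helper)."""
    if isinstance(names, (list, tuple, dict)) and not names:
        return ["names is empty"]
    if isinstance(names, (list, tuple)):
        return []  # a non-empty sequence always has index 0
    if isinstance(names, dict):
        keys = list(names)
        if any(not isinstance(k, int) for k in keys):
            return ["names has non-integer keys, e.g. '0' instead of 0"]
        if sorted(keys) != list(range(len(keys))):
            return ["names keys are not the consecutive integers 0..n-1"]
        return []
    return [f"unsupported type for names: {type(names).__name__}"]


print(check_class_names({1: "numberplate"}))
# ['names keys are not the consecutive integers 0..n-1']
```

A dict keyed from 1, or keyed by strings, passes a casual visual inspection but still fails on `names[0]`, which is why the helper reports those two cases separately.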

sanjayjackson commented 1 year ago

Thanks, I tried that, but it didn't work. This is my data.yml file:

# Example usage: python train.py --data coco128.yaml
# parent
# ├── yolov5
# └── datasets
#     └── coco128  ← downloads here (7 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
#path: ../datasets/coco128  # dataset root dir
train: H:\detection\yolov5-master\data\train_data\images\train  # train images (relative to 'path') 128 images
val: H:\detection\yolov5-master\data\train_data\images\val  # val images (relative to 'path') 128 images
#test:  # test images (optional)

# Classes
nc: 1
names: ['numberplate']

# Download script/URL (optional)
#download: https://ultralytics.com/assets/coco128.zip

What is the error?

glenn-jocher commented 1 year ago

@sanjayjackson it seems like you are still encountering issues with your YOLOv5 training, even after modifying the data.yaml file. The error you are facing might be due to various reasons.

Here are a few things you can try to resolve the issue:

  1. Ensure that the paths specified in the train and val fields of the data.yaml file are correct. Double-check the directory structure and confirm that the train and validation image folders are in the right location.

  2. Verify that the image file extensions are correct. YOLOv5 expects image files with certain extensions (e.g., .jpg, .png). Make sure all your images have the correct file extension.

  3. Check if there are any misspellings or extra spaces in the class names defined in the names field. Ensure that the class name 'numberplate' matches the label names exactly as they are used in your datasets.

  4. Confirm that the number of classes specified in the nc field matches the total number of classes in your dataset (in this case, 1 for the 'numberplate' class).

  5. If you are using a YAML file with UTF-8 encoding, ensure there are no hidden special characters that might be causing parsing issues. Try opening the file in a text editor that can display non-visible characters and remove any unwanted characters if found.

Please check these suggestions and let me know if the issue persists or if you have any further questions.
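The first two suggestions (paths and file extensions) can be checked with a short script. This is a sketch under stated assumptions: the function name `summarize_image_dir` is hypothetical, and `ASSUMED_IMAGE_SUFFIXES` is an illustrative subset, not the authoritative list of formats YOLOv5 accepts.

```python
from pathlib import Path

# Illustrative subset only; the real list of supported formats may be longer.
ASSUMED_IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_image_dir(image_dir):
    """Return (image names, other file names), or None if the directory is missing."""
    root = Path(image_dir)
    if not root.is_dir():
        return None  # the path in data.yaml does not point at a directory
    images, others = [], []
    for path in root.iterdir():
        if path.is_file():
            target = images if path.suffix.lower() in ASSUMED_IMAGE_SUFFIXES else others
            target.append(path.name)
    return sorted(images), sorted(others)
```

Running it on the `train` and `val` paths from the data.yaml file shows at a glance whether each directory exists and whether it contains files with unexpected extensions.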