Closed bigrobinson closed 5 years ago
Hello, thank you for your interest in our work! Please note that most technical problems are due to changes made to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
sudo rm -rf yolov3 # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py # verify detection
python3 train.py # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
In your case it looks like the OpenCV error occurs in test.py when attempting to plot test_batch0.jpg, which shows you the training data with the labels. It's likely there is an error with your training data somewhere. You can run a validated working example like this and try to debug from there:
python3 train.py --data data/coco_16img.data --epochs 1
Namespace(accumulate=8, backend='nccl', batch_size=8, cfg='cfg/yolov3-spp.cfg', data_cfg='data/coco_16img.data', dist_url='tcp://127.0.0.1:9999', epochs=1, evolve=False, giou=False, img_size=320, nosave=False, notest=False, num_workers=4, rank=0, resume=False, single_scale=False, transfer=False, var=0, world_size=1)
Using CPU
Reading labels: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 10275.43it/s]
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Epoch Batch xy wh conf cls total targets time
0/0 0/1 0.418 0.67 26.3 3.73 31.1 45 13.8
0/0 1/1 0.383 0.552 26.2 3.72 30.9 27 13.6
1 epochs completed in 0.008 hours.
Reading labels: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 12363.46it/s]
Class Images Targets P R mAP F1
Computing mAP: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.57s/it]
all 16 76 0 0 0 0
person 16 16 0 0 0 0
car 16 14 0 0 0 0
motorcycle 16 1 0 0 0 0
airplane 16 1 0 0 0 0
train 16 1 0 0 0 0
truck 16 3 0 0 0 0
stop sign 16 1 0 0 0 0
horse 16 2 0 0 0 0
elephant 16 2 0 0 0 0
zebra 16 1 0 0 0 0
giraffe 16 4 0 0 0 0
umbrella 16 1 0 0 0 0
handbag 16 1 0 0 0 0
skateboard 16 3 0 0 0 0
fork 16 1 0 0 0 0
knife 16 5 0 0 0 0
bowl 16 3 0 0 0 0
orange 16 4 0 0 0 0
broccoli 16 1 0 0 0 0
cake 16 1 0 0 0 0
potted plant 16 2 0 0 0 0
microwave 16 1 0 0 0 0
oven 16 1 0 0 0 0
book 16 3 0 0 0 0
clock 16 2 0 0 0 0
vase 16 1 0 0 0 0
train_batch0.jpg
test_batch0.jpg
Thanks for the prompt response. I ran the working example and it went off flawlessly. Let me ask you this: I am using multiple sensors to record my training data. What effect will it have if my training images have different aspect ratios and resolutions?
@bigrobinson it shouldn't have any effect. The images in COCO, for example, come from a variety of devices and resolutions. Sourcing data from a variety of places actually improves the network's ability to generalize in real-world scenarios.
Typically custom training problems are due to formatting issues. Make sure that your training data is labelled and structured the exact same way as the coco dataset.
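As a rough sanity check for the formatting point above, the sketch below validates the rows of one label file. It assumes the darknet-style layout used for the COCO labels in this repository: one row per object, `class x_center y_center width height`, with the four box values normalized to [0, 1]. The function name `check_label_lines` and the `num_classes` parameter are hypothetical and not part of the repository.

```python
def check_label_lines(lines, num_classes):
    """Return a list of (line_number, problem) tuples for one label file.

    Assumes darknet-style rows: class x_center y_center width height,
    with the four box values normalized to the range [0, 1].
    """
    problems = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append((n, "expected 5 fields, got %d" % len(fields)))
            continue
        try:
            cls = int(fields[0])
            box = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append((n, "could not parse class index or box values"))
            continue
        if not 0 <= cls < num_classes:
            problems.append((n, "class index %d out of range" % cls))
        if any(v < 0.0 or v > 1.0 for v in box):
            problems.append((n, "box value outside [0, 1]"))
        elif box[2] <= 0.0 or box[3] <= 0.0:
            problems.append((n, "zero width or height"))
    return problems
```

To check a file on disk, pass `open(path).read().splitlines()` as `lines`. An empty result only means the rows are well formed; it does not confirm that the boxes match the image content.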
The error is due to calculation of negative border dimensions in the letterbox method of datasets.py causing an exception to be thrown by the call to cv2.copyMakeBorder, when rectangular training is set to TRUE. Note that rectangular training is set to FALSE by default in train.py:
# Dataset
rectangular_training = False
dataset = LoadImagesAndLabels(train_path,
                              img_size,
                              batch_size,
                              augment=True,
                              rect=rectangular_training)
Whereas it is set to TRUE by default in test.py (which is called by train.py when called as main):
dataset = LoadImagesAndLabels(test_path, img_size, batch_size) # Note rect=True by default
dataloader = DataLoader(dataset,
                        batch_size=batch_size,
                        num_workers=4,
                        pin_memory=True,
                        collate_fn=dataset.collate_fn)
When I set rect=False in the call to LoadImagesAndLabels in test.py, the problem is resolved. On the other hand, when I set rectangular training to TRUE in train.py, the problem appears again when cv2.copyMakeBorder is called. It appears to be a bug in the letterbox padding calculation.
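To illustrate how negative border values can arise, here is a minimal sketch of letterbox-style padding arithmetic. It is not the repository's letterbox code: the functions `letterbox_padding` and `check_padding` and their parameters are hypothetical. The point is only that if the resized image ends up larger than the target shape along one axis, the computed border for that axis is negative, which is the condition cv2.copyMakeBorder asserts against.

```python
def letterbox_padding(resized_hw, target_hw):
    """Split the size difference between a resized image and a target
    shape into (top, bottom, left, right) borders.

    Hypothetical helper for illustration; shapes are (height, width).
    """
    dh = target_hw[0] - resized_hw[0]  # total vertical padding
    dw = target_hw[1] - resized_hw[1]  # total horizontal padding
    top = dh // 2
    bottom = dh - top
    left = dw // 2
    right = dw - left
    return top, bottom, left, right


def check_padding(borders):
    """Raise a descriptive error before the values reach OpenCV."""
    if min(borders) < 0:
        raise ValueError(
            "negative border %r: resized image exceeds target shape" % (borders,)
        )
    return borders


# A resized image that fits inside the target gets non-negative borders.
print(letterbox_padding((240, 416), (256, 416)))
# A resized image taller than the target yields negative top/bottom values.
print(letterbox_padding((352, 416), (320, 416)))
```

Calling a check like `check_padding` just before the border is applied would report the offending values directly, which may make it easier to trace them back to a specific image size.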
@bigrobinson yes, rectangular training is still very much in development, and is not currently compatible with multi-scale training for example. If you set rect=False in both data loaders this should resolve your issue.
If you can more clearly determine the exact size of the images and the padding that is causing the issue this might help also.
My test data are only 4 images and I get 0 mAP for a single-class problem. My test batch matches the validation set, but I am getting an mAP of 0, and the other metrics such as P, R and F1 don't make sense.
@sanazss this indicates that no objects were detected above threshold in your test set.
Hey @glenn-jocher I was in Germany for a couple of weeks and was going to take a look at the letterbox routine now that I'm back. I see you have made a lot of changes to the code base since then. Looking good. I have re-synced to master and re-run training with my data and with rect=True. I am no longer getting the negative border error I was getting before. Cheers!
@bigrobinson yes we've been busy with updates. In general you should see better results now, and rectangular training is now compatible with multiscale.
@bigrobinson I'll go ahead and close this now since the issue seems resolved.
Hi @glenn-jocher, I am facing this issue with the updated repository. When I run the 'Reproduce our results' steps from https://docs.ultralytics.com/yolov5/tutorials/train_custom_data, everything seems to run perfectly. However, when I create a new 3-class dataset from the COCO dataset, with labels for only those 3 classes, and change the config file accordingly, I get this error after the 10th epoch:
10/329 1.57G 5.68 27.5 33 66.2 5 416: 88%|▉| 7/8 [00:01<00:00, 3
10/329 1.57G 5.22 27.3 30.7 63.2 18 416: 88%|▉| 7/8 [00:02<00:00, 3
10/329 1.57G 5.22 27.3 30.7 63.2 18 416: 100%|█| 8/8 [00:02<00:00, 4
10/329 1.57G 5.22 27.3 30.7 63.2 18 416: 100%|█| 8/8 [00:02<00:00, 3.74it/s]
Class Images Targets P R mAP@0.5 F1: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 429, in
@Pari-singh there's probably something wrong with one of your images then. You should run in debug mode so you can capture the values being passed to the cv2 function and try to find the image responsible.
We routinely train on custom datasets for our clients without issue.
Hello all, has anyone found a solution for this problem, or any hint as to what the problem might be? I have the same problem when training on a custom dataset.
@akbari59 as mentioned earlier, you should run the training in debug mode to capture the values being passed to the cv2 function and try to identify the problematic image. This will likely help pinpoint the issue.
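One way to follow this advice without a debugger is to index the dataset directly, outside the DataLoader worker processes, and record which samples raise. The sketch below assumes only that the dataset supports `len()` and integer indexing, as a PyTorch-style dataset does. `find_failing_samples` is a hypothetical helper, and `FakeDataset` is a stand-in used purely for demonstration.

```python
def find_failing_samples(dataset):
    """Index every sample once and collect (index, exception) pairs.

    Running this in the main process avoids the re-raised traceback
    that comes back from DataLoader worker processes.
    """
    failures = []
    for i in range(len(dataset)):
        try:
            dataset[i]
        except Exception as exc:  # broad on purpose: record every failure
            failures.append((i, exc))
    return failures


class FakeDataset:
    """Stand-in dataset whose sample 2 fails, for demonstration only."""

    def __len__(self):
        return 4

    def __getitem__(self, i):
        if i == 2:
            raise ValueError("negative border for sample %d" % i)
        return i


for index, error in find_failing_samples(FakeDataset()):
    print(index, type(error).__name__, error)
```

With the real data, the dataset would be the LoadImagesAndLabels instance built in train.py or test.py. How a failing index maps back to an image path depends on that class's attributes, which are not shown in this thread.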
First of all, thank you for the hard work and the great documentation. Can you help with this error? It occurs upon calculation of the mAP and appears related to an opencv rendering. I have 4 classes I am training with 1720 labeled training images. Any help is much appreciated.
*** cfg file hyperparameters
[net]
batch=64
subdivisions=8
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=2000
max_batches=500200
policy=steps
steps=400000,450000
scales=.1,.1
** cfg file pre-yolo layer parameters
[convolutional]
size=1
stride=1
pad=1
filters=27
activation=linear
****Error out
OpenCV(3.4.1) Error: Assertion failed (top >= 0 && bottom >= 0 && left >= 0 && right >= 0) in copyMakeBorder, file /opt/conda/conda-bld/opencv-suite_1527005194613/work/modules/core/src/copy.cpp, line 1182
Traceback (most recent call last):
File "train.py", line 347, in
accumulate=opt.accumulate,
File "train.py", line 260, in train
conf_thres=0.1)
File "/home/brian/yolov3/test.py", line 60, in test
for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc='Computing mAP')):
File "/home/brian/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1005, in __iter__
for obj in iterable:
File "/home/brian/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
return self._process_next_batch(batch)
File "/home/brian/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
cv2.error: Traceback (most recent call last):
File "/home/brian/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/brian/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/brian/yolov3/utils/datasets.py", line 269, in __getitem__
img, ratio, padw, padh = letterbox(img, new_shape=shape, mode='rect')
File "/home/brian/yolov3/utils/datasets.py", line 359, in letterbox
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # padded square
cv2.error: OpenCV(3.4.1) /opt/conda/conda-bld/opencv-suite_1527005194613/work/modules/core/src/copy.cpp:1182: error: (-215) top >= 0 && bottom >= 0 && left >= 0 && right >= 0 in function copyMakeBorder