ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
10.25k stars 3.45k forks source link

Cannot reproduce results in Example:-Train-Single-Class #580

Closed davidkadlecek closed 5 years ago

davidkadlecek commented 5 years ago

Describe the bug Running Example:-Train-Single-Class in google colab hangs and is not improving mAP

To Reproduce Steps to reproduce the behavior:

  1. Open Google Colab Notebook...
  2. Run the following code (no change to yolo config files, random weights initialization)... import time import glob import torch import os from IPython.display import Image, clear_output print('PyTorch %s %s' % (torch.version, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

    3.0+cu100 _CudaDeviceProperties(name='Tesla K80', major=3, minor=7, total_memory=11441MB, multi_processor_count=13)

!git clone https://github.com/ultralytics/yolov3 # clone !bash yolov3/data/get_coco_dataset_gdrive.sh # copy COCO2014 dataset (19GB) %cd yolov3

!python3 train.py --data data/coco_1cls.data --cfg cfg/yolov3-spp-1cls.cfg --batch-size 1 --accumulate 1

Real behavior training hangs after 6 epochs. If I provide additional argument --img-size 608 then it does not hang but mAP is 0 even after 200 iterations and is not improving.

Expected behavior ... is being trained and achieves stated results

glenn-jocher commented 5 years ago

@davidkadlecek this tutorial hasnt been updated in a while, but if you look at the custom training tutorial it should have reproducible code. You can also look at this comment for the latest tutorial results with code to reproduce (recent from a few weeks ago): https://github.com/ultralytics/yolov3/issues/106#issuecomment-547521991

glenn-jocher commented 5 years ago

@davidkadlecek the single-class tutorial has been updated with the latest results and the latest code to reproduce. The biggest change was probably that backbones need to be specifically called for, before darknet53 backbone was loaded by default (now default behavior is no backbone).