ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Anchor Sizes For Custom Training and Test Sets #305

Closed enchainingrealm closed 5 years ago

enchainingrealm commented 5 years ago

I'm training and testing YOLOv3 on my own dataset. Every image in my dataset is 1376 pixels wide by 800 pixels tall.

I run K-Means on my training set to cluster the ground truth bounding box dimensions into 9 clusters. Each cluster mean is a dimension pair (box width, box height). I get 9 cluster means which I use as my anchors.
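The clustering step described above can be sketched as follows. This is an illustrative NumPy implementation of Lloyd's k-means on (width, height) pairs, not the repo's own utility; the function name and defaults are hypothetical.

```python
import numpy as np

def kmeans_anchors(wh, n_anchors=9, iters=30, seed=0):
    """Cluster (N, 2) box (width, height) pairs into n_anchors anchor sizes."""
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen boxes.
    centers = wh[rng.choice(len(wh), n_anchors, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to its nearest center (Euclidean distance).
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Move each center to the mean of its assigned boxes.
        for k in range(n_anchors):
            pts = wh[assign == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    # Sort by area so anchors run small -> large across the three YOLO layers.
    return centers[np.argsort(centers.prod(axis=1))]
```

The area sort matters because the three YOLO layers expect anchors ordered from small objects (high-resolution layer) to large objects (low-resolution layer).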

In my .cfg file, I Ctrl+F "anchors" and paste in my anchors (I do this for each of the three YOLO layers).

YOLOv3 performs poorly on both my training and test sets. It detects small objects with high precision but fails to detect large objects. I'm assuming I forgot to scale the anchors in my .cfg file. What's the procedure to define anchors for a custom dataset?

glenn-jocher commented 5 years ago

@enchainingrealm you can specify your own anchors from k-means results in the 3 YOLO layers of the cfg file: https://github.com/ultralytics/yolov3/blob/9cf5ab0c9d41231148e8f6df23a4797ffa8e6d1a/cfg/yolov3-spp.cfg#L815

Remember, though, that these are in units of pixels at the expected inference size, not the native image size. If you plan on running inference at 416 pixels, for example, then you should multiply your k-means results by 416/1376 ≈ 0.30.
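The scaling step above can be sketched as follows; the anchor values here are illustrative, not from this thread:

```python
# Rescale k-means anchors from native-image pixels (1376 wide) to the
# planned inference size (--img-size 416).
anchors_native = [(30, 25), (62, 45), (120, 90)]  # illustrative k-means output
scale = 416 / 1376
anchors_cfg = [(round(w * scale), round(h * scale)) for w, h in anchors_native]
```

The rounded `anchors_cfg` values are what would be pasted into the `anchors=` lines of the cfg file.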

enchainingrealm commented 5 years ago

Where do I set the expected inference size?

In the .cfg file I've already set the "width" and "height" hyper-parameters, namely "width=1376" and "height=800". Those are the native dimensions of all my training and test images.

glenn-jocher commented 5 years ago

During training, testing, and inference, you set the image size with the same argparse argument, --img-size. So if you plan on training at 416, then multiply all your k-means results by 416/1376 as I said before. https://github.com/ultralytics/yolov3/blob/9cf5ab0c9d41231148e8f6df23a4797ffa8e6d1a/train.py#L310

enchainingrealm commented 5 years ago

Thank you for the quick responses. I still have a few questions about the concept of size.

  1. If the --img-size argument is king, then what is the point of the width and height hyperparameters in the .cfg files?
  2. As mentioned above, my source images are 1376-by-800. Let's suppose I set --img-size to be 1376. Does the pipeline pad the shorter side of my images to get 1376-by-1376 images?
  3. Now suppose I set --img-size to be 800. Does the pipeline scale my images down so that the longer side is 800 pixels, and then pad the shorter side to get 800-by-800 images?

glenn-jocher commented 5 years ago

@enchainingrealm the cfg files were created by the darknet authors; the width and height parameters are used by their repositories.

I suggest you google letterboxing; it's very simple. You can see examples of this in the README.

enchainingrealm commented 5 years ago

Seems like I was understanding the concept of letterboxing correctly (i.e. pad a rectangular image to make it square without changing the aspect ratio of the contents.)
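The letterboxing described above can be sketched as a shape computation; this is a hypothetical helper working on dimensions only (a real implementation, e.g. with OpenCV, would also resize and pad the pixel data):

```python
def letterbox_shape(w, h, img_size):
    """Resize so the long side equals img_size, then compute the padding
    needed on the short side to reach a square, preserving aspect ratio."""
    scale = img_size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_w, pad_h = img_size - new_w, img_size - new_h
    return (new_w, new_h), (pad_w, pad_h)
```

For a 1376-by-800 image at --img-size 1376, no resizing is needed and the height is padded by 576 pixels to form a 1376-by-1376 square.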

I re-trained on my data after setting the --img-size argument to 1376, and the results are now satisfactory.

Thank you for the clarifications. I'm closing this issue now.

Chida15 commented 5 years ago

> Seems like I was understanding the concept of letterboxing correctly (i.e. pad a rectangular image to make it square without changing the aspect ratio of the contents.)
>
> I re-trained on my data after setting the --img-size argument to 1376, and the results are now satisfactory.
>
> Thank you for the clarifications. I'm closing this issue now.

Hello, sorry to bother you. I just want to know which code you used to calculate your anchors. I used different k-means implementations and got different results, but neither of them worked well on my data.

glenn-jocher commented 5 years ago

@Chida15 I would simply start with the default anchors in yolov3-spp.cfg.

If you want to try kmeans anchors you can use kmeans_targets(): https://github.com/ultralytics/yolov3/blob/58f868a79ad755b68fd75556fb1d946cdd3ab8e5/utils/utils.py#L571-L610

Gaondong commented 4 years ago

> @Chida15 I would simply start with the default anchors in yolov3-spp.cfg.
>
> If you want to try kmeans anchors you can use kmeans_targets(): https://github.com/ultralytics/yolov3/blob/58f868a79ad755b68fd75556fb1d946cdd3ab8e5/utils/utils.py#L571-L610

Could I use this code to calculate the yolov3-tiny anchors? With the latest code I have encountered the following problem: "AttributeError: 'Tensor' object has no attribute 'T'"
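A possible explanation for this error, assuming an older PyTorch install: `Tensor.T` was added in a later PyTorch release, so code written against newer versions raises this AttributeError on older ones. A hedged workaround sketch, shown on a plain 2-D tensor:

```python
import torch

x = torch.arange(6).reshape(2, 3)
# .t() performs a 2-D transpose and exists on both old and new PyTorch
# versions, so it can stand in for .T when the tensor is 2-D.
xt = x.t()
```

Upgrading PyTorch is the other option if the surrounding code relies on `.T` elsewhere.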


Broad-sky commented 4 years ago

Thank you very much for your project. I have a question: why do you use the k-means algorithm simply and directly, which differs from the original author's approach? @glenn-jocher

glenn-jocher commented 10 months ago

@Broad-sky thanks for your question. The choice to use k-means for generating anchors is more of a preference rather than a rule. It's a common practice in the community, and it usually provides good starting points for anchor sizes.

If you're curious about the approach, I suggest looking into the differences in the anchor generation methods and how they might impact the training on different datasets.

Always happy to help!