ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

change the number of anchor boxes #841

Closed melih-unsal closed 4 years ago

melih-unsal commented 4 years ago

🚀 Feature

Hi, how can I change the number of anchor boxes during training? I also wonder where the parameter S is set in the code, i.e. the square root of the number of grid cells in the image (YOLO divides the image into an S×S grid).

glenn-jocher commented 4 years ago

@melih1996 I don't understand the S parameter you are referring to. The anchors can be modified in the cfg files; you might want to look at the tutorial on modifying the cfg. https://github.com/ultralytics/yolov3/blob/0958d81580a8a0086ac3326feeba4f6db20b70a5/cfg/yolov3-spp.cfg#L805-L820

melih-unsal commented 4 years ago

I mean that in all versions of YOLO the image is divided into an S×S grid of cells, and to detect smaller objects better I want to change S for my custom dataset. So if the number of anchor boxes per grid cell is b, then the output tensor shape is S×S×[b×(4+1+80)] for the COCO dataset, and that is the S I am referring to. For changing the number of anchor boxes I want to keep using pretrained weights, but when I change the anchor boxes (I mean also their number; otherwise it runs as I expected) I get an error. I then modified the mask values as well, but still got errors.

glenn-jocher commented 4 years ago

Yes, if you change the model structure the pretrained weights no longer correspond. If you modify the cfg you can train from scratch:

python3 train.py --weights ''
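For context, here is a minimal sketch (illustrative, not repo code) of why the shapes stop matching: each [yolo] output layer is fed by a 1×1 conv whose filter count must equal na × (nc + 5), so changing the number of anchors per layer changes the shape of that conv's pretrained weight tensor.

# Illustrative sketch: how the output conv channel count depends on the anchor count.
def yolo_output_channels(num_anchors: int, num_classes: int) -> int:
    # Each anchor predicts 4 box offsets + 1 objectness score + num_classes class scores.
    return num_anchors * (4 + 1 + num_classes)

# COCO with the default 3 anchors per output layer -> 255 filters in the preceding 1x1 conv.
print(yolo_output_channels(3, 80))  # 255
# 4 anchors per layer would need 340 filters, so the pretrained 255-channel conv no longer fits.
print(yolo_output_channels(4, 80))  # 340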
melih-unsal commented 4 years ago

Thank you, I see your point. By the way, how do I change the variable S that I referred to?

glenn-jocher commented 4 years ago

@melih1996 there is no S parameter. You modify the img-size directly with the argparse arguments:

python3 train.py --img-size 320
melih-unsal commented 4 years ago

So you mean the cell scale is 32, and what changes is the number of cells rather than the scale of each cell?

glenn-jocher commented 4 years ago

@melih1996 oh perhaps you are talking about the stride? The strides are different for each yolo layer, but in the default model they should be 32, 16 and 8 for the large, medium and small object output layers.

https://github.com/ultralytics/yolov3/blob/0958d81580a8a0086ac3326feeba4f6db20b70a5/models.py#L160
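To make the relationship concrete, here is a small sketch (illustrative, not repo code) of the grid sizes that fall out of those strides for a given training image size; the S from the earlier question is not set directly, it is simply img_size // stride at each output layer.

# Illustrative sketch: number of grid cells per YOLO output layer for a square input.
img_size = 416                      # must be a multiple of 32
strides = [32, 16, 8]               # large, medium and small object output layers
for stride in strides:
    s = img_size // stride          # grid is s x s cells at this layer
    print(f"stride {stride:2d} -> {s} x {s} grid")
# stride 32 -> 13 x 13 grid
# stride 16 -> 26 x 26 grid
# stride  8 -> 52 x 52 grid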

clw5180 commented 4 years ago

So you mean the cell scale is 32, and what changes is the number of cells rather than the scale of each cell?

S is the size of the feature map: S = img_size / 32, since the Darknet-53 backbone downsamples the input by a factor of 32.

glenn-jocher commented 4 years ago

I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

matthias-tschoepe commented 4 years ago

I have the same question as @melih1996. I have summarized the important parts of the YOLOv1 paper that relate to this question in the attached screenshot.

In models.py (lines 155-171) I found this code:

class YOLOLayer(nn.Module):
    def __init__(self, anchors, nc, img_size, yolo_index, arc):
        super(YOLOLayer, self).__init__()

        self.anchors = torch.Tensor(anchors)
        self.na = len(anchors)  # number of anchors (3)
        self.nc = nc  # number of classes (80)
        self.no = nc + 5  # number of outputs
        self.nx = 0  # initialize number of x gridpoints
        self.ny = 0  # initialize number of y gridpoints
        self.arc = arc

        if ONNX_EXPORT:  # grids must be computed in __init__
            stride = [32, 16, 8][yolo_index]  # stride of this layer
            nx = int(img_size[1] / stride)  # number x grid points
            ny = int(img_size[0] / stride)  # number y grid points
            create_grids(self, img_size, (nx, ny))

It seems like your S is called self.nx and self.ny, and if ONNX_EXPORT is True then your answer from above is exactly right. But there is no else case. In your forward method there is a call to create_grids; it seems like this function does the job, and in the default mode S is 13. Is it useful to change this value if we have image sizes other than e.g. COCO (since for Pascal VOC the YOLOv1 authors used S=7)? Or is changing the anchor sizes equivalent to adjusting this S, i.e. self.nx and self.ny?

glenn-jocher commented 4 years ago

@Deep-Learner anchors and grid points are completely separate items; they have nothing to do with each other. The YOLO output layer has img_size // stride grid points, as you can see in the code you pasted. Anchors are defined in the cfg: https://github.com/ultralytics/yolov3/blob/4a90221e79fbed6b952411b95dd8f06823c526fc/cfg/yolov3-spp.cfg#L640-L649
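A small illustrative sketch (not the repo's create_grids function) of that separation: the grid offsets depend only on the input size and stride, while the anchors are a fixed list of pixel-space box shapes read from the cfg (the values below are the large-object anchors from the yolov3 cfg).

import torch

# Grid offsets: depend only on input size and stride, not on the anchors.
img_size, stride = 416, 32
nx = ny = img_size // stride                       # 13 x 13 grid at this layer
yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing="ij")
grid_xy = torch.stack((xv, yv), dim=2).float()     # shape (ny, nx, 2): per-cell offsets

# Anchors: a fixed list of (width, height) pairs in pixels, taken from the cfg.
anchors_large = torch.tensor([[116, 90], [156, 198], [373, 326]], dtype=torch.float32)

print(grid_xy.shape, anchors_large.shape)          # torch.Size([13, 13, 2]) torch.Size([3, 2])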

matthias-tschoepe commented 4 years ago

Thank you for your answer. Yes, I know that these are different things. However, if we increase the number of grid points (S² -> (S+k)² with k > 0) while keeping the standard anchor sizes, it may have the same effect (in terms of precision, recall, etc.) as keeping the standard number of grid points and defining our own anchor sizes. In other words: since you implemented the grid points as hardcoded values, it may be more common to change the anchor sizes instead of the number of grid points (if it has the same effect).

glenn-jocher commented 4 years ago

@Deep-Learner I just don't understand the question. The anchors are defined in pixel space; inference can occur at any image size of your choosing.

matthias-tschoepe commented 4 years ago

@glenn-jocher thanks again for your time. I don't know how else to describe it. In any case, I'll focus on finding the best anchor sizes for our dataset; that should also improve our results. Thanks again for your great work.

glenn-jocher commented 4 years ago

@Deep-Learner ah, if you want to adjust anchor sizes you should run a k-means analysis, though I only recommend this if your labels are oddly shaped or fall outside the typical COCO shapes/sizes: https://github.com/ultralytics/yolov3/blob/585064f300fa2d4bb80667927adc12caf1dbcc30/utils/utils.py#L765-L770
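For reference, here is a generic sketch of the idea (not the repo's helper linked above, whose exact signature may differ): cluster the label widths and heights to pick anchor shapes.

import numpy as np
from scipy.cluster.vq import kmeans

# Generic sketch: derive k anchor shapes from label box sizes.
# wh: (N, 2) array of box widths and heights in pixels from your training labels;
# the random array below is placeholder data purely for illustration.
wh = np.abs(np.random.randn(1000, 2)) * 60 + 20
k = 9                                          # YOLOv3 uses 9 anchors (3 per output layer)

centroids, _ = kmeans(wh / wh.std(0), k)       # whiten, then cluster
anchors = centroids * wh.std(0)                # undo the scaling
anchors = anchors[np.argsort(anchors.prod(1))] # sort by area: small -> large
print(np.round(anchors))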

matthias-tschoepe commented 4 years ago

@glenn-jocher Thanks for the hint. I was already thinking about k-means for determining the best anchor sizes, but I thought I would have to implement it myself. Nice that you have already done this. Thanks a lot.

Back to the earlier question: I think I found a counterexample to my idea above (which makes the second part of the original question important again). Consider the following situation: we only want to detect persons in an image (just to simplify the idea). At least in YOLOv1, each grid cell can only predict one instance of an object, right? For our case this means each cell can predict at most one person. But what happens if one cell is so large (because our images are Full HD and we divide the image into a 7×7 grid) that it would have to predict many persons? That wouldn't work, right? So for this special case we should increase the number of grid cells, or can YOLOv3 handle it? Sorry, I never read the YOLOv2 and YOLOv3 papers; I'll do that in the next few weeks.

glenn-jocher commented 4 years ago

@Deep-Learner yes, you had better read the v2 and v3 papers.

v3 has multiple anchors at each output layer and multiple output layers, with roughly 10,000 anchors per image in total being typical. If your objects are smaller or larger, you can adjust your anchor shapes and anchor count to compensate.
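As a quick illustrative check of that number (a sketch, assuming the default 416×416 input and 3 anchors per output layer):

# Illustrative sketch: total anchor predictions per image for a 416x416 input.
img_size = 416
anchors_per_layer = 3
strides = [32, 16, 8]
total = sum(anchors_per_layer * (img_size // s) ** 2 for s in strides)
print(total)  # 3 * (13*13 + 26*26 + 52*52) = 10647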

matthias-tschoepe commented 4 years ago

@glenn-jocher thanks for clarifying this. That helped a lot.

EDIT: Now that I have read the YOLOv2 and YOLOv3 papers, what you wrote makes perfect sense. In YOLOv1 there was a fully connected layer, and my idea was to change the transformation/reshape from that fully connected layer to the output tensor shape in order to get a larger grid. However, that fully connected layer was removed in YOLOv2. Therefore YOLOv2 and YOLOv3 are FCNs (fully convolutional networks), and the grid size of the output tensor depends on the input size, which is exactly the factor of 32 you mentioned earlier. If we wanted to change this factor, we would have to change the YOLO architecture (I think you also mentioned that already). Sorry for all the confusion.

Thanks a lot for your great work and nice support. :)

glenn-jocher commented 10 months ago

@Deep-Learner You're welcome! Don't hesitate to reach out if you have more questions in the future. Good luck with your project and happy coding!