🚀 Feature

Hi, how can I change the number of anchor boxes during training? I also wonder where the parameter S, i.e. the square root of the number of grid cells in the image (YOLO uses an S×S grid), is set in the code.
@melih1996 I don't understand the S parameter you refer to. The anchors can be modified in the cfg files. You might want to look at the tutorial for modifying the cfg.
https://github.com/ultralytics/yolov3/blob/0958d81580a8a0086ac3326feeba4f6db20b70a5/cfg/yolov3-spp.cfg#L805-L820
I mean that in all versions of YOLO we divide the image into S×S grid cells, and to detect smaller objects better I want to change S for my custom dataset. If the number of anchor boxes per grid cell is b, then the output tensor shape is S×S×[b×(4+1+80)] for the COCO dataset, and that is the S I refer to. For changing the number of anchor boxes I want to use pretrained weights, but when I change the anchor boxes (including their number; otherwise it runs as expected) I get an error. I then modified the mask values as well, but I still got errors.
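To make the shape concrete, here is a quick sketch of what I mean (my own illustration, not repo code):

```python
import torch

# A stride-32 layer on an img_size x img_size input gives an S x S grid
# with S = img_size // 32; each cell predicts b anchors x (4 box coords
# + 1 objectness + 80 COCO classes) values.
img_size = 416
stride = 32
b = 3    # anchor boxes per grid cell
nc = 80  # COCO classes

S = img_size // stride                       # 13
out = torch.zeros(1, S, S, b * (4 + 1 + nc))
print(out.shape)                             # torch.Size([1, 13, 13, 255])
```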
Yes, if you change the model structure the pretrained weights no longer correspond. If you modify the cfg you can train from scratch:
python3 train.py --weights ''
Thank you, I see your point. By the way, how do I change the variable S that I referred to?
@melih1996 there is no S parameter. You modify the img-size directly with the argparser arguments:
python3 train.py --img-size 320
So you mean the cell scale is 32 and the changing variable is the number of cells instead of the scale of the cells.
@melih1996 oh perhaps you are talking about the stride? The strides are different for each yolo layer, but in the default model they should be 32, 16 and 8 for the large, medium and small object output layers.
https://github.com/ultralytics/yolov3/blob/0958d81580a8a0086ac3326feeba4f6db20b70a5/models.py#L160
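For example (an illustration, not repo code), the grid sizes per output layer for a 416 x 416 input:

```python
# Grid sizes of the three default YOLOv3 output layers.
img_size = 416
for stride in (32, 16, 8):
    s = img_size // stride
    print(f'stride {stride:>2}: {s} x {s} grid = {s * s} cells')
# stride 32: 13 x 13 grid = 169 cells
# stride 16: 26 x 26 grid = 676 cells
# stride  8: 52 x 52 grid = 2704 cells
```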
> So you mean the cell scale is 32 and the changing variable is the number of cells instead of the scale of the cells.
S is the size of the feature map: S = img_size / 32, downsampled x32 by the Darknet-53 backbone.
I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.
I have the same question as @melih1996. I summarized the important parts of the YOLOv1 paper with regard to this question.
In models.py (lines 155-171) I found this code:
```python
class YOLOLayer(nn.Module):
    def __init__(self, anchors, nc, img_size, yolo_index, arc):
        super(YOLOLayer, self).__init__()
        self.anchors = torch.Tensor(anchors)
        self.na = len(anchors)  # number of anchors (3)
        self.nc = nc  # number of classes (80)
        self.no = nc + 5  # number of outputs
        self.nx = 0  # initialize number of x gridpoints
        self.ny = 0  # initialize number of y gridpoints
        self.arc = arc

        if ONNX_EXPORT:  # grids must be computed in __init__
            stride = [32, 16, 8][yolo_index]  # stride of this layer
            nx = int(img_size[1] / stride)  # number x grid points
            ny = int(img_size[0] / stride)  # number y grid points
            create_grids(self, img_size, (nx, ny))
```
It seems like your S is called self.nx and self.ny, and if ONNX_EXPORT is True, then your answer from above is totally right. But there is no else-case. In your forward method a function called create_grids is used; it seems like this function does the job, and in the default mode S is 13. Is it useful to change this value if we have image sizes other than e.g. COCO's (since for Pascal the YOLOv1 authors used S=7)? Or is changing the anchor sizes equivalent to adjusting this S, i.e. self.nx and self.ny?
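For reference, my simplified reading of what a create_grids-style helper computes at forward time (a sketch under that assumption, not the repo's exact code):

```python
import torch

def create_grids(self, img_size=416, ng=(13, 13)):
    nx, ny = ng                        # x and y grid points
    self.nx, self.ny = nx, ny
    self.stride = img_size / max(ng)   # e.g. 416 / 13 = 32

    # xy offset of every cell, shape (1, 1, ny, nx, 2)
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx))
    self.grid_xy = torch.stack((xv, yv), 2).float().view(1, 1, ny, nx, 2)

    # anchors rescaled from pixels to grid units
    self.anchor_vec = self.anchors / self.stride
    self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2)
```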
@Deep-Learner anchors and gridpoints are completely separate items; they have nothing to do with each other. The yolo output layer has img-size // stride gridpoints, as you see in your pasted code. Anchors are defined in the cfg:
https://github.com/ultralytics/yolov3/blob/4a90221e79fbed6b952411b95dd8f06823c526fc/cfg/yolov3-spp.cfg#L640-L649
Thank you for your answer. Yes, I know that these are different things. However, if we increase the number of gridpoints (S^2 -> (S+k)^2, with k > 0) while keeping the standard anchor sizes, it may have the same effect (in terms of precision, recall, etc.) as keeping the standard number of gridpoints and defining our own anchor sizes. In other words: since you implemented the gridpoints as hardcoded values, it may be more common to change the anchor sizes instead of the number of gridpoints (if it has the same effect).
@Deep-Learner I just don't understand the question. The anchors are defined in pixel space; inference can occur at any size of your choosing.
@glenn-jocher thanks again for your time. I don't know how I could describe it in other words. However, I'll focus on finding the best anchor sizes for our data set. That should also improve our results. Thanks again for your great work.
@Deep-Learner ah, if you want to adjust anchor sizes, you should run a k-means analysis, though I only recommend this if your labels are oddly shaped or outside of the typical COCO shapes/sizes: https://github.com/ultralytics/yolov3/blob/585064f300fa2d4bb80667927adc12caf1dbcc30/utils/utils.py#L765-L770
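For illustration, the core of that analysis might look like the standalone sketch below (not the repo's kmean_anchors itself; wh is a hypothetical (N, 2) array of your label widths and heights in pixels at the training image size):

```python
import numpy as np
from scipy.cluster.vq import kmeans

def fit_anchors(wh, n=9):
    # whiten so width and height weigh equally, then cluster
    s = wh.std(0)
    k, _ = kmeans(wh / s, n, iter=30)
    return k * s  # back to pixel units

wh = np.abs(np.random.randn(1000, 2)) * 80 + 10  # dummy label sizes
anchors = fit_anchors(wh)
print(anchors[np.argsort(anchors.prod(1))].round(1))  # sorted small -> large
```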
@glenn-jocher Thanks for your hint. I was already thinking about k-means for determining the best anchor sizes, but I thought I would have to implement it myself. Nice that you have already done this. Thanks a lot.
Back to the earlier question: I think I found a counterexample to my previous idea (which makes the second part of the original question important again). Consider the following situation: we only want to detect persons in an image (just to simplify the idea). At least in YOLOv1, each grid cell can only predict one instance of an object, right? For our case this means each cell can predict at most one person. But what happens if one cell is so large (because our images are Full HD and we divide the image into a 7x7 grid) that one cell should predict many persons? That wouldn't work, right? So for this special case we should increase the number of grid cells, or can YOLOv3 handle this case? Sorry, I never read the YOLOv2 and YOLOv3 papers; I'll do so in the next few weeks.
@Deep-Learner yes, you had better read the v2 and v3 papers.
v3 has multiple anchors at each output layer, and multiple output layers; around 10,000 anchors per image in total is a typical number. If your objects are smaller or larger you can adjust your anchor shapes and anchor count to compensate.
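As a quick sanity check of that figure (my own arithmetic, assuming 3 anchors per layer and a 416 x 416 input):

```python
# 3 anchors at each cell of the 13x13, 26x26 and 52x52 grids
na = 3
total = sum(na * (416 // stride) ** 2 for stride in (32, 16, 8))
print(total)  # 10647
```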
@glenn-jocher thanks for clarifying this. That helped a lot.
EDIT: Now that I have read the YOLOv2 and YOLOv3 papers, what you wrote makes perfect sense. In YOLOv1 there was a fully connected layer, and my idea was to change the transformation/reshape from this fully connected layer to the output tensor shape in order to get a larger grid size. However, this fully connected layer was removed in YOLOv2. YOLOv2 and YOLOv3 are therefore FCNs (fully convolutional networks), and the grid size of the output tensor depends on the input size, which is exactly the factor of 32 that you mentioned earlier. If we wanted to change this factor, we would have to change the YOLO architecture (I think you also mentioned that already). Sorry for the whole confusion.
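A tiny sketch of that fully convolutional property (a stand-in stack of five stride-2 convs, not the actual Darknet-53):

```python
import torch
import torch.nn as nn

# five stride-2 convs downsample by 2**5 = 32, like the backbone
net = nn.Sequential(*[nn.Conv2d(3 if i == 0 else 8, 8, 3, stride=2, padding=1)
                      for i in range(5)])
for size in (320, 416, 608):
    y = net(torch.zeros(1, 3, size, size))
    print(size, '->', tuple(y.shape[2:]))  # 10x10, 13x13, 19x19 grids
```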
Thanks a lot for your great work and nice support. :)
@Deep-Learner You're welcome! Don't hesitate to reach out if you have more questions in the future. Good luck with your project and happy coding!