jerry73204 closed this issue 4 years ago.
Hello @jerry73204, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@jerry73204 these regression ranges span the grid cell (0.0 to 1.0) symmetrically; they have mean 0.5, not mean 0.0.
Got it. Thanks.
The way to think about it is that while a position inside the grid cell is defined with respect to the 0,0 origin, the actual receptive field of the grid cell is centered at 0.5, 0.5. A neuron outputting zero will produce a regression output at the exact center of the grid cell, since sigmoid(0) = 0.5.
Yes. As you pointed out, I realized that `* 2. - 0.5` expands the range but is still centered at 0.5.
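A quick numeric check of the two offset forms discussed above, as a minimal plain-Python sketch (the helper names `offset_sigmoid` and `offset_scaled` are hypothetical, not taken from the YOLOv5 code). A raw output of zero maps to the cell center 0.5 under both sigmoid(t) and 2·sigmoid(t) − 0.5; the scaled form only widens the reachable range from (0, 1) to (−0.5, 1.5).

```python
import math


def sigmoid(t: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def offset_sigmoid(t: float) -> float:
    """Plain sigmoid offset inside a cell: range (0, 1), centered at 0.5."""
    return sigmoid(t)


def offset_scaled(t: float) -> float:
    """Scaled offset 2*sigmoid(t) - 0.5: range (-0.5, 1.5), still centered at 0.5."""
    return 2.0 * sigmoid(t) - 0.5


# A raw output of zero lands exactly on the cell center for both forms.
center_sigmoid = offset_sigmoid(0.0)
center_scaled = offset_scaled(0.0)

# Large-magnitude raw outputs approach the ends of the scaled range.
low_scaled = offset_scaled(-20.0)
high_scaled = offset_scaled(20.0)

# The midpoint of the scaled range is also 0.5.
midpoint_scaled = (-0.5 + 1.5) / 2
```

Because the range is symmetric around 0.5, a prediction can now reach up to half a cell past either border while a zero output still means "cell center".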
I'd like to ask another question. I see that `build_targets` in the loss function (code) scales `targets` by `gain`. It looks like it scales the sizes and positions from ratios to grid units. Does that mean the GIoU is computed in grid units?
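The scaling asked about above can be sketched in plain Python, assuming targets are normalized (x_center, y_center, width, height) fractions of the image and that the per-layer gain is (grid_w, grid_h, grid_w, grid_h). The function names and the 640-pixel, stride-32 example are hypothetical, chosen only to show how normalized values become grid units and how the stride converts grid units back to pixels.

```python
import math
from typing import Tuple

# (x_center, y_center, width, height)
Box = Tuple[float, float, float, float]


def to_grid_units(box_norm: Box, grid_w: int, grid_h: int) -> Box:
    """Scale a normalized xywh box (fractions of the image size) to grid-cell units.

    Assumes a gain of (grid_w, grid_h, grid_w, grid_h); names are hypothetical.
    """
    gain = (grid_w, grid_h, grid_w, grid_h)
    x, y, w, h = (v * g for v, g in zip(box_norm, gain))
    return (x, y, w, h)


def cell_index(box_grid: Box) -> Tuple[int, int]:
    """Grid cell (column, row) that contains the box center."""
    return math.floor(box_grid[0]), math.floor(box_grid[1])


# Example: a 640x640 image on a stride-32 feature map gives a 20x20 grid.
image_size, stride = 640, 32
grid = image_size // stride
box_grid = to_grid_units((0.5, 0.25, 0.1, 0.2), grid, grid)
cell = cell_index(box_grid)

# Converting grid units back to pixels is a multiplication by the stride.
box_pixels = tuple(v * stride for v in box_grid)
```

Here `box_grid` comes out as roughly (10, 5, 2, 4) grid units, and `box_pixels` as (320, 160, 64, 128), the same box in pixels.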
If so, I suppose this line computes the width/height ratio by dividing the target sizes by the anchor sizes, both in grid units? I think this relates to the place where the anchor sizes are divided by the stride in advance. If I understand correctly, we could use distinct variable names for anchors in pixels and anchors in grid units to avoid confusion.
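A minimal sketch of the width/height ratio in both unit systems, using separate names for pixel anchors (`anchor_px`) and grid-unit anchors (`anchor_in_grid`) as suggested above. The matching rule "neither dimension differs by more than `thresh`×" and its 4.0 default are assumptions for illustration; all names are hypothetical.

```python
from typing import Tuple

WH = Tuple[float, float]


def wh_ratio(target_wh: WH, anchor_wh: WH) -> WH:
    """Per-dimension ratio of target size to anchor size (same units for both)."""
    return target_wh[0] / anchor_wh[0], target_wh[1] / anchor_wh[1]


def anchor_matches(target_wh: WH, anchor_wh: WH, thresh: float = 4.0) -> bool:
    """True if neither dimension differs from the anchor by more than thresh times."""
    rw, rh = wh_ratio(target_wh, anchor_wh)
    return max(rw, 1.0 / rw, rh, 1.0 / rh) < thresh


stride = 8
anchor_px = (10.0, 13.0)   # anchor size in pixels
target_px = (30.0, 39.0)   # target size in pixels

# Dividing both by the stride converts pixels to grid units.
anchor_in_grid = (anchor_px[0] / stride, anchor_px[1] / stride)
target_in_grid = (target_px[0] / stride, target_px[1] / stride)

ratio_px = wh_ratio(target_px, anchor_px)
ratio_grid = wh_ratio(target_in_grid, anchor_in_grid)
match = anchor_matches(target_in_grid, anchor_in_grid)
```

The stride cancels in the division, so `ratio_px` and `ratio_grid` are both (3.0, 3.0): the units chosen for the ratio do not change the result, only where the division happens.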
@jerry73204 yes, I think this all makes sense. The anchors at the moment are named for their use, so `anchor_grid` is applied to the grid during inference, etc.
The units of the operations are not mathematically important: the wh ratio and the GIoU can be calculated in any non-normalized units, and they are implemented this way for computational efficiency.
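To illustrate the unit-independence claim for GIoU, here is a minimal plain-Python GIoU for corner-format boxes (a sketch, not the YOLOv5 implementation). Scaling every coordinate by a factor s multiplies the intersection, union, and enclosing-box areas by s², so each ratio, and therefore the GIoU, is unchanged. The box values and the factor 32 (standing in for a stride) are arbitrary examples.

```python
from typing import Tuple

XYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def giou(a: XYXY, b: XYXY) -> float:
    """Generalized IoU of two axis-aligned boxes in corner format."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Area of the smallest box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (hull - union) / hull


def scale(box: XYXY, s: float) -> XYXY:
    """Change units by multiplying every coordinate by s."""
    x1, y1, x2, y2 = box
    return (x1 * s, y1 * s, x2 * s, y2 * s)


pred_grid = (0.0, 0.0, 2.0, 2.0)
target_grid = (1.0, 1.0, 3.0, 3.0)
g_grid = giou(pred_grid, target_grid)
g_pixels = giou(scale(pred_grid, 32.0), scale(target_grid, 32.0))
```

For these boxes IoU = 1/7 and the enclosing-box penalty is 2/9, so both `g_grid` and `g_pixels` equal −5/63.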
BTW, the anchor values themselves are mostly the same output normalization (i.e. to unity variance and zero mean) that ML models have been using for decades, the main innovation here is the use of multiple anchors per grid cell. AutoAnchor will analyze any supplied anchors for suitability in combination with your supplied dataset, and recompute and integrate new anchors automatically before training starts.
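AutoAnchor's internals are not described in this thread. The sketch below is a hypothetical, simplified anchor-suitability check in the same spirit: for each target, find the best-fitting anchor by its worst-dimension size ratio, then report the fraction of targets whose best anchor is within a `thresh`× ratio. The metric name `best_possible_recall`, the 4.0 threshold, and all example sizes are assumptions, not a transcription of the Ultralytics code.

```python
from typing import List, Tuple

WH = Tuple[float, float]


def anchor_fit(target: WH, anchor: WH) -> float:
    """Worst-dimension size agreement in (0, 1]; 1.0 means identical width and height."""
    rw = target[0] / anchor[0]
    rh = target[1] / anchor[1]
    return min(rw, 1.0 / rw, rh, 1.0 / rh)


def best_possible_recall(targets: List[WH], anchors: List[WH], thresh: float = 4.0) -> float:
    """Fraction of targets whose best-fitting anchor is within a thresh-times size ratio."""
    hits = 0
    for t in targets:
        best = max(anchor_fit(t, a) for a in anchors)
        if best > 1.0 / thresh:
            hits += 1
    return hits / len(targets)


anchors = [(10.0, 13.0), (16.0, 30.0), (33.0, 23.0)]  # example anchor sizes in pixels
targets = [(12.0, 14.0), (100.0, 200.0), (30.0, 25.0)]  # example target sizes in pixels
bpr = best_possible_recall(targets, anchors)
```

Here the 100×200 target is more than 4× larger than every anchor in at least one dimension, so `bpr` is 2/3. An anchor set that scores low on a metric like this for a given dataset would be a natural candidate for recomputation before training.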
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
It seems like the center regression target is normalized in grid units (double the stride of the current feature map), and the size regression target is normalized by the predefined anchor sizes, right? @jerry73204 @glenn-jocher
Yes.
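The normalization confirmed above can be sketched as a small encoding step, assuming one grid cell spans `stride` pixels and that the size target is simply the target size divided by the anchor size. The function and parameter names are hypothetical, and the sketch does not model any extra cell-assignment rules or size parameterization the real loss may use.

```python
import math
from typing import Tuple


def encode_target(
    cx_px: float, cy_px: float, w_px: float, h_px: float,
    stride: float, anchor_w_px: float, anchor_h_px: float,
) -> Tuple[Tuple[int, int], Tuple[float, float], Tuple[float, float]]:
    """Split a pixel-space box into a cell index, an in-cell offset, and an anchor-relative size."""
    gx, gy = cx_px / stride, cy_px / stride          # center in grid units
    cell = (math.floor(gx), math.floor(gy))          # cell containing the center
    offset = (gx - cell[0], gy - cell[1])            # position inside that cell, in [0, 1)
    size = (w_px / anchor_w_px, h_px / anchor_h_px)  # size relative to the anchor
    return cell, offset, size


cell, offset, size = encode_target(
    100.0, 50.0, 60.0, 45.0, stride=32.0, anchor_w_px=30.0, anchor_h_px=45.0
)
```

For a box centered at (100, 50) px on a stride-32 map, the center falls in cell (3, 1) with offset (0.125, 0.5625), and a 60×45 box against a 30×45 anchor gives a size target of (2.0, 1.0).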
🐛 Bug
I noticed that this line has a potential error, described below.
If I understand it correctly, `self.stride[i]` can be regarded as the grid width in pixels, and `self.grid[i]` as the enumerated (x, y) cell coordinates in grid units. The term `(y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device))` can be seen as an offset applied to positions in grid units. That is, it computes (x + Δx, y + Δy) in grid units. Then, multiplying by `self.stride[i]` converts it to pixel units, and the result is saved to `y[..., 0:2]`.
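The decoding step described above can be sketched for a single prediction, with plain floats standing in for the tensors `y[..., 0:2]`, `self.grid[i]`, and `self.stride[i]`; the function and parameter names are hypothetical. A zero raw output decodes to the pixel center of its cell, and a strongly negative one to about half a cell past the cell's left edge.

```python
import math
from typing import Tuple


def decode_xy(raw_x: float, raw_y: float, cell_x: int, cell_y: int, stride: float) -> Tuple[float, float]:
    """Scalar version of (sigmoid(raw) * 2 - 0.5 + grid) * stride for one prediction."""
    sx = 1.0 / (1.0 + math.exp(-raw_x))
    sy = 1.0 / (1.0 + math.exp(-raw_y))
    # Offset in grid units, range (-0.5, 1.5), plus the cell's integer coordinates.
    x_grid = sx * 2.0 - 0.5 + cell_x
    y_grid = sy * 2.0 - 0.5 + cell_y
    # Multiplying by the stride converts grid units to pixels.
    return x_grid * stride, y_grid * stride


center = decode_xy(0.0, 0.0, cell_x=3, cell_y=1, stride=32.0)
far_left = decode_xy(-20.0, 0.0, cell_x=3, cell_y=1, stride=32.0)
```

With stride 32, cell (3, 1) spans x from 96 to 128 px; `center` is (112.0, 48.0), and `far_left` has x close to 80 px, the center of the neighboring cell.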
It made me wonder why the offset term `y[..., 0:2] * 2. - 0.5` was chosen to be asymmetric. The term `y[..., 0:2]` comes from a sigmoid, so `y[..., 0:2] * 2. - 0.5` has range [-0.5, 1.5]. This means the offset is not centered at zero.
Expected behavior
I expect the formula to be
Environment