Details about YOLOv10 raw output without post-processing

rsemihkoca commented 1 month ago

Search before asking

[X] I have searched the Ultralytics YOLO issues and discussions and found no similar questions.

Question

When I look at the raw results from YOLOv10, I see that a value called one2one is calculated. What is it? The raw output has shape 300 × 6: there are 300 predictions, but most of them are in the same place. Does one2one reduce these? Why is cv3 calculated? Looking at the code, it is built from convolutional layers, and I don't understand why. I am mainly curious about the one2one part. If the model is NMS-free, what does this field do? I would expect the raw outputs to look as if NMS had already been applied.
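For context, a minimal sketch of how a fixed-size 300 × 6 output might be consumed. It assumes each row holds four box values, a score, and a class id, and that low-score rows are simply discarded with a confidence threshold. The function name `keep_confident`, the threshold value, and the sample data are hypothetical and not taken from the library.

```python
from typing import List, Sequence

Row = Sequence[float]  # assumed layout: 4 box values, score, class id


def keep_confident(rows: List[Row], conf_thres: float = 0.25) -> List[Row]:
    """Return only the rows whose score (column 4) exceeds conf_thres."""
    return [r for r in rows if r[4] > conf_thres]


# Hypothetical raw output: a fixed number of rows, several with very low scores.
raw = [
    [10.0, 20.0, 110.0, 220.0, 0.91, 0.0],
    [12.0, 21.0, 108.0, 219.0, 0.04, 0.0],
    [300.0, 40.0, 380.0, 160.0, 0.67, 2.0],
    [0.0, 0.0, 5.0, 5.0, 0.01, 7.0],
]
detections = keep_confident(raw)
print(len(detections))
```

Under this assumption, the row count is fixed by the head, and the score column decides which rows count as detections.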

Additional

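import copy

import torch
import torch.nn as nn

# Conv, Detect, and ops are defined in the Ultralytics codebase (imports omitted in this excerpt).


class v10Detect(Detect):

    max_det = 300  # passed to ops.v10postprocess in the export path

    def __init__(self, nc=80, ch=()):
        super().__init__(nc, ch)
        c3 = max(ch[0], min(self.nc, 100))  # channels
        # cv3: one stack per feature level, built from depthwise 3x3 and pointwise 1x1
        # convs, ending in a 1x1 conv that outputs nc class channels.
        self.cv3 = nn.ModuleList(
            nn.Sequential(
                nn.Sequential(Conv(x, x, 3, g=x), Conv(x, c3, 1)),
                nn.Sequential(Conv(c3, c3, 3, g=c3), Conv(c3, c3, 1)),
                nn.Conv2d(c3, self.nc, 1),
            )
            for x in ch
        )

        # The one2one head starts as a deep copy of cv2 (inherited from Detect) and cv3.
        self.one2one_cv2 = copy.deepcopy(self.cv2)
        self.one2one_cv3 = copy.deepcopy(self.cv3)

    def forward(self, x):
        # The one2one head runs on detached feature maps.
        one2one = self.forward_feat([xi.detach() for xi in x], self.one2one_cv2, self.one2one_cv3)
        if not self.export:
            one2many = super().forward(x)

        if not self.training:
            one2one = self.inference(one2one)
            if not self.export:
                return {"one2many": one2many, "one2one": one2one}
            else:
                assert self.max_det != -1
                # Export path: concatenate boxes, score, and label into one tensor.
                boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), self.max_det, self.nc)
                return torch.cat([boxes, scores.unsqueeze(-1), labels.unsqueeze(-1).to(boxes.dtype)], dim=-1)
        else:
            return {"one2many": one2many, "one2one": one2one}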
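The export branch above returns boxes, scores, and labels joined into rows of six values. As a rough illustration of how a fixed number of such rows could be selected, here is a pure-Python top-k sketch. It is an assumption about the general technique, not the actual code of ops.v10postprocess. The function name `topk_candidates` and the sample data are hypothetical.

```python
from typing import List


def topk_candidates(
    boxes: List[List[float]],   # one box (4 values) per anchor
    scores: List[List[float]],  # one score per class, per anchor
    max_det: int,
) -> List[List[float]]:
    """Pick the max_det highest (anchor, class) scores and build 6-value rows."""
    nc = len(scores[0])
    # Flatten to (score, flat_index) pairs, where flat_index = anchor * nc + class.
    flat = [(s, a * nc + c) for a, row in enumerate(scores) for c, s in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    rows = []
    for score, idx in flat[:max_det]:
        anchor, label = divmod(idx, nc)  # recover anchor index and class id
        rows.append([*boxes[anchor], score, float(label)])
    return rows


# Hypothetical inputs: 3 anchors, 2 classes.
boxes = [[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0], [1.0, 1.0, 2.0, 2.0]]
scores = [[0.9, 0.1], [0.2, 0.8], [0.05, 0.3]]
rows = topk_candidates(boxes, scores, max_det=3)
for row in rows:
    print(row)
```

In this sketch the selection always returns max_det rows when enough candidates exist, whatever their scores, which would be consistent with seeing 300 rows in the raw output even when few objects are present.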

UltralyticsAssistant commented 1 month ago

👋 Hello @rsemihkoca, thank you for your detailed question about the YOLOv10 raw outputs 🚀!

This is an automated response to let you know that your query is being processed. An Ultralytics engineer will assist you soon to address your specific question about the one2one calculations and the use of convolutional networks in the model.

In the meantime, I recommend checking out our Documentation for more insights into model architectures and outputs.

If this is a 🐛 Bug Report, make sure to provide a minimum reproducible example that can help us investigate further.

For real-time interactions and questions, join us on Discord 🎧. You might also find our Discourse and Subreddit helpful for community support.