ultralytics / ultralytics

Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Details about YOLOv10 raw output without post-processing #16502

Open rsemihkoca opened 1 month ago

rsemihkoca commented 1 month ago

Search before asking

Question

When I look at the raw results from YOLOv10, I see that a value called one2one is calculated. What is this? The raw output has shape 300 × 6: there are 300 predictions, but most of them are at the same location. Does one2one reduce these? Why do we compute cv3? Looking at the code, it is built from convolutional layers, and I don't understand why this is done. I am mainly curious about the one2one part: if the model is NMS-free, what does this field do? I would expect the raw outputs to look as if NMS had already been applied.
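For reference, the 300 × 6 shape matches the export branch of the forward method below: ops.v10postprocess is asked for self.max_det (300) candidates, and torch.cat packs each one as 4 box values, a score and a class id. Top-k selection never compares boxes with each other, so if the one-to-one head works as intended, the overlapping boxes are only low-scoring filler and a confidence threshold (not NMS) is what removes them. The pure-Python sketch below assumes ops.v10postprocess performs a two-stage top-k (best anchors first, then best (anchor, class) pairs); that reading of its behaviour, the names topk_select and filter_detections, and the 0.25 threshold are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]
Row = Tuple[Box, float, int]  # (box, score, class_id), like one row of the 300 x 6 output


def topk_select(boxes: List[Box], scores: List[List[float]], max_det: int) -> List[Row]:
    """Two-stage top-k selection (assumed to resemble ops.v10postprocess).

    boxes:  one box per anchor.
    scores: one list of nc class scores per anchor.
    Boxes are never compared with each other, so overlapping boxes are not removed.
    """
    nc = len(scores[0])
    # Stage 1: keep the max_det anchors with the highest best-class score.
    anchors = sorted(range(len(boxes)), key=lambda i: max(scores[i]), reverse=True)[:max_det]
    # Stage 2: rank every (anchor, class) pair among the kept anchors, best first.
    pairs = [(scores[i][c], i, c) for i in anchors for c in range(nc)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return [(boxes[i], s, c) for s, i, c in pairs[:max_det]]


def filter_detections(rows: List[Row], conf_thres: float = 0.25) -> List[Row]:
    """Keep rows whose score reaches conf_thres (hypothetical default); no NMS is applied."""
    return [row for row in rows if row[1] >= conf_thres]


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (11, 11, 51, 51), (100, 100, 140, 140)]
    scores = [[0.90, 0.05], [0.04, 0.10], [0.02, 0.80]]
    rows = topk_select(boxes, scores, max_det=3)
    for row in filter_detections(rows):
        print(row)
```

Running it prints the two confident rows. The near-duplicate box (11, 11, 51, 51) still fills a top-k slot with score 0.10 and is dropped only by the threshold step.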

Additional

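```python
import copy

import torch
import torch.nn as nn

# Assumption: this runs inside the yolov10 codebase, whose Detect provides
# forward_feat() and whose ops module provides v10postprocess().
from ultralytics.nn.modules.conv import Conv
from ultralytics.nn.modules.head import Detect
from ultralytics.utils import ops


class v10Detect(Detect):

    max_det = 300  # number of rows kept in the exported one-to-one output

    def __init__(self, nc=80, ch=()):
        super().__init__(nc, ch)
        c3 = max(ch[0], min(self.nc, 100))  # channels of the classification branch
        # Lightweight classification head: (depthwise 3x3 + pointwise 1x1) twice, then class logits
        self.cv3 = nn.ModuleList(
            nn.Sequential(
                nn.Sequential(Conv(x, x, 3, g=x), Conv(x, c3, 1)),
                nn.Sequential(Conv(c3, c3, 3, g=c3), Conv(c3, c3, 1)),
                nn.Conv2d(c3, self.nc, 1),
            )
            for x in ch
        )

        # Separate one-to-one branch with the same layout (box head cv2, class head cv3)
        self.one2one_cv2 = copy.deepcopy(self.cv2)
        self.one2one_cv3 = copy.deepcopy(self.cv3)

    def forward(self, x):
        # detach(): gradients from the one-to-one branch do not flow back into the shared features
        one2one = self.forward_feat([xi.detach() for xi in x], self.one2one_cv2, self.one2one_cv3)
        if not self.export:
            one2many = super().forward(x)  # regular Detect path, i.e. the one-to-many branch

        if not self.training:
            one2one = self.inference(one2one)  # decode raw features into boxes and class scores
            if not self.export:
                return {"one2many": one2many, "one2one": one2one}
            # Export: keep the max_det best candidates -> (batch, max_det, 6) = 4 box values, score, class id
            assert self.max_det != -1
            boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), self.max_det, self.nc)
            return torch.cat([boxes, scores.unsqueeze(-1), labels.unsqueeze(-1).to(boxes.dtype)], dim=-1)
        return {"one2many": one2many, "one2one": one2one}
```
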
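On the cv3 question: in Detect, cv2 predicts the box distribution and cv3 predicts one score per class. v10Detect keeps cv2 but rebuilds cv3 from depthwise (g=x) 3×3 plus pointwise 1×1 convolutions, which the YOLOv10 paper describes as a lightweight classification head. A rough weight count makes the saving concrete. It assumes the parent Detect builds cv3 from two full 3×3 Conv layers (as in YOLOv8), ignores BatchNorm parameters, and uses hypothetical channel counts; the helper names are illustrative only.

```python
def conv_weights(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weights in a k x k convolution without bias (BatchNorm parameters are ignored)."""
    return (c_in // groups) * c_out * k * k


def plain_cls_head(x: int, c3: int, nc: int) -> int:
    """Assumed parent Detect cv3: Conv(x, c3, 3) -> Conv(c3, c3, 3) -> Conv2d(c3, nc, 1)."""
    return conv_weights(x, c3, 3) + conv_weights(c3, c3, 3) + c3 * nc + nc  # + nc: final bias


def light_cls_head(x: int, c3: int, nc: int) -> int:
    """v10Detect cv3: (depthwise 3x3 + pointwise 1x1) twice, then Conv2d(c3, nc, 1)."""
    return (
        conv_weights(x, x, 3, groups=x)        # Conv(x, x, 3, g=x)
        + conv_weights(x, c3, 1)               # Conv(x, c3, 1)
        + conv_weights(c3, c3, 3, groups=c3)   # Conv(c3, c3, 3, g=c3)
        + conv_weights(c3, c3, 1)              # Conv(c3, c3, 1)
        + c3 * nc + nc                         # nn.Conv2d(c3, nc, 1) with bias
    )


if __name__ == "__main__":
    ch, nc = (64, 128, 256), 80    # hypothetical input channels and class count
    c3 = max(ch[0], min(nc, 100))  # same rule as in v10Detect.__init__
    plain = sum(plain_cls_head(x, c3, nc) for x in ch)
    light = sum(light_cls_head(x, c3, nc) for x in ch)
    print(f"plain: {plain}, light: {light}, ratio: {plain / light:.1f}x")
```

For ch = (64, 128, 256) and nc = 80 this gives 514,800 weights for the plain head and 80,672 for the light one, roughly 6.4× fewer.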
UltralyticsAssistant commented 1 month ago

👋 Hello @rsemihkoca, thank you for your detailed question about the YOLOv10 raw outputs 🚀!

This is an automated response to let you know that your query is being processed. An Ultralytics engineer will assist you soon to address your specific question about the one2one calculations and the use of convolutional networks in the model.

In the meantime, I recommend checking out our Documentation for more insights into model architectures and outputs.

If this is a 🐛 Bug Report, make sure to provide a minimum reproducible example that can help us investigate further.

For real-time interactions and questions, join us on Discord 🎧. You might also find our Discourse and Subreddit helpful for community support.

Upgrade

Please ensure you're using the most recent version of the ultralytics package with all dependencies, by running:

pip install -U ultralytics

Stay tuned for assistance from our team! 😊

Y-T-G commented 1 month ago

You can read the YOLOv10 paper for the details on the architecture.

https://arxiv.org/abs/2405.14458
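According to the linked paper (check the details there), one2one comes from its dual label assignment. During training, the one-to-many branch (cv2/cv3, returned as one2many) gets several positive anchors per object, which gives dense supervision but produces duplicates that need NMS. The one-to-one branch (one2one_cv2/one2one_cv3) gets a single positive per object, so at inference it should score one box per object highly and the rest low, which is why the exported model can use it without NMS. Both branches rank anchors with a matching metric of the form m = s · p^α · IoU^β, where s says whether the anchor point lies inside the object, p is the class score, and IoU compares the predicted and ground-truth boxes. The pure-Python sketch below is an illustration, not the repository's assigner: Candidate, matching_metric, assign, α = 0.5, β = 6.0 and topk = 10 are all assumptions.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One anchor considered for a single ground-truth object (hypothetical fields)."""
    anchor_id: int
    inside_gt: bool   # spatial prior s: does the anchor point lie inside the GT box?
    cls_score: float  # p: predicted score for the GT's class
    iou: float        # IoU between this anchor's predicted box and the GT box


def matching_metric(c: Candidate, alpha: float = 0.5, beta: float = 6.0) -> float:
    """m = s * p**alpha * IoU**beta; the alpha/beta defaults are assumptions."""
    s = 1.0 if c.inside_gt else 0.0
    return s * c.cls_score ** alpha * c.iou ** beta


def assign(cands: List[Candidate], topk: int) -> List[int]:
    """Anchor ids chosen as positives: the topk best by metric, skipping m == 0."""
    ranked = sorted(cands, key=matching_metric, reverse=True)
    return [c.anchor_id for c in ranked[:topk] if matching_metric(c) > 0]


if __name__ == "__main__":
    cands = [
        Candidate(0, True, 0.80, 0.90),
        Candidate(1, True, 0.70, 0.85),
        Candidate(2, True, 0.90, 0.50),
        Candidate(3, False, 0.95, 0.95),  # outside the GT box, so never a positive
    ]
    print("one-to-many:", assign(cands, topk=10))  # several positives per object
    print("one-to-one: ", assign(cands, topk=1))   # a single positive per object
```

Because both calls rank with the same α and β, the single one-to-one positive is always the best one-to-many positive, which is the paper's argument for keeping the two branches' supervision consistent.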