ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Help on implementing custom regression head loss function #5313

Closed ThomasRochefortB closed 6 months ago

ThomasRochefortB commented 11 months ago

Question

Hello! I am working on implementing an additional "regression head" to the YOLOv8 segmentation model because I am dealing with a custom dataset which has 6 variables I want to predict in addition to the bbox, class and masks.

I have managed to modify the dataloader and the model head successfully. Here is my ExtendedSegment head:

class ExtendedSegment(Segment):
    """Extends the Segment class to add a regression head predicting a 6D vector."""

    def __init__(self, nc=80, nm=32, npr=256, ch=()):
        super().__init__(nc, nm, npr, ch)
        self.regression_head = nn.ModuleList(nn.Sequential(
            Conv(x, max(x // 4, 128), 3),
            Conv(max(x // 4, 128), max(x // 4, 128), 3),
            nn.Conv2d(max(x // 4, 128), 6, 1),
            nn.Sigmoid()) for x in ch)  # Produces a 6D vector for each anchor and applies sigmoid activation

    def forward(self, x):
        regression_outputs = [self.regression_head[i](x[i]).view(x[i].shape[0], 6, -1) for i in range(self.nl)]
        regression_tensor = torch.cat(regression_outputs, 2)

        # Call the parent's forward method to get original outputs and masks
        outputs = super().forward(x)

        if self.training:
            x, mc, p = outputs
            return x, mc, p, regression_tensor

        # Inference: append the regression tensor to the parent's outputs
        out_1, out_2 = outputs
        if self.export:
            return out_1, out_2, regression_tensor
        return (out_1, regression_tensor), (out_2[0], out_2[1], out_2[2], regression_tensor)

I now have an extra prediction tensor of shape (BS, 6, 8400). My question is about implementing the loss function for this extra output. I want to use MSE, but I find it hard to tell whether my implementation is correct. Here is what I did:

  1. I first modified the TaskAlignedAssigner in tal.py to assign ground-truth regression variables to anchors by modifying the get_targets() function. I masked the target_regression variable using the fg_mask tensor:
def get_targets(self, gt_labels, gt_bboxes, target_gt_idx, fg_mask, gt_regression):

        # Assigned target labels, (b, 1)
        batch_ind = torch.arange(end=self.bs, dtype=torch.int64, device=gt_labels.device)[..., None]
        target_gt_idx = target_gt_idx + batch_ind * self.n_max_boxes  # (b, h*w)
        target_labels = gt_labels.long().flatten()[target_gt_idx]  # (b, h*w)

        # Assigned target boxes, (b, max_num_obj, 4) -> (b, h*w)
        target_bboxes = gt_bboxes.view(-1, 4)[target_gt_idx]

        # Assigned target scores
        target_labels.clamp_(0)

        # 10x faster than F.one_hot()
        target_scores = torch.zeros((target_labels.shape[0], target_labels.shape[1], self.num_classes),
                                    dtype=torch.int64,
                                    device=target_labels.device)  # (b, h*w, 80)
        target_scores.scatter_(2, target_labels.unsqueeze(-1), 1)

        fg_scores_mask = fg_mask[:, :, None].repeat(1, 1, self.num_classes)  # (b, h*w, 80)
        target_scores = torch.where(fg_scores_mask > 0, target_scores, 0)

        target_regression = gt_regression.view(-1, 6)[target_gt_idx]

        # Convert fg_mask to boolean type
        fg_mask_bool = fg_mask.bool()

        # Now create fg_regression_mask
        fg_regression_mask = fg_mask_bool.unsqueeze(-1).repeat(1, 1, 6)  # Expanding to shape (b, h*w, 6)

        # Now apply masking to target_regression
        target_regression = torch.where(fg_regression_mask, target_regression, torch.zeros_like(target_regression))

        return target_labels, target_bboxes, target_scores, target_regression
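
The flattened-index gather used in get_targets() can be checked in isolation on toy tensors. This is a minimal sketch with made-up sizes and values (none of the numbers come from the actual dataset); it only mirrors the indexing and masking steps above:

```python
import torch

# Toy sizes (assumptions for illustration): 2 images, up to 3 GT objects each, 4 anchors
bs, n_max_boxes, n_anchors, n_reg = 2, 3, 4, 6

# Per-image GT regression vectors, (b, max_num_obj, 6).
# Every component of object k holds the value k, so gathered rows are easy to read.
gt_regression = torch.arange(bs * n_max_boxes, dtype=torch.float32).view(bs, n_max_boxes, 1).repeat(1, 1, n_reg)

# Index of the assigned GT object within its own image, one per anchor, (b, h*w)
target_gt_idx = torch.tensor([[0, 2, 1, 0], [1, 1, 0, 2]])
# 1 where the anchor is foreground, 0 otherwise, (b, h*w)
fg_mask = torch.tensor([[1, 0, 1, 0], [0, 1, 0, 0]])

# Offset per-image indices into the flattened (b * max_num_obj) axis, as get_targets() does
batch_ind = torch.arange(bs, dtype=torch.int64)[..., None]
flat_idx = target_gt_idx + batch_ind * n_max_boxes  # (b, h*w)

# Gather one 6D target per anchor, then zero out background anchors
target_regression = gt_regression.view(-1, n_reg)[flat_idx]  # (b, h*w, 6)
target_regression = target_regression * fg_mask.unsqueeze(-1)

print(target_regression.shape)    # torch.Size([2, 4, 6])
print(target_regression[..., 0])  # first regression variable per anchor
```

Only the anchors flagged in fg_mask keep a non-zero target; every other row is zeroed, which is what the torch.where call above does.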

Then, in loss.py, I can also use the fg_mask to mask out the predicted regression variables:

class v8SegmentationLoss(v8DetectionLoss):
    """Criterion class for computing training losses."""

    def __init__(self, model):  # model must be de-paralleled
        super().__init__(model)
        self.nm = model.model[-1].nm  # number of masks
        try:
            self.overlap = model.args.overlap_mask
        except:
            self.overlap =False

    def __call__(self, preds, batch):
        """Calculate and return the loss for the YOLO model."""
        loss = torch.zeros(5, device=self.device)  # box, seg, cls, dfl, regression
        if len(preds) ==3:
            feats, pred_masks, proto = preds 
        elif len(preds) ==4:
            feats, pred_masks, proto, regression_tensor = preds
            #Let's describe each variables:
            #display_shape(preds)
        else:
            feats, pred_masks, proto, regression_tensor = preds[1]
        batch_size, _, mask_h, mask_w = proto.shape  # batch size, number of masks, mask height, mask width
        pred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split(
            (self.reg_max * 4, self.nc), 1)

        # b, grids, ..
        pred_scores = pred_scores.permute(0, 2, 1).contiguous()
        pred_distri = pred_distri.permute(0, 2, 1).contiguous()
        pred_masks = pred_masks.permute(0, 2, 1).contiguous()

        dtype = pred_scores.dtype
        imgsz = torch.tensor(feats[0].shape[2:], device=self.device, dtype=dtype) * self.stride[0]  # image size (h,w)
        anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)

        # targets
        try:
            batch_idx = batch['batch_idx'].view(-1, 1)
            targets = torch.cat((batch_idx, batch['cls'].view(-1, 1), batch['bboxes']), 1)
            targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])
            gt_labels, gt_bboxes = targets.split((1, 4), 2)  # cls, xyxy
            mask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0)
        except RuntimeError as e:
            raise TypeError('ERROR ❌ segment dataset incorrectly formatted or not a segment dataset.\n'
                            "This error can occur when incorrectly training a 'segment' model on a 'detect' dataset, "
                            "i.e. 'yolo train model=yolov8n-seg.pt data=coco128.yaml'.\nVerify your dataset is a "
                            "correctly formatted 'segment' dataset using 'data=coco128-seg.yaml' "
                            'as an example.\nSee https://docs.ultralytics.com/tasks/segment/ for help.') from e

        if 'regression_vars' in batch:
            regression_targets = torch.tensor(np.array(batch['regression_vars'])).to(self.device).float()
        # pboxes
        pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)
        test_labels, target_bboxes, target_scores, fg_mask, target_gt_idx, regression_scores = self.assigner(
            pred_scores.detach().sigmoid(), (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
            anchor_points * stride_tensor, gt_labels, gt_bboxes, mask_gt, regression_targets)

        target_scores_sum = max(target_scores.sum(), 1)

        # cls loss
        # loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum  # VFL way

        if 'regression_vars' in batch:
            # Assuming fg_mask has shape (b, h*w)
            # Expand the dimensions of fg_mask to match regression_tensor
            fg_regression_mask = fg_mask.unsqueeze(1).expand(-1, 6, -1)  # fg_regression_mask now has shape (BS, 6, 8400)

            # Now create masked versions of your regression tensor and regression scores
            masked_regression_tensor = regression_tensor * fg_regression_mask

            # Compute MSE loss on masked tensors
            regression_loss = F.mse_loss(masked_regression_tensor.permute(0, 2, 1).contiguous(), regression_scores,reduction="sum")

            # Optionally, normalize the loss by the number of positive samples
            num_positive_samples = fg_mask.sum()
            if num_positive_samples > 0:
                regression_loss /= num_positive_samples
            else:
                # If there are no positive samples, set regression loss to zero
                regression_loss = torch.tensor(0.0).to(fg_mask.device)

        loss[4] = regression_loss

The code runs all right and the loss decreases, but I get horrible regression performance and was wondering whether my loss calculation might be the issue. Am I handling all 8400 predicted object instances correctly?
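
For context, the 8400 in the (BS, 6, 8400) shape is the number of anchor points across the three feature maps, not the number of objects. A quick check of that arithmetic, assuming a 640x640 input and detection strides of 8, 16 and 32 (the input size is an assumption; it is not stated above):

```python
# Assumption: a 640x640 input and three detection strides (8, 16, 32)
imgsz = 640
strides = (8, 16, 32)

# Each feature map contributes one anchor point per grid cell
cells_per_level = [(imgsz // s) ** 2 for s in strides]
num_anchors = sum(cells_per_level)

print(cells_per_level)  # [6400, 1600, 400]
print(num_anchors)      # 8400
```

Most of these anchors are background for any given image, which is why the foreground mask matters for the regression loss.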

Thanks for your time!

github-actions[bot] commented 10 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

glenn-jocher commented 9 months ago

@ThomasRochefortB hello! It's great to see you're diving deep into modifying the YOLOv8 architecture for your custom dataset that includes regression targets along with the traditional detection and segmentation tasks.

From what you've described, it seems like you have appropriately modified the dataloader and the ExtendedSegment head to accommodate your additional regression outputs. As for implementing the custom regression loss function alongside the existing loss components, this process can indeed be nuanced.

The approach you've chosen for computing the mean squared error (MSE) loss on the regression predictions is conceptually in line with the typical process. Masking with fg_mask to account for only the foreground objects is a valid way to restrict the regression loss to the relevant predictions.

However, there are a few things to consider that might affect your performance:

  1. Masking Strategy: Your masking and computation of loss seem correct, but it's essential to cross-verify the outputs after masking and before loss computation to ensure only foreground bounding boxes contribute to the loss.

  2. Balancing Losses: When introducing a new loss component to a multi-task network like YOLOv8, it might require tuning the relative weighting of this new regression loss compared to the existing loss components. Too much weight on the regression loss could distort the learning of other tasks.

  3. Normalization: Your normalization choice of dividing by the number of positive samples is reasonable, but again this may need tuning. You could consider other normalization strategies based on batch size or the sum of weights for the loss.

  4. Hyperparameters: Since you're adding a new head, you may need to adjust learning rates or other hyperparameters, as the new task could affect the convergence balance with other tasks.

  5. Post-processing: Ensure that the regression targets are correctly post-processed after prediction to match the ground truth format. Sometimes a mismatch here can lead to poor performance even if the network is predicting accurately.

  6. Quality of Regression Targets: Check the quality and distribution of your regression targets. Make sure they are being scaled or normalized in a way that is conducive to learning and that they indeed correlate well with the object instances.

  7. Debugging Strategy: During debugging, consider simplifying the problem. For example, train only the regression head with a fixed detection model to make sure the head learns correctly, then integrate back into the full model.
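
Points 1 to 3 above can be sketched as a single helper. This is a minimal sketch, not the Ultralytics implementation: the function name `masked_regression_loss` and the `gain` weight are hypothetical, and the tensor layouts are assumed to match those described in the question.

```python
import torch
import torch.nn.functional as F


def masked_regression_loss(pred, target, fg_mask, gain=1.0):
    """MSE over foreground anchors only (hypothetical helper).

    pred:    (b, n_reg, n_anchors) head output, laid out as in the question
    target:  (b, n_anchors, n_reg) per-anchor targets from the assigner
    fg_mask: (b, n_anchors), True/1 for foreground anchors
    gain:    assumed weight of this term relative to the other losses
    """
    pred = pred.permute(0, 2, 1)  # -> (b, n_anchors, n_reg)
    fg_mask = fg_mask.bool()
    num_pos = fg_mask.sum()
    if num_pos == 0:
        # Zero loss that stays attached to the graph, so backward() still works
        return pred.sum() * 0.0
    # Boolean indexing keeps (num_pos, n_reg) rows; background anchors never enter the loss
    loss = F.mse_loss(pred[fg_mask], target[fg_mask], reduction="sum") / num_pos
    return gain * loss


# Toy example: 1 image, 3 anchors, 2 regression variables
pred = torch.tensor([[[0.5, 0.9, 0.1], [0.5, 0.9, 0.1]]], requires_grad=True)  # (1, 2, 3)
target = torch.tensor([[[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]])                  # (1, 3, 2)
fg_mask = torch.tensor([[True, False, False]])

loss = masked_regression_loss(pred, target, fg_mask)
print(loss.item())  # 0.5: two squared errors of 0.25, divided by one positive anchor
```

Indexing both tensors with the same boolean mask avoids relying on masked predictions and masked targets both being zero at background anchors. The `gain` value would need tuning against the box, class and mask losses.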

As always, it's also recommended to carefully inspect the shapes and values of the tensors at different points in your pipeline to confirm that your additions work as intended during both the forward and backward passes.
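
One way to run such a check is to assert the expected shapes and then confirm that gradients reach only the foreground anchors. The sketch below uses random stand-in tensors and toy sizes; all names and values are illustrative assumptions rather than parts of the actual pipeline.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, n_reg, n_anchors = 2, 6, 8  # toy sizes; the thread uses (BS, 6, 8400)

pred = torch.rand(b, n_reg, n_anchors, requires_grad=True)  # stand-in for the head output
target = torch.rand(b, n_anchors, n_reg)                    # stand-in for assigned targets
fg_mask = torch.zeros(b, n_anchors, dtype=torch.bool)
fg_mask[0, 1] = True                                        # two foreground anchors
fg_mask[1, 5] = True

# Shape checks before the loss
assert pred.shape == (b, n_reg, n_anchors)
assert target.shape == (b, n_anchors, n_reg)
assert fg_mask.shape == (b, n_anchors)

# Foreground-only MSE, normalized by the number of positives
pred_t = pred.permute(0, 2, 1)  # (b, n_anchors, n_reg)
loss = F.mse_loss(pred_t[fg_mask], target[fg_mask], reduction="sum") / fg_mask.sum()
loss.backward()

# Summed absolute gradient per anchor; it should be non-zero only where fg_mask is True
grad_per_anchor = pred.grad.abs().sum(dim=1)  # (b, n_anchors)
grads_match_mask = (grad_per_anchor > 0).equal(fg_mask)
print(grads_match_mask)
```

If the printed value is False, either background anchors are leaking into the loss or some foreground anchors are being dropped.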

Since ultimately each task's complexity can affect learning, a blend of model inspection, empirical testing, and, if needed, loss adjustment will be crucial to attaining good performance. Patience and iteration over these steps will be your ally here.
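
On point 6 specifically: the head in the question ends in nn.Sigmoid, so its outputs lie in (0, 1) and targets outside that range cannot be matched. One option is min-max scaling of the targets, sketched below. The helper names are hypothetical, and the per-variable ranges are assumed to come from the training set.

```python
import torch


def fit_minmax(targets):
    """Per-variable minimum and range from training targets of shape (n, n_reg)."""
    t_min = targets.min(dim=0).values
    t_range = (targets.max(dim=0).values - t_min).clamp_min(1e-12)  # avoid divide-by-zero
    return t_min, t_range


def scale(targets, t_min, t_range):
    """Map raw targets into [0, 1] for training."""
    return (targets - t_min) / t_range


def unscale(preds, t_min, t_range):
    """Map sigmoid outputs back to the original units at inference."""
    return preds * t_range + t_min


# Toy targets with two variables on very different scales
train_targets = torch.tensor([[10.0, -1.0], [20.0, 0.0], [30.0, 1.0]])
t_min, t_range = fit_minmax(train_targets)
scaled = scale(train_targets, t_min, t_range)
print(scaled)
```

The same `t_min` and `t_range` would have to be stored with the model so that predictions can be converted back with `unscale`.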

Please continue exploring and refining your approach, and if you have further questions or updates, we're here to assist. Keep up the excellent work, and we're looking forward to seeing your custom regression head fully integrated and performing well!

ThomasRochefortB commented 9 months ago

@glenn-jocher, thank you for the comprehensive answer. Would the project be interested in a regression head PR? I got everything working for my use case and was wondering if it could be useful for the overall project. I'm not sure whether there is a general benchmark in computer vision that would require an additional regression head.

glenn-jocher commented 9 months ago

@ThomasRochefortB, I'm delighted to hear that you've got everything to work for your use case! 👏

Regarding your question about contributing a regression head PR to the project, it's always a pleasure to see contributions that extend the functionality of YOLOv8. Such features could indeed be beneficial for a range of applications where predictions beyond bounding boxes and classes are required, like predicting additional attributes or continuous variables associated with detected objects.

Regarding benchmarks, while there isn't a universal computer vision benchmark requiring a regression head, there are various domain-specific tasks where this would be valuable. For example, in autonomous vehicle perception, predicting the distance or speed of detected vehicles could be useful, or in agricultural applications, predicting the ripeness or size of detected fruits.

Before preparing a PR, there are a few considerations to keep in mind:

  1. Generic Implementation: Ensure that the regression head is implemented in a generic way that can be easily adopted or modified for various use cases.

  2. Documentation: Providing thorough documentation and usage examples would be essential for ensuring clarity on how to apply the regression head for various tasks.

  3. Modifications: It's crucial to ensure that the additional code integrates seamlessly with the current architecture without affecting the primary functionalities when the regression head is not used.

  4. Testing: Adequate testing is necessary to confirm that your extension operates as intended and does not introduce any issues.
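
As an illustration of points 1 and 4, a head with a configurable number of outputs and a small shape test might look like the sketch below. `RegressionHead`, `nr`, `hidden` and `act` are hypothetical names, and plain PyTorch layers stand in for the Ultralytics Conv block; this is not an existing Ultralytics API.

```python
import torch
import torch.nn as nn


class RegressionHead(nn.Module):
    """Hypothetical per-anchor regression head with a configurable number of outputs."""

    def __init__(self, nr=6, ch=(64, 128, 256), hidden=128, act=None):
        super().__init__()
        self.nr = nr
        # One small conv branch per feature level
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, hidden, 3, padding=1),
                nn.BatchNorm2d(hidden),
                nn.SiLU(),
                nn.Conv2d(hidden, nr, 1),
            )
            for c in ch
        )
        # Optional output activation (e.g. nn.Sigmoid() for targets scaled to [0, 1])
        self.act = act if act is not None else nn.Identity()

    def forward(self, feats):
        # Flatten each level to (b, nr, h*w) and concatenate along the anchor axis
        out = [m(f).flatten(2) for m, f in zip(self.branches, feats)]
        return self.act(torch.cat(out, dim=2))


# Shape test on toy feature maps: 8*8 + 4*4 + 2*2 = 84 anchors
head = RegressionHead(nr=3).eval()
feats = [torch.zeros(1, 64, 8, 8), torch.zeros(1, 128, 4, 4), torch.zeros(1, 256, 2, 2)]
with torch.no_grad():
    out = head(feats)
print(out.shape)  # torch.Size([1, 3, 84])
```

Leaving the activation optional keeps the head usable for unbounded targets as well as for targets normalized to a fixed range.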

If you've considered these points and have a regression head that you believe could be broadly useful and well-integrated within YOLOv8, we would indeed consider reviewing a pull request.

Please make sure any contribution aligns with the project's design principles and coding standards. You can start by opening an issue or discussion on the repo to describe your proposed feature and its potential applications. That's the best place to begin a conversation on whether and how to integrate such a feature.

We appreciate your enthusiasm and willingness to contribute. It's the community's collective effort that keeps pushing the boundaries of what's possible in this fast-moving field of computer vision! 🌟

ThomasRochefortB commented 9 months ago

@glenn-jocher Cool! Would love to do this. There seems to be a CV benchmark that we could use to validate the implementation: the 2012 Object Detection and Orientation Estimation benchmark on the KITTI dataset, see https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. I will close this issue and open a new one with the label "enhancement".

glenn-jocher commented 9 months ago

@ThomasRochefortB, that sounds like a wonderful plan! The KITTI benchmark is a well-regarded dataset and challenge in the field of computer vision, particularly for tasks that involve object detection with additional parameters like orientation, which is a good fit for your regression head capabilities. Applying and validating new features on such widely recognized benchmarks is a strong way to demonstrate the utility and robustness of the enhancement.

Creating an "enhancement" issue to discuss this further with the community and maintainers is the appropriate next step. It allows for a more focused discussion on the feature, provides an opportunity for feedback, and helps set expectations for the contribution process.

Thank you for proactively identifying a valuable extension to YOLOv8 and for preparing to share it with the community. Your initiative reflects the spirit of open-source collaboration, and we are looking forward to seeing your proposal in detail. Feel free to reference any pilot results, design decisions, and potential integration approaches you have in mind when opening the new issue. This will greatly help others understand your vision and provide constructive feedback.

If you have any questions or need guidance as you prepare your contribution, the community is here to support you. Good luck with your enhancement proposal, and once again, we appreciate your engagement and efforts! 🚀🤝

rob-rapid-robotics commented 8 months ago

Very interested in this as well. @ThomasRochefortB could you point to your branch where you got this working? I'm very interested to take a look!

ThomasRochefortB commented 8 months ago

@rob-rapid-robotics will do in the next week!

rob-rapid-robotics commented 8 months ago

@ThomasRochefortB Was curious if you had a chance to get to this. WIP is fine, just curious what you found worked! :)

ThomasRochefortB commented 7 months ago

@rob-rapid-robotics here is my current very coarse WIP branch in my forked repo: https://github.com/ThomasRochefortB/ultralytics-custom/tree/reghead

I am working on a real pull request for the official ultralytics repo. I just need to find a good benchmark dataset combining segmentation, detection and "per-instance regression" to validate the implementation. The use case is very niche and I have not found any decent benchmarks yet.

glenn-jocher commented 7 months ago

@ThomasRochefortB-robotics, I appreciate your interest! You can take a look at the work-in-progress on my forked repository's reghead branch. Keep in mind it's still quite rough around the edges.

I'm in the process of preparing a more polished pull request for the official Ultralytics repository. The challenge right now is identifying a suitable benchmark dataset that encompasses segmentation, detection, and per-instance regression to thoroughly validate the implementation. It's indeed a niche use case, and finding robust benchmarks has been tricky. If you have any suggestions or know of datasets that could serve this purpose, your input would be very welcome! 🛠️🔍

alsozatch commented 7 months ago

@ThomasRochefortB I'm interested in adding regression heads to YOLOv8, what you're doing seems cool. Looking at your fork I don't see documentation on the changes but could you just roughly point me towards the code sections to look at in order to test your repo? I'm guessing somewhere you modified the code that reads in labels since those now need regression values as well, so I'd need to find that in order to re-format the input data correctly.
