ultralytics / ultralytics

NEW - YOLOv8 πŸš€ in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Multi-class pose estimation with different number of keypoints #11775

Closed xzxorb closed 2 months ago

xzxorb commented 4 months ago

Search before asking

Question

Hi! I am using the yolov8-pose model for keypoint detection. I want to detect multiple classes, where each class has a different number of keypoints, but the model requires the same number of keypoints for every class. I have tried padding the classes with fewer keypoints with zeros, but the results are very poor. How can I modify the code to support a different number of keypoints per class and process each class accordingly? Could you give me some advice? Thank you very much.

Additional

No response

github-actions[bot] commented 4 months ago

πŸ‘‹ Hello @xzxorb, thank you for your interest in Ultralytics YOLOv8 πŸš€! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 4 months ago

Hello!

Thanks for reaching out with your question! πŸ‘‹ In YOLOv8 pose estimation, handling multiple classes with varying numbers of keypoints typically involves a workaround, as the model architecture expects a fixed number of keypoints per class.

One approach is to modify the model to dynamically adjust the number of output keypoints based on the class detected. This would involve significant changes to the model's architecture and possibly the training pipeline.

Alternatively, you can standardize the number of keypoints across all classes by defining a maximum number of keypoints that covers every class. For classes with fewer keypoints, continue padding the extra slots as neutral or background keypoints, and adjust the loss calculation to ignore these padded keypoints so they do not influence training.

Here’s a snippet idea on how you might consider adjusting the loss function to handle the padding better:

import torch

def custom_loss(y_pred, y_true):
    # Assumes y_true uses 0s for padded keypoints
    mask = (y_true != 0).float()
    loss = (y_pred - y_true) ** 2 * mask  # only penalize real keypoints
    return loss.sum() / mask.sum().clamp(min=1)  # guard against all-padded targets
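
To make the masking behaviour concrete, here is a small self-contained toy example (plain PyTorch, not Ultralytics code) showing how zero entries are excluded from the loss:

```python
import torch

# Three keypoint coordinates; the third is a zero-padded placeholder
y_true = torch.tensor([0.5, 0.8, 0.0])
y_pred = torch.tensor([0.4, 0.9, 0.3])

mask = (y_true != 0).float()  # [1., 1., 0.] - padded slot masked out
loss = ((y_pred - y_true) ** 2 * mask).sum() / mask.sum()
# Only the two real keypoints contribute: (0.01 + 0.01) / 2 = 0.01
```

Note that the padded prediction of 0.3 contributes nothing, so the model is never penalized for keypoints that do not exist for a class.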

Remember, these suggestions require fine-tuning and testing. Let us know how it goes or if you need more specific guidance!

Good luck with your project! πŸš€

xzxorb commented 4 months ago

Thanks for your help, but I still don't know how to modify the loss function in the code. I think the original loss function looks like this, which means a mask is already applied in the loss function:

        if masks.any():
            gt_kpt = selected_keypoints[masks]
            area = xyxy2xywh(target_bboxes[masks])[:, 2:].prod(1, keepdim=True)
            pred_kpt = pred_kpts[masks]
            kpt_mask = gt_kpt[..., 2] != 0 if gt_kpt.shape[-1] == 3 else torch.full_like(gt_kpt[..., 0], True)
            kpts_loss = self.keypoint_loss(pred_kpt, gt_kpt, kpt_mask, area)  # pose loss

            if pred_kpt.shape[-1] == 3:
                kpts_obj_loss = self.bce_pose(pred_kpt[..., 2], kpt_mask.float())  # keypoint obj loss

        return kpts_loss, kpts_obj_loss


class KeypointLoss(nn.Module):
    def __init__(self, sigmas) -> None:
        super().__init__()
        self.sigmas = sigmas

    def forward(self, pred_kpts, gt_kpts, kpt_mask, area):
        d = (pred_kpts[..., 0] - gt_kpts[..., 0]).pow(2) + (pred_kpts[..., 1] - gt_kpts[..., 1]).pow(2)
        kpt_loss_factor = kpt_mask.shape[1] / (torch.sum(kpt_mask != 0, dim=1) + 1e-9)
        # e = d / (2 * (area * self.sigmas) ** 2 + 1e-9)  # from formula
        e = d / ((2 * self.sigmas).pow(2) * (area + 1e-9) * 2)  # from cocoeval
        return (kpt_loss_factor.view(-1, 1) * ((1 - torch.exp(-e)) * kpt_mask)).mean()

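
(For reference, a quick toy check of what the existing `kpt_mask` expression does, in plain PyTorch: keypoints padded with visibility 0 are already excluded.)

```python
import torch

# One instance with two ground-truth keypoints in (x, y, visibility) format;
# the second keypoint is a zero-padded placeholder with visibility 0
gt_kpt = torch.tensor([[[0.2, 0.3, 2.0], [0.5, 0.6, 0.0]]])

kpt_mask = gt_kpt[..., 2] != 0  # same expression as in the loss above
# kpt_mask -> [[True, False]]: the padded keypoint is masked out
```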
glenn-jocher commented 4 months ago

Hello!

Thanks for sharing the details of your current loss function implementation. From what I see, you're already employing a masking strategy to selectively adjust the loss computation based on the presence of keypoints (defined by kpt_mask).

If you need to further modify or customize the behavior specifically for varying numbers of keypoints across different classes, consider adjusting how kpt_mask is computed. For instance, if you have a set maximum number of keypoints and certain keypoints aren't applicable for some classes, ensure these locations in kpt_mask are always zeroed out, so they don't contribute to the loss.

Here’s a brief idea you might try:

kpt_mask = create_dynamic_mask(gt_kpt, max_keypoints, class_ids)

In this hypothetical function create_dynamic_mask, you'd dynamically generate masks based on the class, ensuring keypoints that aren't applicable are ignored in loss computation.
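
As a rough illustration, a helper along these lines might work. Note that `create_dynamic_mask` is hypothetical and not part of Ultralytics, and the per-class keypoint counts below are placeholder assumptions you would replace with your dataset's values:

```python
import torch

# Hypothetical per-class keypoint counts; replace with your dataset's values
KPTS_PER_CLASS = {0: 17, 1: 5}  # e.g. class 0 uses 17 keypoints, class 1 uses 5

def create_dynamic_mask(gt_kpt, max_keypoints, class_ids):
    """Build an (N, max_keypoints) boolean mask that is True only for
    keypoint slots that exist for each instance's class and, when a
    visibility channel is present, are labeled visible."""
    n = gt_kpt.shape[0]
    mask = torch.zeros(n, max_keypoints, dtype=torch.bool, device=gt_kpt.device)
    for i, cls in enumerate(class_ids.tolist()):
        mask[i, : KPTS_PER_CLASS[int(cls)]] = True  # only this class's slots
    if gt_kpt.shape[-1] == 3:  # (x, y, visibility) format
        mask &= gt_kpt[..., 2] != 0  # also drop invisible keypoints
    return mask
```

You would then pass this mask into `self.keypoint_loss(...)` in place of the default `kpt_mask`, so padded slots for short-keypoint classes never contribute to the pose loss or the keypoint objectness loss.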

The goal here is to make sure the loss calculation focuses only on relevant keypoints for each class. Let me know if this helps, or if there's a specific area in the code modification that is unclear! 😊

xzxorb commented 4 months ago

Thank you, I will try this later.

glenn-jocher commented 4 months ago

You're welcome! If you have any more questions or need further assistance after trying it out, feel free to reach out. Happy coding! 😊

github-actions[bot] commented 3 months ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐