huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Interested in YOLOv6 Addition? #28448

Closed SangbumChoi closed 7 months ago

SangbumChoi commented 8 months ago

Model description

Hi transformers team, my question is very simple: is the team interested in implementing YOLOv6?

I have finished the inference pipeline and am working on the training pipeline: https://github.com/SangbumChoi/transformers/tree/yolov6. It may still have a few bugs and rough edges, but it works. I will keep working on it regardless of whether it ends up being officially implemented.

from transformers import Yolov6ForObjectDetection, Yolov6ImageProcessor
from transformers.image_transforms import center_to_corners_format
import requests
from PIL import Image
import torch
from torchvision.ops.boxes import batched_nms

# load the converted YOLOv6-N checkpoint and move it to the GPU
object_model = Yolov6ForObjectDetection.from_pretrained("superb-ai/yolov6n").cuda()
object_model.eval()

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = Yolov6ImageProcessor()
inputs = image_processor(images=image, size={"shortest_edge": 640, "longest_edge": 640}, return_tensors="pt")
# optionally attach dummy targets (class indices plus normalized center-format
# boxes) to exercise the loss computation path
label = False
if label:
    n_targets = 8
    batch_size = 1
    torch_device = 'cuda'
    labels = []
    for i in range(batch_size):
        target = {}
        target["class_labels"] = torch.ones(
            size=(n_targets,), device=torch_device, dtype=torch.long
        )
        target["boxes"] = torch.rand(
            n_targets, 4, device=torch_device, dtype=torch.float
        )
        labels.append(target)
    inputs['labels'] = labels
inputs["pixel_values"] = inputs["pixel_values"].cuda()

outputs = object_model(**inputs)

out_logits, out_bbox = outputs.logits, outputs.pred_boxes
batch_size, num_queries, num_labels = out_logits.shape

prob = out_logits.sigmoid()

# flatten the per-query class probabilities so NMS can run over all
# (box, class) pairs at once, then recover the box index and class index
# of each flattened position
all_scores = prob.reshape(batch_size, -1).to(out_logits.device)
all_indexes = torch.arange(num_queries * num_labels)[None].repeat(batch_size, 1).to(out_logits.device)
all_boxes = torch.div(all_indexes, num_labels, rounding_mode="floor")
all_labels = all_indexes % num_labels

# convert (cx, cy, w, h) boxes to (xmin, ymin, xmax, ymax) and repeat each
# box once per class so it lines up with the flattened scores
boxes = center_to_corners_format(out_bbox)
boxes = torch.gather(boxes, 1, all_boxes.unsqueeze(-1).repeat(1, 1, 4))
nms_threshold = 0.7
threshold = 0.3
results = []
for b in range(batch_size):
    box = boxes[b]
    score = all_scores[b]
    lbls = all_labels[b]

    # class-aware NMS; keep at most the top 100 detections
    keep_inds = batched_nms(box, score, lbls, nms_threshold)[:100]
    score = score[keep_inds]
    lbls = lbls[keep_inds]
    box = box[keep_inds]

    results.append(
        {
            "scores": score[score > threshold],
            "labels": lbls[score > threshold],
            "boxes": box[score > threshold],
        }
    )

import matplotlib.pyplot as plt

# colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
          [0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]

def plot_results(pil_img, scores, labels, boxes):
    plt.figure(figsize=(16,10))
    plt.imshow(pil_img)
    ax = plt.gca()
    colors = COLORS * 100
    for score, label, (xmin, ymin, xmax, ymax), c in zip(scores.tolist(), labels.tolist(), boxes.tolist(), colors):
        ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                   fill=False, color=c, linewidth=3))
        text = f'{object_model.config.id2label[label]}: {score:0.2f}'
        ax.text(xmin, ymin, text, fontsize=15,
                bbox=dict(facecolor='yellow', alpha=0.5))
    plt.axis('off')
    plt.show()

# visualize the detections for the first image
result = results[0]
plot_results(image, result['scores'], result['labels'], result['boxes'])

(Screenshot: detection results visualized on the sample image, 2024-01-11)

Open source status

Provide useful links for the implementation

https://huggingface.co/superb-ai/yolov6n

NielsRogge commented 8 months ago

Hi @SangbumChoi thanks for this amazing draft!

Is the YOLOv6 model trained from scratch or are you using any pre-trained weights?

Regarding the design, that looks great already; however, we would need to include the preparation of the targets inside the image processor, so that users can pass images + targets to it and get back a BatchFeature containing both pixel_values and labels, roughly as sketched below.
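For illustration, this is the images + targets convention the existing DETR-style processors follow; whether Yolov6ImageProcessor will accept annotations like this is exactly the change being requested here, and the annotation values below are invented for the example:

import requests
from PIL import Image
from transformers import Yolov6ImageProcessor

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
image_processor = Yolov6ImageProcessor()

# COCO detection format: bbox is [x, y, width, height] in absolute pixels
annotations = [{
    "image_id": 0,
    "annotations": [
        {"bbox": [40.0, 70.0, 175.0, 115.0], "category_id": 15, "area": 20125.0, "iscrowd": 0},
    ],
}]
encoding = image_processor(images=image, annotations=annotations, return_tensors="pt")
# a single BatchFeature holding both the model inputs and the prepared targets
print(encoding.keys())  # expected to include 'pixel_values' and 'labels'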

The same goes for the postprocessing: we would need to make it conform to our existing models, meaning a post_process_object_detection method would need to be implemented (see the sketch below). This would also make the model compatible with the pipeline API, ensuring it works with the inference widgets on the Hub, etc.
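For reference, a minimal sketch of that convention (cf. DetrImageProcessor.post_process_object_detection), reusing outputs and image from the snippet above; the method does not exist on Yolov6ImageProcessor yet:

import torch

# original (height, width) of each image, used to rescale the normalized boxes
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(
    outputs, threshold=0.3, target_sizes=target_sizes
)
for result in results:
    print(result["scores"], result["labels"], result["boxes"])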

I'll discuss the addition of the model with the team :)

SangbumChoi commented 8 months ago

Hi @NielsRogge!

The current pipeline for yolov6n and yolov6s uses the public pre-trained weights, and I also tested the tolerance (1e-3 rather than 1e-4). You can check [convert_yolov6_to_pytorch.py](https://github.com/SangbumChoi/transformers/blob/yolov6/src/transformers/models/yolov6/convert_yolov6_to_pytorch.py), but you might need to install YOLOv6, unwrap the model.pt, and store the pure state_dict, since it is wrapped in the python class yolo; a rough sketch of that step follows below the download link.

https://github.com/meituan/YOLOv6/releases/download/0.4.0/yolov6n.pt
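A minimal sketch of the unwrapping, assuming the checkpoint stores the model object under a 'model' key (inspect the file to confirm; unpickling also requires the meituan/YOLOv6 code to be importable):

import torch

# the release checkpoint pickles the whole `yolo` Model object, so the
# YOLOv6 repo must be on the python path for torch.load to succeed
ckpt = torch.load("yolov6n.pt", map_location="cpu")
model = ckpt["model"]  # assumed checkpoint key; check ckpt.keys() first
torch.save(model.state_dict(), "yolov6n_state_dict.pt")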

Also, I think I understand more than 90% of the whole pipeline in transformers. So currently I'm working on the training pipeline and on the remaining features such as BatchFeature, post_process_object_detection, etc.

I think a perfect, PR-ready version might take some more time, but feel free to discuss with your team; I'm happy to get feedback on the mandatory requirements 😄