ultralytics / yolov5

YOLOv5 šŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Is it possible to adjust the conf value for each class in YOLOv5? #7948

Closed chejungsong closed 2 years ago

chejungsong commented 2 years ago

Search before asking

Question

Hello. I'm working on a project that takes an image with a camera using YOLOv5 and classifies it into 5 classes. My classes are ['p1', 'p2', 'p3', 'p4', 'np']. I want detections to fall back to 'np' whenever the image is not 'p1'~'p4' when it goes into the model. As far as I know, if you adjust the --conf value, only objects whose confidence is higher than the conf value are recognized. So, if the confidence of a recognition result falls below a certain value, can we treat the result as 'np'? This is my detect.py code.

detect.py code:

    # Load model
    device = select_device('')
    model = DetectMultiBackend('best.pt', device=device, dnn=False, data='v.yaml', fp16=False)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size((640, 640), s=stride)  # check image size

    # Dataloader
    dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt)
    bs = 1  # batch_size
    vid_path, vid_writer = [None] * bs, [None] * bs
    client = mqtt.Client()

    client.on_connect = on_connect
    client.on_disconnect = on_disconnect
    client.on_subscribe = on_subscribe
    client.on_message = on_message
    client.on_publish = on_publish

    client.connect('localhost', 1883)
    client.subscribe('topic', 1)
    client.loop_start()  # run the MQTT network loop in a background thread (loop_forever() would block here and the code below would never run)

    print("Object detection starting...")
    read_cam() #take a picture
    #object_detection code
    #detect_pet()
    #pde.run()
    ##############################################################################

    # Run inference
    model.warmup(imgsz=(1 if pt else bs, 3, *imgsz))  # warmup
    dt, seen = [0.0, 0.0, 0.0], 0
    for path, im, im0s, vid_cap, s in dataset:
        t1 = time_sync()
        im = torch.from_numpy(im).to(device)
        im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
        im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim
        t2 = time_sync()
        dt[0] += t2 - t1

        # Inference
        pred = model(im, augment=False, visualize=False)
        t3 = time_sync()
        dt[1] += t3 - t2

        # NMS
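        # positional args: conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False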
        pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)
        dt[2] += time_sync() - t3

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f'{i}: '
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            #save_path = str(save_dir / p.exp)  # im.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt
            s += '%gx%g ' % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0  # for save_crop (disabled, so no copy needed)
            annotator = Annotator(im0, line_width=3, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    # Write to file
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                    line = (cls, *xywh)  # label format (confidence not saved)
                    with open(txt_path + '.txt', 'a') as f:
                        f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = f'{names[c]} {conf:.2f}'  # class name with confidence
                        annotator.box_label(xyxy, label, color=colors(c, True))

            # Stream results
            im0 = annotator.result()

            # Save results (image with detections)
            #if save_img:
            #    if dataset.mode == 'image':
            #        cv2.imwrite(save_path, im0)
            #    else:  # 'video' or 'stream'
            #        if vid_path[i] != save_path:  # new video
            #            vid_path[i] = save_path
            #            if isinstance(vid_writer[i], cv2.VideoWriter):
            #                vid_writer[i].release()  # release previous video writer
            #            if vid_cap:  # video
            #                fps = vid_cap.get(cv2.CAP_PROP_FPS)
            #                w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            #                h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            #            else:  # stream
            #                fps, w, h = 30, im0.shape[1], im0.shape[0]
            #            save_path = str(Path(save_path).with_suffix('.mp4'))  # force *.mp4 suffix on results videos
            #            vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
            #        vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f'{s}Done. ({t3 - t2:.3f}s)')

    # Print results
    t = tuple(x / seen * 1E3 for x in dt)  # speeds per image
    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)
    s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}"
    LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    ################################################################################
    sortingOutput = 1
    print("Done!:", sortingOutput)


Additional

No response

github-actions[bot] commented 2 years ago

šŸ‘‹ Hello @chejungsong, thank you for your interest in YOLOv5 šŸš€! Please visit our ā­ļø Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a šŸ› Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ā“ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@chejungsong šŸ‘‹ Hello! Thanks for asking about handling inference results. YOLOv5 šŸš€ PyTorch Hub models allow for simple model loading and inference in a pure python environment without using detect.py.

Simple Inference Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the YOLOv5 'small' model. For details on all available models please see the README. Custom models can also be loaded, including custom trained PyTorch models and their exported variants, i.e. ONNX, TensorRT, TensorFlow, OpenVINO YOLOv5 models.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # yolov5n - yolov5x6 official model
#                                            'custom', 'path/to/best.pt')  # custom model

# Images
im = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, URL, PIL, OpenCV, numpy, list

# Inference
results = model(im)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

results.xyxy[0]  # im predictions (tensor)
results.pandas().xyxy[0]  # im predictions (pandas)
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie
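
For the per-class goal in this thread, the pandas results above also allow filtering after inference, without modifying non_max_suppression. A minimal sketch, assuming the 'p1'~'p4'/'np' class names from the original question and illustrative threshold values:

# Per-class confidence filtering on the Hub results (hypothetical thresholds)
per_class_conf = {'p1': 0.7, 'p2': 0.7, 'p3': 0.7, 'p4': 0.7, 'np': 0.25}
df = results.pandas().xyxy[0]
df = df[df['confidence'] >= df['name'].map(per_class_conf).fillna(0.25)]

Rows below their class threshold are dropped; an image left with no surviving rows could then be treated as 'np', as asked above.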

See YOLOv5 PyTorch Hub Tutorial for details.

Good luck šŸ€ and let us know if you have any other questions!

github-actions[bot] commented 2 years ago

šŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 šŸš€ and Vision AI ā­!

JoseRegoSeitech commented 1 year ago

I am working on this improvement. I want to set a different conf threshold for each class, but I fail every time.

I have two classes: 0 for dog and 1 for cat. I want to set a conf threshold of 0.7 for dog and 0.4 for cat.

I made these modifications to the code:

detect.py

def run(
    ...
    classes_conf=None,  # conf threshold per class
):
    ...
    with dt[2]:
        pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det, classes_conf=classes_conf)

def parse_opt():
    parser = argparse.ArgumentParser()
    ...
    parser.add_argument('--classes-conf', nargs='+', type=float, help='conf thres by class: --classes-conf 0.5 0.3')

general.py

def non_max_suppression(
    ...
    classes_conf=None,  # class-specific thresholds
):

    if multi_label:
        i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
        x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        print('here')
    else:  # best class only
        conf, j = x[:, 5:mi].max(1, keepdim=True)

        # Apply class-specific thresholds
        if classes_conf is not None:
            x_class = []  # list to hold tensors for each class
            for cls_index, cls_thresh in enumerate(classes_conf):
                mask_cls = j.view(-1) == cls_index
                mask_thresh = conf.view(-1) >= cls_thresh
                mask = mask_cls & mask_thresh
                x_class.append(x[mask])  # add tensor of detections for this class
                print('here2')

            if len(x_class) > 0:  # check that there is at least one detection
                x = torch.cat(x_class, 0)  # concatenate tensors to create final tensor of detections
                print('here3')
            else:
                x = torch.zeros((0, 7), device=x.device)  # create empty tensor if no detections
                print('here4')
        else:
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]
            print('here5')

But it does not work: only the first label (dog) is kept.

glenn-jocher commented 1 year ago

@JoseRegoSeitech it looks like you're trying to set different confidence thresholds for each class in YOLOv5. However, you mentioned that it is only taking the first label (dog).

Based on your code modifications, it seems that you are passing the classes_conf argument correctly to the non_max_suppression function. However, the issue might be with how you are filtering the detections based on the class-specific thresholds.

In the non_max_suppression function, you are currently looping over each class and creating a list x_class to hold the tensors for each class. However, you are not concatenating these tensors correctly.

To fix this issue, you can update your code as follows:

if classes_conf is not None:
    x_class = []  # list to hold tensors for each class
    for cls_index, cls_thresh in enumerate(classes_conf):
        keep_cls = j.view(-1) == cls_index
        keep_thresh = conf.view(-1) >= cls_thresh
        keep = keep_cls & keep_thresh  # separate name, so the segment-mask tensor 'mask' is not shadowed
        x_cls = torch.cat((box[keep], conf[keep], j[keep].float(), mask[keep]), 1)
        x_class.append(x_cls)  # add tensor of detections for this class

    if len(x_class) > 0:  # check that there is at least one detection
        x = torch.cat(x_class, 0)  # concatenate tensors to create the final tensor of detections
    else:
        x = torch.zeros((0, 6 + mask.shape[1]), device=x.device)  # empty tensor: 4 box + conf + cls + mask columns

This code snippet keeps the boolean row filter under its own name (keep), so it does not overwrite the mask tensor, and concatenates the per-class tensors correctly, giving the desired behavior of a different confidence threshold for each class.

Hope this helps! Let me know if you have any further questions.

JoseRegoSeitech commented 1 year ago

Thanks for showing interest in this old issue!!

I got a solution this way, but can you check it?

general.py

else:  # best class only
    conf, j = x[:, 5:mi].max(1, keepdim=True)

    # Create the masks according to the conditions
    mask_j0 = (j == 0) & (conf > 0.5)  # mask where j is 0 (dog) and conf is greater than 0.5
    mask_j1 = (j == 1) & (conf > 0.1)  # mask where j is 1 (cat) and conf is greater than 0.1
    final_mask = mask_j0 | mask_j1  # combine the two masks

    # Replace conf > conf_thres with final_mask
    x = torch.cat((box, conf, j.float(), mask), 1)[final_mask.flatten()]

I would like to pass the conf values as command-line arguments so that I do not need to change them in general.py.

glenn-jocher commented 1 year ago

@JoseRegoSeitech Sure! Here's an updated version of your code snippet that includes the conf values as command-line arguments:

general.py

def non_max_suppression(
    classes_conf=None,  # class-specific thresholds
):

    if multi_label:
        i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
        x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
    else:  # best class only
        conf, j = x[:, 5:mi].max(1, keepdim=True)

        if classes_conf is not None:
            conf_thres_list = classes_conf.split(",")  # split conf values by comma
            conf_thres_list = [float(thres) for thres in conf_thres_list]  # convert conf values to floats

            # Create the masks according to the conditions
            mask_cls = torch.zeros_like(conf, dtype=torch.bool)
            for cls_index, cls_thresh in enumerate(conf_thres_list):
                mask_cls_cls = (j[..., 0] == cls_index) & (conf[..., 0] > cls_thresh)
                mask_cls = mask_cls | mask_cls_cls

            mask = conf[..., 0] > conf_thres  # original conf threshold mask
            final_mask = mask & mask_cls  # combine the original and class-specific masks

            x = torch.cat((box, conf, j.float(), mask), 1)[final_mask.flatten()]
        else:
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

With this modification, you can specify the class-specific confidence thresholds as command-line arguments. For example:

python detect.py --conf-thres 0.5 --classes-conf "0.7,0.4"

In this example, --conf-thres sets the general confidence threshold to 0.5, and --classes-conf sets the class-specific confidence thresholds to 0.7 for class 0 (dog) and 0.4 for class 1 (cat).

By adding this functionality, you can now change the confidence thresholds without modifying the code in general.py.

Hope this helps! Let me know if you have any further questions.

geolvr commented 1 year ago

@glenn-jocher

I tried your solution but received the following error:

x = torch.cat((box, conf, j.float(), mask), 1)[final_mask.flatten()]  #  This line triggers the error

RuntimeError: Tensors must have same number of dimensions: got 1 and 2

glenn-jocher commented 1 year ago

@geolvr it seems that the error is occurring due to the concatenation of tensors on the specified line. The error message "Tensors must have same number of dimensions: got 1 and 2" indicates a mismatch between the dimensions of the tensors being concatenated.

To resolve this issue, it's important to ensure that the tensors being concatenated have compatible dimensions. You may need to inspect the shapes of the tensors involved in that concatenation operation and verify that they align properly.

Additionally, you might want to recheck the slicing and indexing operations being performed on the tensors to ensure that the dimensions are maintained correctly. Consider debugging the line and inspecting the individual shapes of box, conf, j.float(), and mask to identify any discrepancies in their dimensions.

Feel free to further clarify the context or provide more specifics about the tensor shapes involved for more targeted assistance.

geolvr commented 1 year ago


@glenn-jocher

My number of classes is 19. Through single-step debugging, the shapes of the variables box, conf, j, and mask are (38, 4), (38, 1), (38, 1), and (38,) respectively, just before the error line executes. Obviously, since mask has only one dimension, the concatenation cannot be performed. However, I don't fully understand the code, including what these four variables mean and why their dimensions are what they are; for example, is it a coincidence that 38 is exactly twice 19? I need your further help. Could you please simply provide code that works? Thank you very much.

glenn-jocher commented 1 year ago

@geolvr,

It appears that the issue is stemming from a dimension mismatch during tensor concatenation. Based on the provided context, it seems that the dimensions of the variables box, conf, j, and mask are (38, 4), (38, 1), (38, 1), and (38,) respectively. The error occurs because mask has only one dimension, leading to the concatenation failure.

In addition, there seems to be uncertainty regarding the functionality of the variables and their specific dimensions. For instance, the dimension of 38 raises questions given that you have 19 classes. Further clarification on the purpose and origin of these variables would be beneficial in providing more targeted assistance.

Your request for a refined code solution is acknowledged, and I will work on further addressing the issue with a more comprehensive explanation. Thank you for your understanding.
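
A minimal sketch of a working version, assuming the stock YOLOv5 variable layout inside non_max_suppression() (box, conf, j, mask, mi, conf_thres), where mask holds the optional segment-mask columns and is normally 2-D, shape (N, 0), for pure detection models. The RuntimeError above is consistent with the name mask being reused for the 1-D boolean filter (mask = conf[..., 0] > conf_thres), which overwrites the 2-D mask tensor before the torch.cat call; keeping the boolean under a separate name avoids the dimension mismatch:

# Hypothetical patch to the 'best class only' branch of non_max_suppression()
# in utils/general.py. classes_conf is assumed to be a list of floats
# (matching parse_opt's nargs='+', type=float), not a comma-separated string.
else:  # best class only
    conf, j = x[:, 5:mi].max(1, keepdim=True)

    if classes_conf is not None:
        # Build the row filter under a new name so the 2-D 'mask' tensor is untouched
        keep = torch.zeros_like(conf.view(-1), dtype=torch.bool)
        for cls_index, cls_thresh in enumerate(classes_conf):
            keep |= (j.view(-1) == cls_index) & (conf.view(-1) > cls_thresh)
        # per-class values take the place of the global conf_thres here;
        # classes without an entry in classes_conf are dropped
        x = torch.cat((box, conf, j.float(), mask), 1)[keep]
    else:
        x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

With the argparse definition above, the call would then be, e.g., python detect.py --conf-thres 0.25 --classes-conf 0.7 0.4 (space-separated floats rather than one quoted comma-separated string).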