
Thoughts on the memory occupation of the model #14905

Open JustChenk opened 2 months ago

JustChenk commented 2 months ago

Search before asking

Question

  1. Why can the video memory occupy about 1 GB when using a model of only tens of MB?
  2. Is it possible to reduce the amount of video memory used to load the YOLO model?
  3. Does the model itself not contain gradient and optimizer parameters?
  4. What is the relationship between the memory usage and the size of the YOLO model?
  5. Can video memory usage be reduced in some way, even if performance may be affected?


Additional

The following is the code that invokes the model.

import logging

import torch
from ultralytics import YOLO

DETECTOR_PATH = 'weights/yolov8n.pt'
gpu = 0  # GPU index (defined elsewhere in the original script; assumed 0 here)
logging.info(f"track_detect.py gpu: {gpu}")

def tracker_infer(weight_path):
    model = YOLO(weight_path)

    def process(frame):
        # persist=True keeps tracker state between per-frame calls
        res = model.track(frame, tracker="bytetrack.yaml", persist=True,
                          device=torch.device(f"cuda:{gpu}"), verbose=False)

        detected_boxes = res[0].boxes
        pred_boxes = []

        for box in detected_boxes:
            xyxy = box.xyxy.cpu()
            confidence = box.conf.cpu().item()
            class_id_int = int(box.cls.cpu().item())  # class id as a Python int
            if class_id_int != 0:  # keep only class 0 ("person" in COCO)
                continue
            x1, y1, x2, y2 = xyxy[0].numpy()

            # Skip detections the tracker has not assigned an id to yet
            if box.id is None:
                logging.info(f"box.id: {box.id}")
                continue
            track_id = int(box.id.cpu().item())

            pred_boxes.append(
                (x1, y1, x2, y2, class_id_int, confidence, track_id))

        return pred_boxes

    return process

v8Tracker = tracker_infer(DETECTOR_PATH)

Do other people see a similar amount of video memory usage when running the model?

ambitious-octopus commented 2 months ago

Hey @JustChenk,

Why can the video memory occupy about 1 GB when using a model of tens of MB? The memory usage includes not just the model parameters (which are stored in half precision on disk and expanded to higher precision on the GPU) but also the CUDA context, framework workspace buffers, and intermediate activations.

Is it possible to reduce the amount of video memory used to load the YOLO model? Yes, by using techniques like model quantization and lower precision data types (e.g., float16).
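A minimal sketch of half-precision inference with the Ultralytics API (half=True is a standard inference argument; the model path and dummy frame here are assumptions for illustration):

import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
frame = np.zeros((640, 640, 3), dtype=np.uint8)  # dummy BGR frame
# half=True runs inference in fp16, roughly halving the memory used
# by weights and activations on GPUs that support it.
results = model.predict(frame, half=True, device=0, verbose=False)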

Does the model itself not contain gradient and optimizer parameters? Correct, the model file typically contains only the parameters (weights and biases), while gradients and optimizer states are computed and stored during training.
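You can verify this directly: after loading the released weights, the parameters carry no gradient tensors (a quick sketch; the yolov8n.pt checkpoint is assumed):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # released checkpoint: weights only, optimizer stripped
# No gradients are attached to the parameters until training starts.
print(all(p.grad is None for p in model.model.parameters()))  # True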

What is the relationship between the memory usage and the size of the YOLO model? The on-disk size reflects the compressed or low-precision parameters, while the in-memory size includes the expanded parameters, activations, and additional overhead.
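As a rough sanity check, you can compare the parameter count against the theoretical fp32 footprint (a sketch; exact numbers depend on the model variant):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
n_params = sum(p.numel() for p in model.model.parameters())
# ~3.2M parameters for YOLOv8n: ~6 MB stored as fp16 on disk,
# ~13 MB once expanded to fp32 in memory -- before any CUDA
# context or activation memory is counted.
print(f"{n_params / 1e6:.1f}M params, {n_params * 4 / 1e6:.1f} MB in fp32")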

Can video memory usage be reduced in some way, although performance may be affected? Yes, using mixed precision training, model pruning, reducing batch size, or using memory-efficient architectures can reduce memory usage but may impact performance.
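To see where the ~1 GB actually goes, PyTorch's allocator counters report only the tensors PyTorch itself allocated; nvidia-smi additionally counts the CUDA context, which is often several hundred MB on its own. A sketch, assuming a CUDA device is available:

import numpy as np
import torch
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
frame = np.zeros((640, 640, 3), dtype=np.uint8)  # dummy frame
model.predict(frame, device=0, verbose=False)     # warm-up inference on GPU

# Memory PyTorch allocated (excludes the CUDA context that nvidia-smi counts):
print(f"allocated: {torch.cuda.memory_allocated(0) / 1e6:.0f} MB")
print(f"peak:      {torch.cuda.max_memory_allocated(0) / 1e6:.0f} MB")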

Suggestion for exporting the model: To further reduce memory usage and optimize performance, consider exporting the model to TensorRT, which can perform optimizations like precision calibration and kernel fusion, reducing the memory footprint and speeding up inference. The documentation is here: https://docs.ultralytics.com/integrations/tensorrt/
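A sketch of the export path (format="engine" is the documented TensorRT target; running it requires TensorRT installed and a CUDA GPU):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# Build a TensorRT engine with fp16 precision (the build can take
# a few minutes the first time).
model.export(format="engine", half=True)  # writes yolov8n.engine

# The engine loads through the same API as a .pt checkpoint.
trt_model = YOLO("yolov8n.engine")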