Failed to export a YOLOv8-det model to onnx with fp16.

LorenzoSun-V commented 2 weeks ago

Search before asking

[X] I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

No response

Bug

I used the following command to export to ONNX with FP16:

yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=True

But the model still appears to be FP32 when I open it in Netron. In addition, the exported ONNX file is no smaller than before. Do you have any idea what causes this?
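As a complement to inspecting the graph in Netron, the element types can also be read programmatically. The following is a minimal sketch, assuming the `onnx` package is installed; `summarize_dtypes` is a hypothetical helper name and `best.onnx` is a placeholder path, neither of which comes from this thread.

```python
from collections import Counter


def summarize_dtypes(model):
    """Return (input element type name, Counter of initializer type names).

    `model` is an onnx.ModelProto, e.g. the result of onnx.load(path).
    """
    import onnx  # imported lazily so the helper can be defined without onnx installed

    type_name = onnx.TensorProto.DataType.Name
    input_type = type_name(model.graph.input[0].type.tensor_type.elem_type)
    weight_types = Counter(type_name(init.data_type) for init in model.graph.initializer)
    return input_type, weight_types


if __name__ == "__main__":
    import os

    path = "best.onnx"  # placeholder path, replace with the exported file
    if os.path.exists(path):
        import onnx

        input_type, weight_types = summarize_dtypes(onnx.load(path))
        print("input:", input_type)
        print("weights:", dict(weight_types))
```

An FP16 export would be expected to report `FLOAT16` for the input and for most weights, while an FP32 export reports `FLOAT`.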

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

github-actions[bot] commented 2 weeks ago

👋 Hello @LorenzoSun-V, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 weeks ago

Hello! Thanks for reaching out. It appears that the exported ONNX model remains in FP32 even though you requested FP16. This can happen when some model components do not fully support FP16, or when the implementation falls back to FP32 for stability.

It's essential to verify that all components of your model are compatible with FP16. For now, please ensure you're using the latest version of the export tool and YOLOv8, as updates might contain fixes or enhancements for such features.

If all else seems correct, please consider opening an issue with more detailed environment information so we can help diagnose the problem more effectively!

LorenzoSun-V commented 2 weeks ago

Hi, the ultralytics package version in my environment is 8.2.15.

And my train file is:

import argparse
from ultralytics import YOLO

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='yolov8s.pt', help='path to model file, i.e. yolov8n.pt, yolov8n.yaml')
    parser.add_argument('--data', type=str, default='ultralytics/cfg/datasets/coco128.yaml', help='dataset.yaml path')
    parser.add_argument('--epochs', type=int, default=300)
    parser.add_argument('--batch', type=int, default=16, help='total batch size for all GPUs')
    parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--device', default=None, help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--project', default='runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--patience', type=int, default=50, help='epochs to wait for no observable improvement for early stopping of training')
    parser.add_argument('--close-mosaic', type=int, default=10, help='(int) disable mosaic augmentation for final epochs (0 to disable)')
    parser.add_argument('--resume', action='store_true', help='resume training from last checkpoint')
    parser.add_argument('--lr0', type=float, default=0.01, help='initial learning rate (i.e. SGD=1E-2, Adam=1E-3)')
    parser.add_argument('--lrf', type=float, default=0.01, help='final learning rate (lr0 * lrf)')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--box', type=float, default=7.5, help='box loss gain')
    parser.add_argument('--cls', type=float, default=0.5, help='cls loss gain (scale with pixels)')
    parser.add_argument('--dfl', type=float, default=1.5, help='dfl loss gain')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    return opt

def main(opt):
    model = YOLO(f"{opt.weights}")

    model.train(data=opt.data, 
                epochs=opt.epochs, 
                batch=opt.batch,
                workers=opt.workers,
                imgsz=opt.imgsz,
                device=opt.device,
                project=opt.project,
                name=opt.name,
                patience=opt.patience,
                close_mosaic=opt.close_mosaic,
                resume=opt.resume,
                lr0=opt.lr0,
                lrf=opt.lrf,
                exist_ok=opt.exist_ok,
                box=opt.box,
                cls=opt.cls,
                dfl=opt.dfl
    )

if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

The train script is:

python tools/train.py \
    --weights ${pretrained} \
    --data ${data} \
    --epochs ${epochs} \
    --batch ${batch} \
    --imgsz ${imgsz} \
    --device ${device} \
    --project ${project} \
    --name ${name} \
    --patience 50

I used yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=True to export to ONNX.

Everything seems to be right, but I don't get the expected result.

LorenzoSun-V commented 2 weeks ago

The versions of the related ONNX packages are as follows:

glenn-jocher commented 2 weeks ago

@LorenzoSun-V hello! Thank you for sharing the version details of your ONNX packages. From the versions you've listed, it seems your setup should be compatible with exporting ONNX models in FP16. However, if the export still doesn’t reflect FP16 precision, you might want to ensure that all the operations in your model are supported in FP16 on ONNX. Sometimes, specific operations can cause the model to revert to FP32.

Additionally, please double-check the export command for any typos, especially around the half=True argument (it should be half=true in lowercase) to guarantee that FP16 is being requested correctly.

If issues persist, this might be specific to how ONNX handles FP16 conversion internally for certain operations. Upgrading to the latest version of the ONNX library might also help if not already using it.

Hope this helps! 😊

LorenzoSun-V commented 1 week ago

Hi, I figured out why the FP16 ONNX export failed. In the export code:

So I used yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=true device=0 to export fp16 onnx model successfully.
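The export code referenced above is not preserved in this thread, but the fix (adding device=0) suggests that half precision is only honoured when exporting on a GPU. The sketch below models that behaviour as an assumption. `resolve_half` is a hypothetical function written for illustration and is not the Ultralytics source.

```python
def resolve_half(half: bool, export_format: str, device: str) -> bool:
    """Hypothetical model of the behaviour reported in this thread.

    Assumption: an ONNX export requested with half=True on a CPU device
    silently falls back to FP32, while a GPU device keeps FP16.
    """
    if half and export_format == "onnx" and device == "cpu":
        print("WARNING: half=True requested on CPU, falling back to FP32 (assumed behaviour)")
        return False
    return half


if __name__ == "__main__":
    print(resolve_half(True, "onnx", "cpu"))  # CPU export: FP16 is dropped
    print(resolve_half(True, "onnx", "0"))    # GPU export: FP16 is kept
```

From Python, the equivalent of the working command would be `YOLO("/path/to/my/best.pt").export(format="onnx", imgsz=640, half=True, device=0)`.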

This raises another question: can this FP16 ONNX model run on CPU?

LorenzoSun-V commented 1 week ago

I used this example to run inference with the FP16 ONNX model successfully on CPU, but it failed on CUDA:

glenn-jocher commented 1 week ago

Hello! It's great to hear that you've successfully run the FP16 ONNX model on the CPU. Regarding the issue with CUDA, it seems like there might be a compatibility issue with FP16 and your CUDA setup.

Could you please ensure that your GPU supports FP16 operations and that the latest drivers and CUDA versions are installed? Sometimes, updating these can resolve such issues.

If the problem persists, consider using the FP32 model for CUDA inference, as it generally has broader support across different GPU architectures. Here's a quick example of how to force FP32 during the export if needed:

yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=false device=0

Hope this helps! 😊 Let us know how it goes!
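The CUDA error itself is not preserved in this thread, so its cause is unknown. One thing worth ruling out with an FP16 model is an input dtype mismatch, since a model exported with half precision typically expects a float16 input tensor. The sketch below assumes `onnxruntime` and NumPy are installed; `numpy_dtype_for` and `run` are hypothetical helper names.

```python
# Map ONNX Runtime input type strings to NumPy dtype names.
ORT_TO_NUMPY = {"tensor(float16)": "float16", "tensor(float)": "float32"}


def numpy_dtype_for(ort_type: str) -> str:
    """Return the NumPy dtype name matching an ONNX Runtime input type."""
    try:
        return ORT_TO_NUMPY[ort_type]
    except KeyError:
        raise ValueError(f"unsupported input type: {ort_type}") from None


def run(model_path, image, use_cuda=False):
    """Run the model on `image`, a NumPy array shaped (1, 3, H, W) in [0, 1]."""
    import onnxruntime as ort  # imported lazily; only needed for inference

    providers = ["CPUExecutionProvider"]
    if use_cuda:
        providers.insert(0, "CUDAExecutionProvider")
    session = ort.InferenceSession(model_path, providers=providers)
    model_input = session.get_inputs()[0]
    # Cast the input to whatever precision the exported model declares.
    tensor = image.astype(numpy_dtype_for(model_input.type))
    return session.run(None, {model_input.name: tensor})
```

Reading the dtype from the session instead of hard-coding it lets the same inference code serve both the FP16 and FP32 exports.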

LorenzoSun-V commented 1 week ago

Hi, the code in the example doesn't include letterboxing, so the ONNX results are quite different from the .pt results. After adding letterbox to the preprocess function, the ONNX results are much closer to the .pt results. I think it would be better to add letterbox to the preprocess function.

glenn-jocher commented 1 week ago

Hello! Thanks for pointing this out. You're absolutely right; including letterbox in the preprocessing function can help align the ONNX model's results more closely with the PyTorch model's outputs, especially since it maintains the aspect ratio of the original images.

Here's a quick example of how you might modify the preprocessing function to include letterbox:

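import numpy as np
from PIL import Image
from ultralytics.data.augment import LetterBox

def preprocess(img_path):
    img = np.array(Image.open(img_path).convert("RGB"))  # HWC, RGB, uint8
    img = LetterBox(new_shape=(640, 640))(image=img)  # resize and pad, keeping aspect ratio
    img = img.transpose((2, 0, 1))  # HWC to CHW
    img = np.ascontiguousarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]; cast to np.float16 for an FP16 model
    return img[None]  # add batch dimension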

This should help standardize the input and improve the model's performance consistency across different formats. Thanks for your suggestion! 😊
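To make the letterbox step concrete, here is a minimal sketch of the geometry involved: the scale ratio, the resized size, and the padding on each side, plus the inverse mapping for predicted boxes. It is a simplified, centered letterbox written for illustration, not the Ultralytics implementation, and both function names are hypothetical.

```python
def letterbox_geometry(height, width, new_shape=(640, 640)):
    """Return (ratio, (resized_w, resized_h), (left, top, right, bottom))."""
    new_h, new_w = new_shape
    # Use the smaller scale so the whole image fits inside the target shape.
    ratio = min(new_h / height, new_w / width)
    resized_w, resized_h = round(width * ratio), round(height * ratio)
    # Split the leftover space between the two sides of each axis.
    pad_w, pad_h = new_w - resized_w, new_h - resized_h
    left, top = pad_w // 2, pad_h // 2
    return ratio, (resized_w, resized_h), (left, top, pad_w - left, pad_h - top)


def unletterbox_box(box, ratio, left, top):
    """Map an (x1, y1, x2, y2) box from letterboxed to original image coordinates."""
    x1, y1, x2, y2 = box
    return ((x1 - left) / ratio, (y1 - top) / ratio, (x2 - left) / ratio, (y2 - top) / ratio)


if __name__ == "__main__":
    ratio, size, pads = letterbox_geometry(720, 1280)
    print(ratio, size, pads)
    print(unletterbox_box((100, 240, 300, 340), ratio, pads[0], pads[1]))
```

Skipping this step stretches the image to the target shape instead, which changes object proportions and would explain outputs that differ from the PyTorch model.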

LorenzoSun-V commented 1 week ago

Thanks for your patient reply! Everything is running correctly now.

glenn-jocher commented 1 week ago

That's great to hear! If you have any more questions or need further assistance in the future, feel free to reach out. Happy coding! 😊
