Failed to export a YOLOv8-det model to onnx with fp16. #12721

LorenzoSun-V closed this issue 1 week ago

LorenzoSun-V commented 2 weeks ago

I used this command to export an ONNX model with FP16:

yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=True

But the model still appears to be FP32 when I open it in Netron. In addition, the exported ONNX file is no smaller than the FP32 one. Do you have any idea what causes this?
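As a quick sanity check on the file-size observation: FP16 stores each weight in 2 bytes versus 4 bytes for FP32, so a model whose weights were actually converted should come out at roughly half the size (graph structure and metadata add a small overhead). A minimal NumPy sketch of that arithmetic; the parameter count is an illustrative assumption, not the size of the model in this issue:

```python
import numpy as np


def weight_bytes(num_params: int, dtype) -> int:
    """Bytes needed to store num_params weights of the given dtype."""
    return num_params * np.dtype(dtype).itemsize


# Illustrative parameter count (assumption, roughly a small detector)
n = 11_000_000
fp32 = weight_bytes(n, np.float32)
fp16 = weight_bytes(n, np.float16)
print(f"FP32: {fp32 / 1e6:.1f} MB, FP16: {fp16 / 1e6:.1f} MB, ratio {fp16 / fp32:.2f}")
# FP32: 44.0 MB, FP16: 22.0 MB, ratio 0.50
```

If the exported file is about the same size as an FP32 export, the weights were almost certainly not converted.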


glenn-jocher commented 2 weeks ago

Hello! Thanks for reaching out. It looks like the exported ONNX model remains FP32 even though you requested FP16. This can happen because some model components may not fully support FP16, or because the export may fall back to FP32 for stability.

It's important to verify that all components of your model are compatible with FP16. In the meantime, please make sure you're using the latest version of the ultralytics package and its export dependencies, as updates may include fixes or improvements for this feature.

If all else seems correct, please consider opening an issue with more detailed environment information so we can help diagnose the problem more effectively!

LorenzoSun-V commented 2 weeks ago

Hi, the ultralytics package version in my environment is 8.2.15.

My training script, tools/train.py, is:

import argparse
from ultralytics import YOLO

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='yolov8s.pt', help='path to model file, i.e. yolov8n.pt, yolov8n.yaml')
    parser.add_argument('--data', type=str, default='ultralytics/cfg/datasets/coco128.yaml', help='dataset.yaml path')
    parser.add_argument('--epochs', type=int, default=300)
    parser.add_argument('--batch', type=int, default=16, help='total batch size for all GPUs')
    parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--device', default=None, help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--project', default='runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--patience', type=int, default=50, help='epochs to wait for no observable improvement for early stopping of training')
    parser.add_argument('--close-mosaic', type=int, default=10, help='(int) disable mosaic augmentation for final epochs (0 to disable)')
    parser.add_argument('--resume', action='store_true', help='resume training from last checkpoint')
    parser.add_argument('--lr0', type=float, default=0.01, help='initial learning rate (i.e. SGD=1E-2, Adam=1E-3)')
    parser.add_argument('--lrf', type=float, default=0.01, help='final learning rate (lr0 * lrf)')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--box', type=float, default=7.5, help='box loss gain')
    parser.add_argument('--cls', type=float, default=0.5, help='cls loss gain (scale with pixels)')
    parser.add_argument('--dfl', type=float, default=1.5, help='dfl loss gain')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    return opt

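def main(opt):
    model = YOLO(opt.weights)
    # Assumed completion (the snippet originally stopped here): pass the parsed options to model.train()
    model.train(
        data=opt.data, epochs=opt.epochs, batch=opt.batch, workers=opt.workers,
        imgsz=max(opt.imgsz),  # Ultralytics training expects a single int image size
        device=opt.device, project=opt.project, name=opt.name, exist_ok=opt.exist_ok,
        patience=opt.patience, close_mosaic=opt.close_mosaic, resume=opt.resume,
        lr0=opt.lr0, lrf=opt.lrf, box=opt.box, cls=opt.cls, dfl=opt.dfl,
    )


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)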

I launch it with:

python tools/train.py \
    --weights ${pretrained} \
    --data ${data} \
    --epochs ${epochs} \
    --batch ${batch} \
    --imgsz ${imgsz} \
    --device ${device} \
    --project ${project} \
    --name ${name} \
    --patience 50

Then I used yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=True to export the ONNX model.

Everything seems correct, but I don't get the expected FP16 model.

LorenzoSun-V commented 2 weeks ago

The versions of the related ONNX packages are as follows: [screenshot]

glenn-jocher commented 2 weeks ago

@LorenzoSun-V hello! Thank you for sharing the version details of your ONNX packages. From the versions you've listed, it seems your setup should be compatible with exporting ONNX models in FP16. However, if the export still doesn’t reflect FP16 precision, you might want to ensure that all the operations in your model are supported in FP16 on ONNX. Sometimes, specific operations can cause the model to revert to FP32.

Additionally, please double-check the export command for typos, especially around the half argument, to make sure FP16 is actually being requested. (The CLI parses boolean values case-insensitively, so half=True and half=true are equivalent.)

If issues persist, this might be specific to how ONNX handles FP16 conversion internally for certain operations. Upgrading to the latest version of the ONNX library might also help if you aren't already using it.

Hope this helps! 😊

LorenzoSun-V commented 1 week ago

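Hi, I figured out why the FP16 ONNX export was failing. In the export code [screenshots], half=True is only applied to an ONNX export when the export device is a GPU; when exporting on CPU (which is what happens when no device is given), the model falls back to FP32.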

So I added device=0 and ran yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=true device=0, which exported an FP16 ONNX model successfully.
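The behavior found here (FP16 was only applied once device=0 was added) can be summarized as a small decision function. This is a simplified, hypothetical sketch of the observed behavior, not the actual Ultralytics source; the function name and signature are illustrative, and treating an unspecified device as CPU is an assumption consistent with this thread:

```python
from typing import Optional, Union


def effective_half(half: bool, fmt: str, device: Optional[Union[str, int]] = None) -> bool:
    """Hypothetical sketch: whether FP16 is actually applied to an export.

    Assumption: no device means a CPU export, matching this thread, where the
    export without device=0 produced an FP32 model.
    """
    on_cpu = device is None or str(device).lower() == "cpu"
    if half and fmt.lower() == "onnx" and on_cpu:
        return False  # FP16 request dropped; the ONNX model is exported as FP32
    return half


print(effective_half(True, "onnx"))             # False
print(effective_half(True, "onnx", device=0))   # True
print(effective_half(False, "onnx", device=0))  # False
```

In practice this means passing a GPU device alongside half=true when exporting to ONNX.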

This raises another question: can this FP16 ONNX model run on CPU?

LorenzoSun-V commented 1 week ago

Using this example, I ran inference with the FP16 ONNX model successfully on CPU, but it failed on CUDA: [screenshot of the error]
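One detail worth ruling out when running an FP16 ONNX model: the input array has to match the dtype the graph declares, which ONNX Runtime reports as a string such as `tensor(float16)` or `tensor(float)` via `session.get_inputs()[0].type`. A minimal sketch that casts a preprocessed NumPy batch accordingly; the helper name is hypothetical, this is only one possible cause and not a diagnosis of the CUDA error above, and note that `CUDAExecutionProvider` is only available with the `onnxruntime-gpu` package:

```python
import numpy as np

# Map ONNX Runtime input type strings to NumPy dtypes (only the two cases relevant here)
_ORT_TO_NUMPY = {"tensor(float)": np.float32, "tensor(float16)": np.float16}


def match_input_dtype(batch: np.ndarray, ort_type: str) -> np.ndarray:
    """Hypothetical helper: cast `batch` to the dtype the model input declares."""
    try:
        dtype = _ORT_TO_NUMPY[ort_type]
    except KeyError:
        raise ValueError(f"unsupported input type: {ort_type}") from None
    return batch.astype(dtype, copy=False)


# Usage with a real session (requires onnxruntime, not executed here):
#   sess = onnxruntime.InferenceSession(
#       "best.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
#   inp = sess.get_inputs()[0]
#   outputs = sess.run(None, {inp.name: match_input_dtype(batch, inp.type)})
batch = np.zeros((1, 3, 640, 640), dtype=np.float32)
print(match_input_dtype(batch, "tensor(float16)").dtype)  # float16
```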

glenn-jocher commented 1 week ago

Hello! It's great to hear that you've successfully run the FP16 ONNX model on the CPU. Regarding the issue with CUDA, it seems like there might be a compatibility issue with FP16 and your CUDA setup.

Could you please ensure that your GPU supports FP16 operations and that the latest drivers and CUDA versions are installed? Sometimes, updating these can resolve such issues.

If the problem persists, consider using the FP32 model for CUDA inference, as it generally has broader support across different GPU architectures. Here's a quick example of how to force FP32 during the export if needed:

yolo export model=/path/to/my/best.pt format=onnx imgsz=640 half=false device=0

Hope this helps! 😊 Let us know how it goes!

LorenzoSun-V commented 1 week ago

Hi, the preprocessing code in the example doesn't apply letterboxing, so the ONNX results differ noticeably from the .pt results. After adding letterbox to the preprocess function, the ONNX results are much closer to the .pt results. I suggest adding letterbox to the example's preprocess function.
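To make the letterbox step concrete, here is a minimal NumPy sketch of what it does: scale the image by a single gain so it fits inside the target size without distortion, then split the leftover space evenly between both sides and fill it with gray (114). It uses nearest-neighbor resizing to stay dependency-free, whereas Ultralytics resizes with OpenCV, so pixel values will differ slightly; the function name is illustrative:

```python
import numpy as np


def letterbox_np(img: np.ndarray, new_shape=(640, 640), pad_value=114):
    """Resize an HWC image to fit new_shape (h, w) keeping aspect ratio, then pad.

    Returns the padded image, the scale gain, and the (left, top) padding in pixels.
    """
    h, w = img.shape[:2]
    gain = min(new_shape[0] / h, new_shape[1] / w)
    new_h, new_w = int(round(h * gain)), int(round(w * gain))

    # Nearest-neighbor resize via index sampling (a simplification of cv2.resize)
    rows = np.minimum((np.arange(new_h) / gain).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / gain).astype(int), w - 1)
    resized = img[rows][:, cols]

    # Split the leftover space evenly between both sides
    pad_h, pad_w = new_shape[0] - new_h, new_shape[1] - new_w
    top, left = pad_h // 2, pad_w // 2
    out = np.full((new_shape[0], new_shape[1]) + img.shape[2:], pad_value, dtype=img.dtype)
    out[top:top + new_h, left:left + new_w] = resized
    return out, gain, (left, top)


img = np.zeros((480, 640, 3), dtype=np.uint8)  # a 640x480 (w x h) image
padded, gain, (left, top) = letterbox_np(img)
print(padded.shape, gain, left, top)  # (640, 640, 3) 1.0 0 80
```

A plain resize to 640×640 would instead stretch this image vertically, which is why the detections drift away from the .pt results.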

glenn-jocher commented 1 week ago

Hello! Thanks for pointing this out. You're absolutely right; including letterbox in the preprocessing function can help align the ONNX model's results more closely with the PyTorch model's outputs, especially since it maintains the aspect ratio of the original images.

Here's a quick example of how you might modify the preprocessing function to include letterbox:

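import cv2
import numpy as np
from ultralytics.data.augment import LetterBox

def preprocess(img_path, imgsz=640):
    img = cv2.imread(img_path)  # HWC, BGR, uint8
    img = LetterBox(new_shape=(imgsz, imgsz))(image=img)  # resize keeping aspect ratio, pad the rest
    img = img[..., ::-1].transpose((2, 0, 1))  # BGR to RGB, HWC to CHW
    img = np.ascontiguousarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return img[None]  # add batch dimension: (1, 3, imgsz, imgsz)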

This should help standardize the input and improve the model's performance consistency across different formats. Thanks for your suggestion! 😊
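Letterboxing also changes the coordinate frame of the predictions: boxes come out in the padded 640×640 input space and must be shifted by the padding and divided by the gain before they can be compared with the .pt results on the original image. Ultralytics does an equivalent step in its own postprocessing (`ultralytics.utils.ops.scale_boxes`); the helper below is an illustrative NumPy sketch for xyxy boxes, assuming centered padding:

```python
import numpy as np


def unletterbox_boxes(boxes: np.ndarray, input_shape, orig_shape) -> np.ndarray:
    """Map xyxy boxes from letterboxed input coordinates back to the original image.

    input_shape and orig_shape are (height, width). Assumes the image was scaled by
    one gain and padded equally on both sides (centered letterbox).
    """
    gain = min(input_shape[0] / orig_shape[0], input_shape[1] / orig_shape[1])
    pad_x = (input_shape[1] - orig_shape[1] * gain) / 2
    pad_y = (input_shape[0] - orig_shape[0] * gain) / 2

    out = boxes.astype(np.float32).copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - pad_x) / gain  # undo horizontal padding and scale
    out[:, [1, 3]] = (out[:, [1, 3]] - pad_y) / gain  # undo vertical padding and scale
    # Clip to the original image bounds
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, orig_shape[1])
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, orig_shape[0])
    return out


# A box predicted on a 640x640 letterboxed input for an original image of h=480, w=640
boxes = np.array([[100, 180, 300, 380]])
print(unletterbox_boxes(boxes, (640, 640), (480, 640)))  # [[100. 100. 300. 300.]]
```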

LorenzoSun-V commented 1 week ago

Thanks for your patient reply! Everything is running correctly now.

glenn-jocher commented 1 week ago

That's great to hear! If you have any more questions or need further assistance in the future, feel free to reach out. Happy coding! 😊