microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

ONNXRuntimeError: Training mode does not support BN opset 14 (or higher) yet. #16867

Open BowenBao opened 1 year ago

BowenBao commented 1 year ago

From the benchmark run:

[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/cuda/nn/batch_norm.h:43 onnxruntime::cuda::BatchNorm::BatchNorm(const onnxruntime::OpKernelInfo&) [with T = float] !(is_trainingmode && opset >= 14) was false. Training mode does not support BN opset 14 (or higher) yet.

        timm_nfnet
        dm_nfnet_f0
        nfnet_l0
        pytorch_CycleGAN_and_pix2pix

The bench is run in eval mode. This is the same issue as https://github.com/pytorch/pytorch/issues/75252 from the old exporter. We need to revisit how the training attribute should be handled, specifically for BatchNorm and InstanceNorm.

BowenBao commented 1 year ago

draft https://github.com/microsoft/onnxruntime/pull/16551

BowenBao commented 1 year ago

I don't have the bandwidth to investigate the ORT CUDA kernel number mismatch.

Will unblock the bench by running on CPU.

cabinader commented 7 months ago

Hi, I wanted to know if there are any updates regarding this issue. I ran into it on my side as well and couldn't find a way to solve it.

I'm exporting a standard nnUNet model (https://github.com/MIC-DKFZ/nnUNet/tree/master/nnunetv2) to ONNX. When I run inference on CPU, everything works perfectly. However, I get this error when I run it on GPU:

"RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/cuda/nn/batch_norm.h:43 onnxruntime::cuda::BatchNorm::BatchNorm(const onnxruntime::OpKernelInfo&) [with T = float] !(is_trainingmode && opset >= 14) was false. Training mode does not support BN opset 14 (or higher) yet."

Something strange is that there are no batch norm operations in my network, only instance norm, so I would have expected ONNX to raise an error with respect to that operation instead. If someone has a quick fix for this, I would greatly appreciate it. Thanks!

BowenBao commented 7 months ago

Hi @cabinader, we are working on a fix from the exporter side. This is the tracking issue for it: https://github.com/microsoft/onnxscript/issues/1262

Please note that this fix will only be available in the new dynamo-based ONNX exporter, via the torch.onnx.dynamo_export API.

cabinader commented 7 months ago

Thanks @BowenBao for your answer. Just one last question: do you have an estimated timeline in mind for the release of the new ONNX exporter? Many thanks!

BowenBao commented 6 months ago

Pull request in pytorch https://github.com/pytorch/pytorch/pull/120866.

It should be available in the next PyTorch release (2.3), planned around April/May, and sooner in the nightly builds.

thiagocrepaldi commented 4 months ago

Is this resolved?