Hello. I'm using deberta-v3-base for a text classification task. After training I convert the PyTorch model to ONNX format. Everything works like a charm, except that the exported model is twice the size of the original DeBERTa checkpoint (~750 MB). Because of that I want to convert it with mixed precision, i.e. fp16. I tried two approaches, one of them calling model.half() before the ONNX export, but in both cases I get this error during inference on CPU:
2023-01-06 10:46:46.332352649 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold LayerNormalization node 'LayerNorm_1'
2023-01-06 10:46:46.414666254 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold LayerNormalization node 'LayerNorm_1'
2023-01-06 10:46:46.425605272 [W:onnxruntime:, constant_folding.cc:179 ApplyImpl] Could not find a CPU kernel and hence can't constant fold LayerNormalization node 'LayerNorm_1'
I also tried setting use_gpu=True in the optimize_model method. The errors disappeared, but inference became 3-4 times slower.
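For context, the halving I'm expecting is simply per-element storage: fp16 holds each parameter in 2 bytes instead of 4. A minimal illustration (array sizes are arbitrary, just standing in for model weights):

```python
import numpy as np

# A toy "weight matrix" standing in for model parameters.
w32 = np.zeros((1000, 768), dtype=np.float32)
w16 = w32.astype(np.float16)

print(w32.nbytes)  # 3072000 bytes
print(w16.nbytes)  # 1536000 bytes, exactly half
```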