siddharth-sharma7 / fast-Bart

Convert BART models to ONNX with quantization: 3x reduction in size, and up to a 3x boost in inference speed.
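For context, here is a minimal sketch of the kind of export-plus-dynamic-quantization pipeline the repo describes, written directly against the Hugging Face `transformers` and `onnxruntime` APIs. The file names, the `EncoderWrapper` class, and the choice of `facebook/bart-base` are illustrative assumptions, not fast-Bart's actual code:

```python
import torch
from transformers import BartForConditionalGeneration
from onnxruntime.quantization import QuantType, quantize_dynamic

class EncoderWrapper(torch.nn.Module):
    """Returns a plain tensor so torch.onnx.export can trace the graph cleanly."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids):
        return self.encoder(input_ids=input_ids).last_hidden_state

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").eval()
encoder = EncoderWrapper(model.model.encoder)
dummy_ids = torch.ones(1, 16, dtype=torch.long)

# Export the encoder to ONNX (the decoder + LM head are exported analogously).
torch.onnx.export(
    encoder,
    (dummy_ids,),
    "bart_encoder.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
    opset_version=14,
)

# Int8 dynamic weight quantization: the source of the ~3x size reduction.
quantize_dynamic("bart_encoder.onnx", "bart_encoder_quant.onnx", weight_type=QuantType.QInt8)
```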

Deploy ONNX model to TensorRT #1

Open · will-wiki opened 2 years ago

will-wiki commented 2 years ago

Thank you very much for your work, it's very helpful! I can now convert a BART model to ONNX, and the outputs of the two are consistent. Have you tried deploying the ONNX model to TensorRT? So far I have been able to run it on TensorRT, but the results of the TensorRT deployment are not consistent with those of the ONNX model. :(
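One quick way to debug this kind of divergence is NVIDIA's Polygraphy, which can run the same ONNX file under both TensorRT and ONNX Runtime with identical inputs and compare the outputs. A minimal sketch, assuming the exported encoder lives at `bart_encoder.onnx` (hypothetical filename):

```python
# Compare TensorRT vs. ONNX Runtime outputs with Polygraphy.
# Assumes `pip install polygraphy onnxruntime tensorrt` and a working CUDA setup.
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

onnx_path = "bart_encoder.onnx"  # hypothetical path to the exported model

# Build a TensorRT engine from the ONNX file, plus an ONNX Runtime session on the same file.
build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_path))
runners = [TrtRunner(build_engine), OnnxrtRunner(SessionFromOnnx(onnx_path))]

# Feed identical (randomly generated) inputs through both backends and compare outputs.
results = Comparator.run(runners)
assert bool(Comparator.compare_accuracy(results)), "TensorRT and ONNX Runtime outputs diverge"
```

If the comparison fails, it at least narrows the problem to the engine-build step (e.g. precision or shape settings) rather than your inference code.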

siddharth-sharma7 commented 2 years ago

@will-wiki - I haven't run the ONNX model on TensorRT yet. Could you share how you've implemented it? Perhaps I can help.