intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

Some bugs when quantizing MobileNetV3 with ONNX Runtime using a PyTorch dataloader #771

Closed · dianyo closed this issue 1 year ago

dianyo commented 1 year ago

Hi, since ONNX Runtime cannot take torch.Tensor as input, we should add a type check before feeding the dataloader's output to the ONNX Runtime inference session. I also found another bug in how the providers are assigned. I'm not sure how to create a pull request, since I don't have permission to push my fixed branch, so I've shared the code I modified below:

neural_compressor/adaptor/ox_utils/calibration.py

import torch  # add torch so we can check for torch.Tensor
...
# line 235: convert torch.Tensor to numpy before feeding ONNX Runtime
if isinstance(inputs, torch.Tensor):
    inputs = inputs.numpy()

...
# line 217: the keyword argument must be "providers", not "provider"
session = onnxruntime.InferenceSession(
              self.augmented_model.SerializeToString(),
              so,
              providers=self.backend) if not self.model_wrapper.is_large_model else \
          onnxruntime.InferenceSession(
              self.model_wrapper.model_path + '_augment.onnx',
              so,
              providers=self.backend)

neural_compressor/adaptor/onnxrt.py

import torch  # add torch so we can check for torch.Tensor
...
# line 1210: convert torch.Tensor to numpy before feeding ONNX Runtime
if isinstance(inputs, torch.Tensor):
    inputs = inputs.numpy()

If there's any way for me to create a PR, I'd like to do so! Thank you!
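
For context, a minimal sketch of the failure mode described above (the model path mobilenetv3.onnx and the input name "input" are illustrative assumptions, not taken from this thread):

import onnxruntime
import torch
from torch.utils.data import DataLoader, TensorDataset

# a dataloader yielding torch.Tensor batches, as in a typical PyTorch setup
dataset = TensorDataset(torch.randn(8, 3, 224, 224))
loader = DataLoader(dataset, batch_size=1)

session = onnxruntime.InferenceSession("mobilenetv3.onnx",
                                       providers=["CPUExecutionProvider"])

for (batch,) in loader:
    # session.run(None, {"input": batch}) fails: ONNX Runtime only accepts
    # numpy arrays, so the tensor must be converted first
    outputs = session.run(None, {"input": batch.numpy()})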

chensuyue commented 1 year ago

You're welcome to submit a PR directly to this repo; we will review and test it.

mengniwang95 commented 1 year ago

Hi @dianyo, thank you for your suggestions. For neural_compressor/adaptor/ox_utils/calibration.py, that is indeed an oversight on our side. You can create a branch on this repo, push your local branch to it, and then open a PR. As for neural_compressor/adaptor/onnxrt.py, users need to create a dataloader or evaluation function to get accuracy, so we assume users can convert torch.Tensor to a numpy array themselves. Could you please provide an example showing why we should do this conversion internally?

dianyo commented 1 year ago

Hi @mengniwang95, I assumed the dataloader argument of the evaluation function was compatible with a PyTorch dataloader, as shown in your example. I would therefore suggest that you either document that using an ONNX model requires implementing your own dataloader (rather than using a PyTorch dataloader directly), or implement the torch.Tensor-to-numpy conversion in onnxrt.py. What do you think?

mengniwang95 commented 1 year ago

Hi @dianyo, thank you for your suggestions. We plan to add torch.Tensor-to-numpy conversion to the onnxrt adaptor 😄
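
A sketch of what such an internal conversion could look like, assuming a recursive helper (the name to_numpy and the handling of nested containers are illustrative, not the actual commit):

import torch

def to_numpy(inputs):
    # Convert torch.Tensor leaves to numpy arrays so that whatever structure
    # a PyTorch dataloader yields can be fed to an ONNX Runtime session.
    if isinstance(inputs, torch.Tensor):
        # detach() handles tensors that require grad; cpu() handles CUDA
        # tensors, since .numpy() only works on detached CPU tensors
        return inputs.detach().cpu().numpy()
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(to_numpy(item) for item in inputs)
    if isinstance(inputs, dict):
        return {key: to_numpy(value) for key, value in inputs.items()}
    return inputs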

mengniwang95 commented 1 year ago

Implemented in commit a2931eaa4052eec195be3c79a13f7bfa23e54473.