intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

Will you support Intel Arc? #1508

Closed · nathanodle closed this issue 6 months ago

nathanodle commented 10 months ago

I’m curious whether you will support Arc; Neural Compressor would particularly benefit those platforms. Thanks!

chensuyue commented 10 months ago

We support general quantization and inference for Arc via IPEX and ITEX.

IPEX example: https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex/README.md#2-quantization-with-xpu

ITEX examples:
https://github.com/intel/neural-compressor/blob/master/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/README.md#quantizing-the-model-on-intel-gpumandatory-to-install-itex
https://github.com/intel/neural-compressor/blob/master/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/README.md#quantization-config

The quantization steps are similar to CPU, with only some config differences. The complex part is setting up a working XPU environment with IPEX/ITEX.
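For concreteness, here is a minimal sketch of what the IPEX path can look like with the Neural Compressor 2.x `PostTrainingQuantConfig` API. The `backend="ipex"` and `device="xpu"` strings follow the linked IPEX example, but the toy model, calibration data, and save path are illustrative assumptions; verify the exact arguments against your installed neural-compressor and intel-extension-for-pytorch versions.

```python
# A minimal sketch, not an official recipe: static PTQ targeting an Intel GPU
# ("xpu") through the IPEX backend, using the Neural Compressor 2.x API.
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the "xpu" device
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy FP32 model and calibration data standing in for a real workload.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
calib_loader = DataLoader(
    TensorDataset(torch.randn(32, 16), torch.zeros(32)), batch_size=8
)

conf = PostTrainingQuantConfig(
    backend="ipex",                        # route through intel-extension-for-pytorch
    device="xpu",                          # target the Intel GPU instead of the default "cpu"
    example_inputs=torch.randn(1, 16),     # sample input used by IPEX to trace/prepare the model
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_loader)
q_model.save("./saved_results")  # hypothetical output path
```

As the comment above notes, this is the same flow as CPU post-training quantization; only the config strings differ, and the real work is getting the XPU driver/runtime environment installed correctly.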

nathanodle commented 10 months ago

OK, thanks. In that case, you may want to update the README?

(image attachment: screenshot of the relevant README section)

Thanks for the response!

thuang6 commented 6 months ago

There is no plan to claim formal support for Arc due to the limited scope of validation. Closing this issue for now.