Tangxinlu opened this issue 6 months ago

Thanks for the great work!

Now that I have my own sparsified and GPTQ-quantized model, I'd like to run it in DeepSparse to see some inference speedup or other advantages. To export it to ONNX, I tried following https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq#-how-to-export-the-one-shot-model, but it doesn't seem to work for GPTQ-quantized models. How do I export a GPTQ model (e.g., TheBloke/Llama-2-7B-Chat-GPTQ) to ONNX so that it can run in DeepSparse? Thanks.
Hey @Tangxinlu, `sparseml.export` is the appropriate pathway. Could you share your code and stack trace so that I can reproduce the issue?
Hi @dbogunowicz, thanks for the quick reply!
Here is an example:
```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"

huggingface-cli download TechxGenus/Meta-Llama-3-8B-GPTQ --local-dir Meta-Llama-3-8B-GPTQ

# Add `"disable_exllama": true` to `"quantization_config"` in
# `Meta-Llama-3-8B-GPTQ/config.json`

sparseml.export --task text-generation ./Meta-Llama-3-8B-GPTQ
```
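For reference, the manual `config.json` edit in the comment above can be scripted. A minimal sketch, assuming the downloaded checkpoint has a standard GPTQ `quantization_config` block (`disable_exllama` is the flag older `transformers` versions read; newer versions use `use_exllama` instead):

```python
import json
from pathlib import Path

config_path = Path("Meta-Llama-3-8B-GPTQ/config.json")
config = json.loads(config_path.read_text())

# Turn off the exllama kernels so the quantized model can be
# instantiated on CPU for the export pass.
config.setdefault("quantization_config", {})["disable_exllama"] = True

config_path.write_text(json.dumps(config, indent=2))
```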
Error:

```
...
sparseml/src/sparseml/pytorch/torch_to_onnx_exporter.py", line 100, in pre_validate
    return deepcopy(module).to("cpu").eval()
...
TypeError: cannot pickle 'module' object
```
Envs:
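For what it's worth, the error itself is easy to reproduce in isolation: `deepcopy` falls back to pickle machinery for objects it doesn't know how to copy, and raw Python module objects are unpicklable. My assumption is that one of the GPTQ layers keeps a reference to an imported kernel module (e.g., the exllama CUDA extension), so the `deepcopy(module)` call in `pre_validate` fails. A minimal sketch of that failure mode (`QuantLinearLike` and the stashed module are hypothetical stand-ins, not the actual GPTQ layer):

```python
import types
from copy import deepcopy

import torch


class QuantLinearLike(torch.nn.Module):
    """Stand-in for a quantized layer that stashes a kernel module on itself."""

    def __init__(self):
        super().__init__()
        # Hypothetical: a reference to an imported extension module,
        # e.g. the exllama CUDA kernels.
        self.kernels = types.ModuleType("fake_exllama_kernels")


model = torch.nn.Sequential(QuantLinearLike())
# Raises the same error as the export:
# TypeError: cannot pickle 'module' object
deepcopy(model)
```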