Open cjvolzka opened 2 months ago
@cjvolzka How can we get an ONNX model for Mistral-7B-v0.1?
@imaihal Sorry, I missed your question. Below is how I generated the Mistral ONNX model:

```shell
pip install huggingface_hub optimum
huggingface-cli login
optimum-cli export onnx --model mistralai/Mistral-7B-v0.1 --framework pt --atol 0.001 --task text-generation Mistral-7B-v0.1-text-generation
```

Note: the `huggingface-cli login` command will ask a couple of questions.
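After the export finishes, it can be useful to confirm how much the output directory actually holds, since the model sizes come up again when discussing compile-time memory. A minimal stdlib-only sketch (the throwaway demo directory is a stand-in for the real `Mistral-7B-v0.1-text-generation` output, which is far larger):

```python
import tempfile
from pathlib import Path


def export_size_bytes(export_dir):
    """Total size of every file under the optimum-cli output directory."""
    return sum(p.stat().st_size for p in Path(export_dir).rglob("*") if p.is_file())


if __name__ == "__main__":
    # Demo with a temporary directory standing in for the real export dir.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "model.onnx").write_bytes(b"\0" * 1024)
        print(export_size_bytes(d))  # 1024
```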
While compiling models like the Hugging Face protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I notice we use significantly more memory during compilation than the entire model size.

For example, xlm-roberta-base-language-detection-onnx is about 1.11GB, but during compile time I see peaks up to 9GB of memory used by `onnx-mlir`, `opt`, and `llc` when compiling with `--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT`.

The Mistral-7B-v0.1 model is about 29GB, but during compile time I see peaks of 70+GB and sustained 58GB of memory when compiling with `--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT`.
Is there anything that can be done to reduce the compile-time memory required for these kinds of models?
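For anyone who wants to reproduce these peak-memory numbers, one way is to wrap the compile in a small measurement script. A minimal sketch using only the Python standard library (Unix-only; the command shown is a placeholder, not the exact onnx-mlir invocation above):

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion and return ru_maxrss for child processes:
    the peak resident set size, in KiB on Linux (bytes on macOS)."""
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Placeholder command: allocate ~50 MB in a child Python process.
    # Swap in the real onnx-mlir invocation to measure an actual compile.
    print(peak_child_rss([sys.executable, "-c", "b = bytearray(50_000_000)"]))
```

Note that `RUSAGE_CHILDREN` reports the maximum over all waited-for children, so it captures whichever of the spawned tools (`onnx-mlir`, `opt`, `llc`) peaked highest, not their sum.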