onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0

xlm-roberta and Mistral-7B take significant amounts of memory during compilation #2821

Open cjvolzka opened 2 months ago

cjvolzka commented 2 months ago

While compiling models like HuggingFace protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I notice that onnx-mlir uses significantly more memory than the entire model size during compilation.

For example, the xlm-roberta-base-language-detection-onnx model is about 1.11 GB, but during compilation I see peaks of up to 9 GB of memory used by onnx-mlir, opt, and llc when compiling with `--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT`.

The Mistral-7B-v0.1 model is about 29 GB, but during compilation I see peaks above 70 GB and sustained usage of 58 GB when compiling with `--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT`.
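
For anyone wanting to reproduce these numbers, one option is to wrap the compile in GNU time and read the peak resident set size. This is a minimal sketch, assuming GNU time is installed at /usr/bin/time and that `model.onnx` is a placeholder for the actual model path:

```sh
# Sketch: record peak resident set size while compiling (paths are illustrative).
/usr/bin/time -v onnx-mlir --O3 --EmitLib \
    --mtriple=s390x-ibm-loz --mcpu=z14 \
    --onnx-op-stats TXT model.onnx
# Look for "Maximum resident set size (kbytes)" in the report. Note that
# onnx-mlir runs opt and llc as child processes; GNU time only folds a
# child's usage into the totals after that child exits.
```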

Is there anything that can be done to reduce the compile-time memory required for these kinds of models?

imaihal commented 2 months ago

@cjvolzka How can we get the ONNX model for Mistral-7B-v0.1?

cjvolzka commented 1 month ago

@imaihal Sorry, I missed your question. Below is how I generated the Mistral ONNX model.

Notes:

```sh
# Install the Hugging Face CLI and optimum. (The CLI ships with the
# huggingface_hub package; "huggingface_cli" is not the pip package name.
# optimum[exporters] may be needed to pull in the ONNX export extras.)
pip install huggingface_hub optimum
huggingface-cli login
# Export the PyTorch checkpoint to ONNX for the text-generation task.
optimum-cli export onnx --model mistralai/Mistral-7B-v0.1 --framework pt --atol 0.001 --task text-generation Mistral-7B-v0.1-text-generation
```
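
Assuming the export succeeds, optimum writes the model into the output directory given above (conventionally as `model.onnx`, with large weights in external data files). A sketch of feeding it back into the compile from the first comment, with the filename as an assumption about optimum's output layout:

```sh
# Sketch: compile the exported model with the flags quoted in the issue.
# The model filename inside the export directory is an assumption.
onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 \
    --store-constants-to-file --onnx-op-stats TXT \
    Mistral-7B-v0.1-text-generation/model.onnx
```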