onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0

xlm-roberta and Mistral-7B take significant amounts of memory during compilation #2821

Open cjvolzka opened 2 months ago

cjvolzka commented 2 months ago

While compiling models like HuggingFace protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I notice that onnx-mlir uses significantly more memory than the entire model size during compilation.

For example, the xlm-roberta-base-language-detection-onnx model is about 1.11 GB, but during compilation I see peaks of up to 9 GB of memory used by onnx-mlir, opt, and llc when compiling with `--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT`.

The Mistral-7B-v0.1 model is about 29 GB, but during compilation I see peaks above 70 GB and sustained usage of 58 GB when compiling with `--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT`.
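
For anyone wanting to reproduce these numbers, one option is to wrap the compile in GNU time and read the peak resident set size. This is a minimal sketch, assuming GNU time is installed at /usr/bin/time and that `model.onnx` is a placeholder for the actual model path:

```sh
# Sketch: record peak resident set size while compiling (paths are illustrative).
/usr/bin/time -v onnx-mlir --O3 --EmitLib \
    --mtriple=s390x-ibm-loz --mcpu=z14 \
    --onnx-op-stats TXT model.onnx
# Look for "Maximum resident set size (kbytes)" in the report. Note that
# onnx-mlir runs opt and llc as child processes; GNU time only folds a
# child's usage into the totals after that child exits.
```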

Is there anything that can be done to reduce the compile-time memory required for these kinds of models?

imaihal commented 2 months ago

@cjvolzka How can we get the ONNX model for Mistral-7B-v0.1?

cjvolzka commented 1 month ago

@imaihal Sorry, I missed your question. Below is how I generated the Mistral ONNX model.

Notes:

```sh
# Install the Hugging Face CLI and optimum. (The CLI ships with the
# huggingface_hub package; "huggingface_cli" is not the pip package name.
# optimum[exporters] may be needed to pull in the ONNX export extras.)
pip install huggingface_hub optimum
huggingface-cli login
# Export the PyTorch checkpoint to ONNX for the text-generation task.
optimum-cli export onnx --model mistralai/Mistral-7B-v0.1 --framework pt --atol 0.001 --task text-generation Mistral-7B-v0.1-text-generation
```
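
Assuming the export succeeds, optimum writes the model into the output directory given above (conventionally as `model.onnx`, with large weights in external data files). A sketch of feeding it back into the compile from the first comment, with the filename as an assumption about optimum's output layout:

```sh
# Sketch: compile the exported model with the flags quoted in the issue.
# The model filename inside the export directory is an assumption.
onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 \
    --store-constants-to-file --onnx-op-stats TXT \
    Mistral-7B-v0.1-text-generation/model.onnx
```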