lucasjinreal opened this issue 2 years ago
Hi @jinfagang, as far as I know, the transformers.onnx package should work for exporting large models. The only difference is that you'll typically see a number of additional files created, because ONNX uses Protobuf and Protobuf can only serialise files in 2GB chunks.
It would be helpful to have a reproducible code example if you are having trouble exporting a particular checkpoint.
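For example, exporting a large checkpoint and loading it back looks roughly like this (a sketch, not an exact recipe: the checkpoint name and the onnx_out/ directory below are just placeholder choices):

# Rough sketch; "gpt2-large" and "onnx_out/" are placeholders.
# Export with the transformers.onnx CLI (the same command works for larger checkpoints):
#   python -m transformers.onnx --model=gpt2-large onnx_out/
import os
import onnxruntime as ort

# For models above the 2GB Protobuf limit, the exporter writes the weights as
# additional external-data files alongside model.onnx in the output directory.
print(sorted(os.listdir("onnx_out")))

# ONNX Runtime resolves those external files automatically as long as they stay
# next to model.onnx, so loading works the same way as for a small model.
session = ort.InferenceSession("onnx_out/model.onnx")
print([i.name for i in session.get_inputs()])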
@lewtun Hi, actually I am just trying to use the same technique as the BERT quantization example inside onnxruntime. BERT can be optimized, and I can shrink the model size from 1.6G to 400M in int8. But when I try it on a 7G model, it fails. So I don't know whether the optimization doesn't support such a big model, or whether the huge model cannot be properly loaded by ORT. It just returns an error like "read onnx model failed".
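The quantization step itself is essentially the dynamic int8 quantization from the onnxruntime examples, roughly like this (a sketch; the file names are placeholders):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic int8 quantization of the exported/optimized graph.
# "model_opt.onnx" and "model_int8.onnx" are placeholder file names.
quantize_dynamic(
    model_input="model_opt.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)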
Hey @jinfagang can you please share a code snippet that shows which checkpoint you're using and an example of how you're loading the exported model?
@lewtun Hi, I suppose you're familiar with GPT2 and Megatron. Let me explain how I do it.
First I convert the model to ONNX using the Hugging Face tool:
python -m transformers.onnx ./models
Then the optimization looks like this:
import logging

from onnxruntime.transformers.fusion_options import FusionOptions
from onnxruntime.transformers.optimizer import optimize_model

logger = logging.getLogger(__name__)

# Path to the exported ONNX file (adjust to your actual path) and the graph type.
onnx_model_f = "./models/model.onnx"
model_type = "gpt2"

optimization_options = FusionOptions("gpt2")
optimization_options.enable_gelu = True
optimization_options.enable_layer_norm = True
optimization_options.enable_attention = True
optimization_options.enable_skip_layer_norm = True
optimization_options.enable_embed_layer_norm = True
optimization_options.enable_bias_skip_layer_norm = True
optimization_options.enable_bias_gelu = True
optimization_options.enable_gelu_approximation = False

logger.warning(f">>>>>>> Start optimizing ONNX graph on: {model_type}")

# For Megatron GPT2 (the commented values are the smaller GPT2 config).
optimizer = optimize_model(
    onnx_model_f,
    model_type=model_type,
    # num_heads=16,
    num_heads=32,
    # hidden_size=1024,
    hidden_size=2560,
    optimization_options=optimization_options,
    opt_level=0,
    use_gpu=False,
    only_onnxruntime=False,
)
As you can see, the commented-out values are for the smaller GPT model, which can be quantized successfully. But the huge model fails. If you are interested, trying the same quantization on the 7.0GB model would be great, and I would really like to see whether you can manage to convert it.
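The follow-up step after optimize_model is roughly the following (a sketch with placeholder paths; the use_external_data_format flag is my assumption about what is needed for models above 2GB, not a confirmed fix):

# Sketch of the save + int8 quantization step; paths are placeholders.
from onnxruntime.quantization import quantize_dynamic, QuantType

# For graphs above the 2GB Protobuf limit, the optimized model is saved with
# external data files instead of a single .onnx file.
optimizer.save_model_to_file("gpt2_megatron_opt.onnx", use_external_data_format=True)

quantize_dynamic(
    model_input="gpt2_megatron_opt.onnx",
    model_output="gpt2_megatron_int8.onnx",
    weight_type=QuantType.QInt8,
    use_external_data_format=True,  # assumed to also be needed here for >2GB models
)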
Hi, for a model as big as 7GB, does transformers support export to ONNX? Is there any tutorial about big models?