Hello @jafri,
Yes - this is something I have been looking into over the last few weeks. Unfortunately, the ONNX ecosystem for Rust is not yet at the same level as Python's. I have mainly investigated 2 implementations so far. I was not able to run the exported t5-small model and will raise a related issue with the authors of the library. This is indeed a feature that would be interesting to implement, but I am still looking for an ONNX runtime that can reliably run models exported via optimum.
@guillaume-be here is a working example on philschmid/distilbart-cnn-12-6-samsum.
As you found, t5 doesn't work for now due to the TDim used in the original tensorflow model.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer
model_checkpoint = "philschmid/distilbart-cnn-12-6-samsum"
save_directory = "tmp/onnx/"
# Export to onnx
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_checkpoint, from_transformers=True)
# Save the onnx model and tokenizer
ort_model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
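Before moving to Rust, the exported model can be sanity-checked from Python. This is a minimal sketch, not part of the original snippet, assuming the files saved above and the standard optimum/transformers pipeline API:

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

save_directory = "tmp/onnx/"
# Reload the ONNX model and tokenizer saved by the export step above
tokenizer = AutoTokenizer.from_pretrained(save_directory)
ort_model = ORTModelForSeq2SeqLM.from_pretrained(save_directory)
# Run a quick summarization through onnxruntime to confirm the export is usable
summarizer = pipeline("summarization", model=ort_model, tokenizer=tokenizer)
print(summarizer("Hello this is a very long text to be summarized."))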
use std::{
path::{Path, PathBuf},
str::FromStr,
};
use tokenizers::tokenizer::Result;
use tract_onnx::prelude::*;
pub fn onnx2() -> Result<()> {
let model_dir = PathBuf::from_str("tmp/onnx")?;
let encoder_path = Path::join(&model_dir, "encoder_model.onnx");
// The decoder graphs are exported alongside the encoder but are not used in this example yet.
let decoder_path = Path::join(&model_dir, "decoder_model.onnx");
let decoder_with_past_path = Path::join(&model_dir, "decoder_with_past_model.onnx");
// Load and compile the exported encoder graph with tract.
let encoder_model = onnx().model_for_path(encoder_path)?.into_runnable()?;
// Token ids and attention mask produced by the model's tokenizer (hard-coded here for brevity).
let input_ids: Vec<i64> = vec![8774, 48, 19, 3, 9, 182, 307, 1499, 12, 36, 15459, 5, 1];
let attention: Vec<i64> = vec![1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1];
// Shape the inputs as (batch_size, sequence_length) tensors.
let input_ids =
tract_ndarray::Array2::from_shape_vec((1, input_ids.len()), input_ids.clone())?.into();
let attention_mask =
tract_ndarray::Array2::from_shape_vec((1, attention.len()), attention)?.into();
let model_inputs = tvec!(input_ids, attention_mask);
// Run the encoder and print its output (the last hidden states).
let result = encoder_model.run(model_inputs)?;
println!("{:?}", result);
Ok(())
}
For information, I have raised https://github.com/sonos/tract/issues/856 to try and solve the issue with the T5 model. Support for ONNX models would require significant library rework and important design choices. The design of the pipelines, their configuration, and especially the text generation pipelines would be impacted.
I understand BART/DistilBART models work as you illustrated. Before committing to significant design decisions, I would like to get a better understanding of the range of models exported from Transformers that can be supported by Tract, since it does not rely on the same ONNX backend as optimum.
As mentioned, this is a prioritized feature on my side, and I want to make sure the complexity of the integration is handled correctly.
Worth exploring adding an option to use onnx instead of libtorch to run models
The following Python code runs 2x faster than rust-bert summarization on my M1 Mac, directly compared to https://github.com/guillaume-be/rust-bert/blob/master/examples/summarization_t5.rs.
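A minimal sketch of what such an onnxruntime-backed summarization run could look like, assuming an optimum pipeline with t5-small as a placeholder checkpoint (this is not the exact benchmark code):

import time
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

model_checkpoint = "t5-small"  # placeholder; any seq2seq checkpoint supported by optimum
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Export the model to ONNX and run it through onnxruntime instead of libtorch
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_checkpoint, from_transformers=True)
summarizer = pipeline("summarization", model=ort_model, tokenizer=tokenizer)

text = "Hello this is a very long text to be summarized."
start = time.time()
print(summarizer(text))
print(f"onnxruntime summarization took {time.time() - start:.2f}s")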