Open MLRadfys opened 2 weeks ago
gen_processing_models returns two ONNX models: one for pre-processing and, if post_kwargs is present in the kwargs, a second one for post-processing.
If you want to combine a processing model with your ONNX model, please use onnx.compose.merge_models: https://onnx.ai/onnx/api/compose.html#onnx.compose.merge_models
Thank you so much for the quick reply!
I will give it a try! Are there any alternatives? I just looked at the different test functions and saw that one can create a model using
node = [helper.make_node(
    'RobertaTokenizer', ['string_input'], ['input_ids'],
    vocab=_get_file_content(vocab_file),
    merges=_get_file_content(merges_file),
    name='bpetok', padding_length=max_length,
    domain='ai.onnx.contrib')]
graph = helper.make_graph(node, 'test0', [input1], [output1])
tokenizer_model = make_onnx_model(graph)
Would it then be possible to create a pipeline using:
full_model = pnp.SequentialProcessingModule(tokenizer_model, Roberta_model)
Or in other words, which one is the easiest and most straight-forward method? :-)
Really appreciate your help!
Cheers,
M
This approach works at a lower level: it requires knowledge of ONNX and of the tokenization data, and it is prone to errors. It is therefore recommended to use only the gen_processing_models API, for which users can get support if there is any problem.
Alright, I think I solved it using the gen_processing_models() and onnx.compose.merge_models() functions :-)
I attach my solution as a reference for others who encounter a similar problem:
import torch
from onnxruntime_extensions import gen_processing_models
from onnxruntime_extensions import get_library_path
import onnx
import onnxruntime as ort
import numpy as np
from transformers import RobertaForSequenceClassification, RobertaTokenizer
# Step 1: Load the Huggingface Roberta tokenizer and model
input_text = "A test text!"
model_type = "roberta-base"
model = RobertaForSequenceClassification.from_pretrained(model_type)
tokenizer = RobertaTokenizer.from_pretrained(model_type)
# Step 2: Export the tokenizer to ONNX using gen_processing_models
onnx_tokenizer_path = "tokenizer.onnx"
# Generate the tokenizer ONNX model
tokenizer_onnx_model = gen_processing_models(tokenizer, pre_kwargs={})[0]
# Save the tokenizer ONNX model
with open(onnx_tokenizer_path, "wb") as f:
    f.write(tokenizer_onnx_model.SerializeToString())
# Step 3: Export the Huggingface Roberta model to ONNX
onnx_model_path = "model.onnx"
dummy_input = tokenizer("This is a dummy input", return_tensors="pt")
# Export the model to ONNX
torch.onnx.export(
    model,  # model to be exported
    (dummy_input["input_ids"], dummy_input["attention_mask"]),  # model inputs (dummy input)
    onnx_model_path,  # where to save the ONNX model
    input_names=["input_ids", "attention_mask_input"],  # input tensor names
    output_names=["logits"],  # output tensor names
    dynamic_axes={  # axes that may vary at inference time
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask_input": {0: "batch_size", 1: "sequence_length"},
        "logits": {0: "batch_size"},
    },
)
# Step 4: Merge the tokenizer and model ONNX files into one
onnx_combined_model_path = "combined_model_tokenizer.onnx"
# Load the tokenizer and model ONNX files
tokenizer_onnx_model = onnx.load(onnx_tokenizer_path)
model_onnx_model = onnx.load(onnx_model_path)
# Inspect the ONNX models to find the correct input/output names
print("Tokenizer Model Inputs:", [node.name for node in tokenizer_onnx_model.graph.input])
print("Tokenizer Model Outputs:", [node.name for node in tokenizer_onnx_model.graph.output])
print("Model Inputs:", [node.name for node in model_onnx_model.graph.input])
print("Model Outputs:", [node.name for node in model_onnx_model.graph.output])
# Merge the tokenizer and model ONNX files
combined_model = onnx.compose.merge_models(
tokenizer_onnx_model,
model_onnx_model,
io_map=[('input_ids', 'input_ids'), ('attention_mask', 'attention_mask_input')]
)
# Save the combined model
onnx.save(combined_model, onnx_combined_model_path)
# Step 5: Test the combined ONNX model using an Inference session with ONNX Runtime Extensions
# Initialize ONNX Runtime SessionOptions and load custom ops library
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())
# Initialize ONNX Runtime Inference session with Extensions
session = ort.InferenceSession(onnx_combined_model_path, sess_options=sess_options, providers=['CPUExecutionProvider'])
# Prepare dummy input text
input_feed = {"input_text": np.asarray([input_text])} # Assuming "input_text" is the input expected by the tokenizer
# Run the model
outputs = session.run(None, input_feed)
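# Print the logits. The merged model also exposes any tokenizer outputs that io_map does not consume,
# so confirm the index with session.get_outputs() before relying on it.
print("logits:", outputs[1][0])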
Thanks for the help!
Cheers,
M
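The name-matching step in the solution above (print the input/output names, then write io_map by hand) can be sketched as a small helper. build_io_map is a hypothetical function, not part of onnx or onnxruntime-extensions, and the tokenizer output list below (including offset_mapping) is an assumed example; use the names printed from your own models.

```python
def build_io_map(producer_outputs, consumer_inputs, renames=None):
    """Pair producer output names with consumer input names.

    `renames` maps a producer output name to the consumer input it should
    feed when the two names differ. Producer outputs with no matching
    consumer input are skipped.
    """
    renames = renames or {}
    available = set(consumer_inputs)
    io_map = []
    for out_name in producer_outputs:
        target = renames.get(out_name, out_name)
        if target in available:
            io_map.append((out_name, target))
    # For a tokenizer + model pipeline every model input should be fed.
    connected = {target for _, target in io_map}
    missing = [name for name in consumer_inputs if name not in connected]
    if missing:
        raise ValueError(f"consumer inputs left unconnected: {missing}")
    return io_map


# Assumed names for illustration; take the real ones from graph.input / graph.output.
tokenizer_outputs = ["input_ids", "attention_mask", "offset_mapping"]
model_inputs = ["input_ids", "attention_mask_input"]

io_map = build_io_map(
    tokenizer_outputs,
    model_inputs,
    renames={"attention_mask": "attention_mask_input"},
)
print(io_map)
```

The resulting list can be passed as the io_map argument of onnx.compose.merge_models. Raising on unconnected model inputs is a design choice for this pipeline; merge_models itself would simply leave such inputs as inputs of the combined graph.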
Hi and thanks for this great library!
I am very new to ONNX and I am trying to include the Roberta tokenizer in a Roberta ONNX model. As far as I understand, one can get the ONNX graph for the tokenizer using:
import onnxruntime as _ort
from transformers import RobertaTokenizer
from onnxruntime_extensions import OrtPyFunction, gen_processing_models
# Roberta tokenizer
tokenizer = RobertaTokenizer.from_pretrained("roberta-base", model_max_length=512)
tokenizer_onnx = OrtPyFunction(gen_processing_models(tokenizer, pre_kwargs={})[0])
Now I am wondering what the next step is. How can I combine the ONNX tokenizer (or graph) with a model?
Thanks in advance for any help,
cheers,
M