huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/

Optimize ONNX model based on encoder-decoder #396

Open jesusSant opened 1 year ago

jesusSant commented 1 year ago

First of all, thank you very much for making our lives easier with the work you do at Hugging Face, congratulations!

We have a model based on the encoder-decoder architecture, made up of two RoBERTa models. The model works quite well, but unfortunately its inference time is quite high (about 400 ms to generate a sentence of about 7 tokens). We would like to reduce that time and have opted for ONNX and Optimum.

We have managed to export the model to ONNX, producing an encoder_model.onnx, a decoder_model.onnx, and a decoder_with_past_model.onnx. We can load this exported model with ORTModelForSeq2SeqLM.from_pretrained(·), which gives us access to the generate(·) method. The problem is that the ONNX model is about 2x slower than the unexported model. We have seen similar issues (#365, #362), and we understand that the exported model is simply in another format, so it does not necessarily have to be faster than the base model.

Because of this, our last step has been to use ORTOptimizer.from_pretrained(·) to apply graph optimization (operator fusion and constant folding) and reduce latency. Unfortunately, we have not managed to do so. ORTOptimizer.from_pretrained(·) expects, in addition to the ONNX model, a config.json file, which is not generated when exporting the model to ONNX. We have made several attempts to create and save configuration files, even reusing the base model's config.json, but with no success 😢

We would appreciate it if someone (@lewtun?) could point us in the right direction so we can continue. Thank you very much. Best, Jesus.
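
For reference, here is a minimal sketch of the flow we are attempting (paths are placeholders; it also assumes that ORTOptimizer.from_pretrained can take the loaded ORTModel directly, which may sidestep the missing config.json):

# Minimal sketch of the export + graph-optimization flow (placeholder paths).
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export the PyTorch encoder-decoder checkpoint to ONNX and keep its config.json
model = ORTModelForSeq2SeqLM.from_pretrained("path/to/pytorch_model", from_transformers=True)
model.save_pretrained("onnx_model")  # writes the encoder/decoder .onnx files plus config.json

# Graph optimization (operator fusion, constant folding)
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = OptimizationConfig(optimization_level=2)
optimizer.optimize(save_dir="onnx_model_optimized", optimization_config=optimization_config)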

LysandreJik commented 1 year ago

Hey, thanks for opening an issue Jesus!

Let me move it to Optimum where I'm sure folks will know how to help you out.

shaileshj2803 commented 1 year ago

Hi @jesusSant, could you please share the script or code you used to generate the ONNX model for the RoBERTa encoder-decoder?

burakaytan commented 1 year ago

Hello, when I try a roberta2roberta EncoderDecoderModel with ORTModelForSeq2SeqLM, I get an error saying "encoder-decoder is not supported yet". How can I resolve this error?

regisss commented 1 year ago

Hi @burakaytan, could you share a code snippet to reproduce this error please?

burakaytan commented 1 year ago

Hi @regisss, the EncoderDecoderModel line works properly, but the ORTModelForSeq2SeqLM line gives the error. This is a local model loaded from a local path:

roberta_shared = EncoderDecoderModel.from_pretrained('Model/557500')

roberta_shared = ORTModelForSeq2SeqLM.from_pretrained('Model/557500', from_transformers=True)

regisss commented 1 year ago

@burakaytan Could you share the complete error message here? It is likely that your custom model doesn't correspond to a model type that is supported in Optimum for ONNX export. You may need to convert it using a custom ONNX config.
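
As a quick check, something like the snippet below (if I recall the TasksManager API correctly) lists which model types currently have an ONNX export config in Optimum; "encoder-decoder", the model_type reported by EncoderDecoderModel, is not in that registry:

# Query Optimum's export registry for a given model_type.
# An unsupported type (e.g. "encoder-decoder") raises a KeyError listing the
# supported types, similar to the error quoted later in this thread.
from optimum.exporters.tasks import TasksManager

print(TasksManager.get_supported_tasks_for_model_type("roberta", exporter="onnx").keys())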

burakaytan commented 1 year ago

@regisss I've uploaded the model to the Hugging Face Hub so you can test it, and I'm sharing the code snippet below. If you uncomment the EncoderDecoderModel line you can see it working, while the ORTModelForSeq2SeqLM line raises the error shown below.

from transformers import RobertaTokenizerFast
from transformers import EncoderDecoderModel
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = RobertaTokenizerFast.from_pretrained('burakaytan/encoder_decoder_test',  max_len=128)
tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token

#roberta_shared = EncoderDecoderModel.from_pretrained('burakaytan/encoder_decoder_test')
roberta_shared = ORTModelForSeq2SeqLM.from_pretrained('burakaytan/encoder_decoder_test', from_transformers=True)

def generate_text(text, num_return=3):
    inputs = tokenizer([text], padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    input_ids = inputs.input_ids  # .to("cuda")
    attention_mask = inputs.attention_mask  # .to("cuda")

    outputs = roberta_shared.generate(
        input_ids,
        attention_mask=attention_mask,
        num_beams=3,
        repetition_penalty=3.0,
        length_penalty=2.0,
        return_dict_in_generate=True,
        output_scores=True,
        num_return_sequences=num_return,
        pad_token_id=2,
    )

    outputs = outputs.get("sequences")
    output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return output_str

print(generate_text('nlp'))

The error: KeyError: "encoder-decoder is not supported yet. Only {'mobilenet-v2', 'marian', 'unispeech', 'squeezebert', 'roberta', 'groupvit', 'donut-swin', 'mpnet', 'deit', 'convbert', 'wav2vec2-conformer', 'opt', 'levit', 'wavlm', 'pegasus', 'ibert', 'bart', 'data2vec-vision', 'gpt2', 'm2m-100', 'sew-d', 'roformer', 'imagegpt', 'splinter', 'bert', 'speech-to-text', 'convnext', 'lilt', 'mobilebert', 'llama', 't5', 'xlm', 'gptj', 'sew', 'mt5', 'poolformer', 'pix2struct', 'regnet', 'hubert', 'owlvit', 'resnet', 'blenderbot', 'yolos', 'perceiver', 'swin', 'whisper', 'bloom', 'data2vec-text', 'unispeech-sat', 'mobilevit', 'clip', 'longt5', 'deberta', 'audio-spectrogram-transformer', 'vit', 'distilbert', 'nystromformer', 'gpt-neox', 'wav2vec2', 'gpt-neo', 'sam', 'mobilenet-v1', 'beit', 'mbart', 'vision-encoder-decoder', 'electra', 'segformer', 'layoutlmv3', 'data2vec-audio', 'layoutlm', 'blenderbot-small', 'flaubert', 'cvt', 'camembert', 'detr', 'codegen', 'albert', 'xlm-roberta', 'deberta-v2'} are supported. If you want to support encoder-decoder please propose a PR or open up an issue."

regisss commented 1 year ago

@burakaytan EncoderDecoderModel returns a model of type encoder-decoder with many potential encoders and decoders. You could try it with a custom ONNX config as presented here: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#customize-the-export-of-official-transformers-models
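
Roughly, the mechanism from that guide looks like the sketch below. It is illustrated with Whisper, a model type that already ships an OnnxConfig; for an EncoderDecoderModel you would still need to write an analogous config class yourself, so treat this only as the shape of the API:

# Sketch of the custom-export mechanism from the linked guide, shown with a
# model type that already has an OnnxConfig (Whisper). For EncoderDecoderModel,
# an analogous OnnxConfig class would still have to be written, since none exists.
from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import WhisperOnnxConfig

model_id = "openai/whisper-tiny.en"
config = AutoConfig.from_pretrained(model_id)
onnx_config = WhisperOnnxConfig(config, task="automatic-speech-recognition")

# One ONNX config per exported sub-model (encoder, decoder, decoder with KV cache)
custom_onnx_configs = {
    "encoder_model": onnx_config.with_behavior("encoder"),
    "decoder_model": onnx_config.with_behavior("decoder", use_past=False),
    "decoder_with_past_model": onnx_config.with_behavior("decoder", use_past=True),
}

main_export(
    model_id,
    output="custom_whisper_onnx",
    task="automatic-speech-recognition",
    custom_onnx_configs=custom_onnx_configs,
)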

burakaytan commented 1 year ago

@regisss Encoder-decoder models may come in many variations, but the EncoderDecoderModel class can load all of them; doesn't that mean it converts the related models into a standard format? Couldn't ORTModelForSeq2SeqLM export them to ONNX with the same logic?

I looked at the link you shared, but to convert a roberta2roberta model that way I would need to read and understand the ONNX export infrastructure and apply it end to end. There is no readily usable method like ORTModelForSeq2SeqLM for it.

Will there be a quick solution for exporting EncoderDecoderModel to ONNX in the future? In the meantime, what is the fastest way to convert a roberta2roberta model to ONNX format?

Many thanks for your quick replies and directions.