huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

VisionEncoderDecoderModel ONNX Conversion - TrOCR #22565

Closed · RichardRivaldo closed this issue 1 year ago

RichardRivaldo commented 1 year ago

I want to convert my TrOCR model to TFLite. To do that, as I understand it, I need to convert it first to ONNX, then to TensorFlow, and lastly to TFLite. I stumbled upon #19604, but my case is a bit different: I used the Trainer's save_model function to save my fine-tuned TrOCR model. As a result, I got the checkpoint files and also these files:

config.json
generation_config.json
preprocessor_config.json
pytorch_model.bin
training_args.bin

Command I used:

python -m transformers.onnx --model=trocr/base/ --feature=vision2seq-lm onnx/ --atol 1e-3

Error that I still got:

ValueError: Unrecognized feature extractor in base/. Should have a `feature_extractor_type` key in its preprocessor_config.json of config.json, or one of the following `model_type` keys in its config.json: audio-spectrogram-transformer, beit, chinese_clip, clap, clip, clipseg, conditional_detr, convnext, cvt, data2vec-audio, data2vec-vision, deformable_detr, deit, detr, dinat, donut-swin, dpt, flava, glpn, groupvit, hubert, imagegpt, layoutlmv2, layoutlmv3, levit, maskformer, mctct, mobilenet_v1, mobilenet_v2, mobilevit, nat, owlvit, perceiver, poolformer, regnet, resnet, segformer, sew, sew-d, speech_to_text, speecht5, swin, swinv2, table-transformer, timesformer, tvlt, unispeech, unispeech-sat, van, videomae, vilt, vit, vit_mae, vit_msn, wav2vec2, wav2vec2-conformer, wavlm, whisper, xclip, yolos

In config.json, I have both trocr and vision-encoder-decoder as model_type values, neither of which is included in the list given by the error. Is there any other way to do this?

sgugger commented 1 year ago

cc @Rocketknight1 maybe?

RichardRivaldo commented 1 year ago

@sgugger pinging since there's no response

Rocketknight1 commented 1 year ago

It looks like this bug arises in the ONNX export of a PyTorch model, which I don't know too much about!

RichardRivaldo commented 1 year ago

I'm quite confused by this one. Are there any other workarounds? I read about some approaches like TorchScript (JIT) or using Torch's export function directly, but I'm not sure how to do that, especially the input part (see the sketch below).
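For reference, a bare torch.onnx.export call for just the encoder might look like this sketch; the checkpoint path and the 384x384 input size are assumptions for a base-size TrOCR, and the decoder (with its generation loop) would need separate handling:

import torch
from transformers import VisionEncoderDecoderModel

# Load the fine-tuned model; the path here is an assumption.
model = VisionEncoderDecoderModel.from_pretrained("trocr/base/")
model.eval()
# Return tuples rather than ModelOutput objects for the ONNX tracer.
model.config.return_dict = False

# TrOCR's ViT encoder expects pixel_values of shape (batch, 3, height, width).
dummy_pixel_values = torch.randn(1, 3, 384, 384)

torch.onnx.export(
    model.encoder,                 # export only the encoder
    (dummy_pixel_values,),
    "encoder.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state"],
    dynamic_axes={"pixel_values": {0: "batch"}},
    opset_version=14,
)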

RichardRivaldo commented 1 year ago

I would appreciate help referring this issue to others, @Rocketknight1 @sgugger! :D

billyjuliux commented 1 year ago

@sgugger @Rocketknight1 I'm also facing this same issue. Any help would be much appreciated. Thanks!

RichardRivaldo commented 1 year ago

@NielsRogge @michaelbenayoun

michaelbenayoun commented 1 year ago

Hi, could you try with Optimum?

optimum-cli export onnx  -m trocr/base/ --task vision2seq-lm onnx/ --atol 1e-3

I am trying to pinpoint whether this comes from the export tool or really from information lacking in the preprocessor_config.json file.

RichardRivaldo commented 1 year ago

Hi @michaelbenayoun, thank you for the response. Yes, I retried using Optimum and it worked. I then continued my conversion to TF and TFLite with these commands:

optimum-cli export onnx --model base/ onnx/ --task vision2seq-lm

onnx-tf convert -i onnx/encoder_model.onnx -o encoder/
onnx-tf convert -i onnx/decoder_model.onnx -o decoder/

tflite_convert --saved_model_dir=encoder/ --output_file=encoder.tflite
tflite_convert --saved_model_dir=decoder/ --output_file=decoder.tflite
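
For reference, the expected input of the converted encoder can be inspected with the TFLite interpreter — a minimal sketch, assuming the file names from the commands above:

import tensorflow as tf

# Load the converted encoder and print its input metadata.
interpreter = tf.lite.Interpreter(model_path="encoder.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details())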

When I check the encoder input shape to use it for inference, I got the following:

[{'name': 'serving_default_pixel_values:0',
  'index': 0,
  'shape': array([1, 1, 1, 1], dtype=int32),
  'shape_signature': array([-1, -1, -1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

Any idea how to fix this? This can't be the correct expected shape, right?

michaelbenayoun commented 1 year ago

Just letting you know, we also support exporting directly to TFLite in Optimum, but not for TrOCR yet.

About your issue: if I understand correctly, you convert the ONNX models to TensorFlow SavedModels.

Once you have done that, I would suggest converting those SavedModels to TFLite programmatically. For each SavedModel, try the following (consolidated into a sketch after this list):

  1. Load the SavedModel
  2. Create a tf.function with the proper input signature from it:
    func = tf.function(loaded_model, input_signature=[tf.TensorSpec([shape here], dtype=tf.float32)])
  3. Create a concrete function from func:
    concrete_func = func.get_concrete_function()
  4. Convert the concrete function to TFLite following this example
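
Put together, a minimal sketch of those four steps — the paths, the [1, 3, 384, 384] shape, and the input name are assumptions to adapt to your models:

import tensorflow as tf

# 1. Load the SavedModel produced by onnx-tf.
loaded_model = tf.saved_model.load("encoder/")

# 2. Wrap it in a tf.function with an explicit input signature.
func = tf.function(
    loaded_model,
    input_signature=[tf.TensorSpec([1, 3, 384, 384], tf.float32, name="pixel_values")],
)

# 3. Trace a concrete function from it.
concrete_func = func.get_concrete_function()

# 4. Convert the concrete function to TFLite and save it.
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func], loaded_model)
tflite_model = converter.convert()
with open("encoder.tflite", "wb") as f:
    f.write(tflite_model)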

Tell me if it works!

RichardRivaldo commented 1 year ago

Wow, thank you for the heads-up @michaelbenayoun, that Optimum feature is eagerly awaited!

Anyway, I tried your suggestion. Currently I have:

model = tf.saved_model.load("converted/tf/encoder/")
func = tf.function(model, input_signature=[tf.TensorSpec([1, 384, 384, 3], dtype=tf.float32)])
concrete_func = func.get_concrete_function()

However, I got this error from the concrete function getter:

 ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
      Positional arguments (1 total):
        * <tf.Tensor 'None_0:0' shape=(1, 384, 384, 3) dtype=float32>
      Keyword arguments: {}

     Expected these arguments to match one of the following 1 option(s):

    Option 1:
      Positional arguments (0 total):
        * 
      Keyword arguments: {'pixel_values': TensorSpec(shape=(None, None, None, None), dtype=tf.float32, name='pixel_values')}

From my research, I think this is because the shape is incorrect, but I don't know how to reshape the input. Any other suggestions? TIA! :D

michaelbenayoun commented 1 year ago

I think it's because it does not recognize the input signature.

Could you try:

func = tf.function(model, input_signature=[tf.TensorSpec([1, 384, 384, 3], dtype=tf.float32, name="pixel_values")])

RichardRivaldo commented 1 year ago

Nope, still got the same error with that.
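
One thing the error message itself hints at, as an untested sketch: the loaded object accepts pixel_values only as a keyword argument, so the wrapped function can forward it by keyword instead of positionally (the shape here is the same assumption as above):

import tensorflow as tf

model = tf.saved_model.load("converted/tf/encoder/")

# Forward the tensor as the keyword argument the SavedModel expects.
func = tf.function(
    lambda pixel_values: model(pixel_values=pixel_values),
    input_signature=[tf.TensorSpec([1, 384, 384, 3], tf.float32, name="pixel_values")],
)
concrete_func = func.get_concrete_function()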

textyash20 commented 1 year ago

Any updates on this? I am also facing this issue. @RichardRivaldo @michaelbenayoun

RichardRivaldo commented 1 year ago

No, @textyash20. Have you found a solution for this?

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

luvwinnie commented 1 year ago

Has anyone found a solution?

vu0607 commented 1 year ago

Open the preprocessor_config.json of the pretrained model on the HuggingFace Hub and you will see a "feature_extractor_type" key. For example, trocr-small-handwritten has "feature_extractor_type": "DeiTFeatureExtractor". Copy that key and paste it into your own preprocessor_config.json.
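
A minimal sketch of that edit in Python — the path is the fine-tuned model directory from earlier in the thread, and the value should come from the checkpoint you fine-tuned from:

import json

# Add the missing feature_extractor_type key to the fine-tuned model's
# preprocessor_config.json (the value here matches trocr-small-handwritten).
path = "trocr/base/preprocessor_config.json"
with open(path) as f:
    config = json.load(f)
config["feature_extractor_type"] = "DeiTFeatureExtractor"
with open(path, "w") as f:
    json.dump(config, f, indent=2)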