huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

ONNX support for OneFormer Semantic Segmentation Task #1923

Open · EricLe-dev opened this issue 4 days ago

EricLe-dev commented 4 days ago

Feature request

I'm trying to export a pretrained OneFormer to ONNX. I know that optimum does not yet officially support exporting OneFormer to ONNX, which is why I wrote my own export script. Here it is:

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation 
from PIL import Image
import requests
import torch

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")

inputs = processor(Image.open('./sample.jpg'), ["semantic"], return_tensors="pt")

torch.onnx.export(
    model,
    [inputs],
    "export.onnx",
    export_params=True,
    opset_version=16,
    do_constant_folding=True,
    input_names = ['pixel_values'],
    output_names = ['output'],
    dynamic_axes={'input' : {0 : 'batch_size'},
                  'output' : {0 : 'batch_size'}
                 }
)

This gave me the error:

File /media/research/venv_3.10_2/lib/python3.10/site-packages/torch/nn/modules/module.py:1508, in Module._slow_forward(self, *input, **kwargs)
   1506         recording_scopes = False
   1507 try:
-> 1508     result = self.forward(*input, **kwargs)
   1509 finally:
   1510     if recording_scopes:

TypeError: OneFormerForUniversalSegmentation.forward() missing 1 required positional argument: 'task_inputs'
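
For reference, the forward signature of OneFormerForUniversalSegmentation requires both pixel_values and task_inputs, and the processor output should already contain both tensors. A quick check (the exact key set is my assumption):

print(inputs.keys())
# expected to include 'pixel_values' and 'task_inputs'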

Motivation

If OneFormer can be exported to ONNX, it can be run with TensorRT, which would greatly boost inference speed while at the same time reducing the required computing resources.

Your contribution

I came across this one. However, with that approach, the exported ONNX model would not work on GPU, since the SHI Labs implementation uses MultiScaleDeformableAttention, which I believe would not work with torch.jit.trace.

I don't think the Transformers implementation uses MultiScaleDeformableAttention. Still, I'm quite new to this, and I would really appreciate it if someone could point me in the right direction.

Thank you so much!

EricLe-dev commented 4 days ago

After digging a little more into the matter, I added this to the forward function of OneFormerForUniversalSegmentation:

# if the caller did not supply task_inputs, fall back to a hard-coded default;
# these look like the CLIP token ids for the task prompt "the task is semantic",
# padded out to the 77-token task sequence length
if task_inputs is None:
    task_inputs = torch.tensor([[49406, 518, 10549, 533, 29119, 1550, 49407, 0, 0, 0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0]])

This solved the TypeError: OneFormerForUniversalSegmentation.forward() missing 1 required positional argument: 'task_inputs' error.
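
For what it's worth, those ids appear to be the CLIP tokenization of the task prompt, so instead of hardcoding them they could presumably be derived from the processor's own tokenizer. A sketch, assuming processor.tokenizer is the wrapped CLIP tokenizer and the default task sequence length of 77:

task_inputs = processor.tokenizer(
    "the task is semantic",
    padding="max_length",
    max_length=77,  # OneFormer's default task_seq_length
    truncation=True,
    return_tensors="pt",
).input_ids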

However, now there is another issue:

File /media/research/venv_3.10_2/lib/python3.10/site-packages/transformers/models/oneformer/modeling_oneformer.py:2990, in OneFormerModel.forward(self, pixel_values, task_inputs, text_inputs, pixel_mask, output_hidden_states, output_attentions, return_dict)
   2985 output_hidden_states = (
   2986     output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
   2987 )
   2988 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 2990 batch_size, _, height, width = pixel_values.shape
   2991 print("pixel_values shape and type:", pixel_values.shape, type(pixel_values))
   2993 if pixel_mask is None:

AttributeError: 'list' object has no attribute 'shape'

Can someone point me in the right direction? Thank you!
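
Looking at the traceback again, I suspect the root cause is that my original call passed [inputs], a Python list, as the export args, so the model received the whole list as pixel_values. An untested sketch of what I believe the call should look like, passing the tensors positionally as a tuple:

torch.onnx.export(
    model,
    # positional args matching forward(pixel_values, task_inputs, ...)
    (inputs["pixel_values"], inputs["task_inputs"]),
    "export.onnx",
    input_names=['pixel_values', 'task_inputs'],
    output_names=['output'],
)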

tjbanks commented 2 days ago

I agree it would be nice to have OneFormer supported by optimum. But to solve your issue @EricLe-dev, you might try the following; it should export, though I didn't test the end result.

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation 
from PIL import Image
import requests
import torch

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")

inputs = processor(Image.open('./sample.jpg'), ["semantic"], return_tensors="pt")

torch.onnx.export(
    model,
    # a tuple whose last element is a dict is interpreted by torch.onnx.export
    # as named arguments, so both pixel_values and task_inputs reach forward()
    (dict(inputs),),
    "export.onnx",
    export_params=True,
    opset_version=16,
    do_constant_folding=True,
    input_names=['pixel_values', 'task_inputs'],
    output_names=['output'],
    # dynamic_axes keys must match the input/output names above
    dynamic_axes={'pixel_values': {0: 'batch_size'},
                  'task_inputs': {0: 'batch_size'},
                  'output': {0: 'batch_size'}}
)
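
To sanity-check the exported model end to end, something like the following should work (an untested sketch, assuming the export above succeeded and onnxruntime is installed):

import onnxruntime as ort

session = ort.InferenceSession("export.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(
    None,  # fetch all model outputs
    {
        "pixel_values": inputs["pixel_values"].numpy(),
        "task_inputs": inputs["task_inputs"].numpy(),
    },
)
print([o.shape for o in outputs])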