huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

ORTStableDiffusionPipeline: INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int64)) , expected: (tensor(int32)) #994

Open sunhs opened 1 year ago

sunhs commented 1 year ago

System Info

platform: Ubuntu 16.04
python: 3.10
optimum: 1.8.2
diffusers: 0.15.1
transformers: 4.28.1

Who can help?

@michaelbenayoun

Reproduction

EDIT: Note that this error occurs with PyTorch 1.12.1 but not with PyTorch 1.13 or 2.x. See the comment below.

Code:

from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained("SG161222/Realistic_Vision_V2.0", export=True)

Error:

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
/conda/envs/largemodel/lib/python3.10/site-packages/diffusers/models/cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
  deprecate(
Framework not specified. Using pt to export to ONNX.
Keyword arguments {'subfolder': '', 'trust_remote_code': False} are not expected by StableDiffusionPipeline and will be ignored.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
/conda/envs/largemodel/lib/python3.10/site-packages/diffusers/models/cross_attention.py:51: FutureWarning: CrossAttnProcessor is deprecated and will be removed in `0.18.0`. Please use `from diffusers.models.attention_processor import AttnProcessor instead.
  deprecate("cross_attention", "0.18.0", deprecation_message, standard_warn=False)
Using framework PyTorch: 1.12.1
/conda/envs/largemodel/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:759: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask.fill_(torch.tensor(torch.finfo(dtype).min))
/conda/envs/largemodel/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/conda/envs/largemodel/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/conda/envs/largemodel/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/conda/envs/largemodel/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:4189: UserWarning: Exporting aten::index operator of advanced indexing in opset 14 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
Traceback (most recent call last):
  File "/projects/sd/convert/to_onnx.py", line 3, in <module>
    pipe = ORTStableDiffusionPipeline.from_pretrained("SG161222/Realistic_Vision_V2.0", export=True)
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 646, in from_pretrained
    return super().from_pretrained(
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/modeling_base.py", line 362, in from_pretrained
    return from_pretrained_method(
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/onnxruntime/modeling_diffusion.py", line 305, in _from_transformers
    main_export(
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/exporters/onnx/__main__.py", line 295, in main_export
    _, onnx_outputs = export_models(
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 609, in export_models
    export(
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 714, in export
    config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
  File "/conda/envs/largemodel/lib/python3.10/site-packages/optimum/exporters/onnx/base.py", line 279, in fix_dynamic_axes
    outputs = session.run(None, onnx_inputs)
  File "/conda/envs/largemodel/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int64)) , expected: (tensor(int32))

Expected behavior

Successful conversion of the Stable Diffusion pipeline to ONNX.

I looked into the code a bit and found that the following two call stacks might conflict:

Stack 1:

optimum/exporters/onnx/convert.py:699 in export
optimum/exporters/onnx/convert.py:412 in export_pytorch
optimum/exporters/onnx/model_configs.py:667 in CLIPTextOnnxConfig.generate_dummy_inputs

Here dummy_inputs["input_ids"] is torch.int32, and so is the input node of the exported ONNX model.

Stack 2:

optimum/exporters/onnx/convert.py:716 in export
optimum/exporters/onnx/base.py:267 in OnnxConfig.fix_dynamic_axes
optimum/exporters/onnx/base.py:385 in OnnxConfig.generate_dummy_inputs
optimum/utils/input_generators.py:311 in DummyTextInputGenerator.generate
optimum/utils/input_generators.py:125 in DummyInputGenerator.random_int_tensor

Here dummy_inputs["input_ids"] in OnnxConfig.fix_dynamic_axes is np.int64, which conflicts with the int32 dummy input previously used to export the model.
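
To illustrate the mismatch outside of Optimum, here is a minimal standalone sketch (the toy model and file name are made up, but it raises the same error reported above):

import numpy as np
import torch
import onnxruntime as ort

class Toy(torch.nn.Module):
    def forward(self, input_ids):
        return input_ids * 2

# Tracing with an int32 dummy input pins the ONNX graph input to tensor(int32),
# just like the int32 dummy_inputs["input_ids"] in Stack 1.
torch.onnx.export(
    Toy(),
    (torch.zeros(1, 4, dtype=torch.int32),),
    "toy.onnx",
    input_names=["input_ids"],
    output_names=["output"],
)

# Feeding int64 at runtime, as fix_dynamic_axes does in Stack 2, raises
# INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int64)) ,
# expected: (tensor(int32))
session = ort.InferenceSession("toy.onnx")
session.run(None, {"input_ids": np.zeros((1, 4), dtype=np.int64)})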

amitportnoy commented 1 year ago

I think the core issue is that DummyInputGenerator.random_int_tensor only outputs int64, which also makes it hard to override for other purposes. (Some models need to support either int64 or int32 depending on the tokenizer; if ONNX itself doesn't support a generic int input, then the export function should, IMO.)

Additionally, convert.py supports overriding inputs from float32 to float16, but not from int64 to int32.
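
As a rough sketch of the direction I mean, a dtype-aware generator could be layered on top of the existing one (the generate signature here is a guess based on the stack trace above, not the exact Optimum API):

import torch
from optimum.utils.input_generators import DummyTextInputGenerator

class Int32DummyTextInputGenerator(DummyTextInputGenerator):
    # Hypothetical generator that emits int32 instead of the hardcoded int64.
    def generate(self, input_name, framework="pt"):
        tensor = super().generate(input_name, framework=framework)
        # Cast so the dummy inputs match an int32 graph input.
        return tensor.to(torch.int32) if framework == "pt" else tensor.astype("int32")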

michaelbenayoun commented 1 year ago

Hi, I am not able to reproduce the error. Could you try two things:

  1. Installing triton, since it says Error caught was: No module named 'triton'
  2. Upgrading your version of diffusers
sunhs commented 1 year ago

@michaelbenayoun Sorry for the late reply. I've upgraded everything to the latest version, including installing triton, which I don't think is related to this problem. Now I have

optimum: 1.8.2
diffusers: 0.16.1
transformers: 4.28.1
onnx: 1.14.0
onnxruntime: 1.14.1
triton: 2.0.0.post1

but still the same error.

Did the two call stacks I flagged above help at all?

regisss commented 1 year ago

@sunhs Could you try again with the latest version of Optimum please? You can install it with:

pip install --upgrade optimum
sunhs commented 1 year ago

@regisss Hi, I've already upgraded to 1.8.5 with pip install --upgrade 'optimum[onnxruntime]', but the error remains. IMO, as long as the two call stacks I mentioned above remain unchanged, this error will not go away (unless that's something onnx/onnxruntime should take care of).

regisss commented 1 year ago

@sunhs I managed to reproduce it. Would you like to submit a PR to fix this with what you're suggesting?

sunhs commented 1 year ago

@regisss Sure, I'll take time to try.

sunhs commented 1 year ago

Sorry for being so late here. For reference, I need to update the problem description.

After some digging, I found that CLIPTextOnnxConfig.generate_dummy_inputs indeed calls DummyTextInputGenerator.generate, but afterwards it casts the input to torch.int32, which is what makes the difference. I don't know whether this casting is necessary. Maybe @echarlaix could give some clues?
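
For context, the override seems to do something roughly like this (paraphrased from what I read in model_configs.py, not a verbatim copy; the base class name is a guess):

class CLIPTextOnnxConfig(TextEncoderOnnxConfig):
    def generate_dummy_inputs(self, framework: str = "pt", **kwargs):
        dummy_inputs = super().generate_dummy_inputs(framework=framework, **kwargs)
        if framework == "pt":
            import torch
            # This cast makes the exported graph expect int32 input_ids, while
            # fix_dynamic_axes later regenerates them as int64 -- the mismatch.
            dummy_inputs["input_ids"] = dummy_inputs["input_ids"].to(torch.int32)
        return dummy_inputs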

EDIT: Additionally, this error doesn't occur with PyTorch 1.13+. With PyTorch 1.13+, the pooler_output of the exported CLIP ONNX model is

name: "pooler_output"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch_size"
      }
      dim {
        dim_value: 768
      }
    }
  }
}

while with PyTorch 1.12.1 it's

name: "pooler_output"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch_size"
      }
      dim {
        dim_param: "Reshapepooler_output_dim_1"
      }
    }
  }
}

where Reshapepooler_output_dim_1 is not among the allowed_dynamic_axes, and thus triggers fix_dynamic_axes.
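
(The type protos above can be printed with the standard onnx API; the path below is just where my export landed, so adjust it as needed:)

import onnx

# Load the exported CLIP text encoder and print its output type protos.
model = onnx.load("realistic_vision_onnx/text_encoder/model.onnx")
for value_info in model.graph.output:
    print(value_info.name)
    print(value_info.type)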

fxmarty commented 1 year ago

Edit: oh, this is torch 1.x specific. I actually can't reproduce on 1.13.1 either. Is this issue about the introduced cast operation, or is an error actually raised?

I have no issue running

from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained("SG161222/Realistic_Vision_V2.0", export=True)

on

onnx                         1.14.0
onnxruntime                  1.14.1
optimum                      1.8.5
transformers                 4.28.1
torch                        2.0.0
diffusers                    0.16.1
sunhs commented 1 year ago

I just found that it reproduces with PyTorch 1.12 but not with 1.13 or 2.x. I've updated the description (sorry for not doing that earlier).