isl-org / DPT

Dense Prediction Transformers
MIT License
2.01k stars 258 forks

Can dpt models be traced? #42

Open 3togo opened 3 years ago

3togo commented 3 years ago

I tried to trace "dpt_hybrid_midas" by calling

torch.jit.trace(model, example_input)

However, it failed with the error messages below. Any pointers on how to do it properly?

/usr/local/lib/python3.9/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/mnt/data/git/DPT/dpt/vit.py:154: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  gs_old = int(math.sqrt(len(posemb_grid)))
/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:3609: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn(
Traceback (most recent call last):
  File "/mnt/data/git/DPT/export_model.py", line 112, in <module>
    convert(in_model_path, out_model_path)
  File "/mnt/data/git/DPT/export_model.py", line 64, in convert
    sm = torch.jit.trace(model, example_input)
  File "/usr/local/lib/python3.9/dist-packages/torch/jit/_trace.py", line 735, in trace
    return trace_module(
  File "/usr/local/lib/python3.9/dist-packages/torch/jit/_trace.py", line 952, in trace_module
    module._c._create_method_from_trace(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/mnt/data/git/DPT/dpt/models.py", line 115, in forward
    inv_depth = super().forward(x).squeeze(dim=1)
  File "/mnt/data/git/DPT/dpt/models.py", line 72, in forward
    layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)
  File "/mnt/data/git/DPT/dpt/vit.py", line 120, in forward_vit
    nn.Unflatten(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/flatten.py", line 102, in __init__
    self._require_tuple_int(unflattened_size)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/flatten.py", line 125, in _require_tuple_int
    raise TypeError("unflattened_size must be tuple of ints, " +
TypeError: unflattened_size must be tuple of ints, but found element of type Tensor at pos 0
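The root cause is visible in the last frames: forward_vit constructs nn.Unflatten from sizes computed out of tensor shapes, and under tracing those sizes are Tensors rather than Python ints. A minimal standalone reproduction (a sketch, not the DPT code):

```python
import torch
import torch.nn as nn

# nn.Unflatten validates that the target sizes are plain Python ints.
# Under torch.jit.trace, shape arithmetic yields 0-dim Tensors instead,
# which triggers the TypeError at the bottom of the traceback.
h = torch.tensor(24)  # stand-in for a traced shape value

err_msg = ""
try:
    nn.Unflatten(2, (h, h))
except TypeError as err:
    err_msg = str(err)

print(err_msg)  # unflattened_size must be tuple of ints, ...
```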

ranftlr commented 3 years ago

The current model isn't traceable, unfortunately. As this is a rather popular request (see also https://github.com/isl-org/MiDaS/issues/122), we are working on a rewrite to fix this.

3togo commented 3 years ago

ranftlr,

many thanks for your prompt reply.

eli

ranftlr commented 3 years ago

I just pushed a preview of a scriptable and traceable model to branch "dpt_scriptable": https://github.com/isl-org/DPT/tree/dpt_scriptable. Note that you have to download updated weight files for this to work. You can find updated links in the README of the branch.

Please let us know if this solves your problem or if you experience any issues with this.
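For reference, the calls involved follow the standard TorchScript pattern. The toy module below (not DPT, with made-up shapes) illustrates the kind of shape-safe forward that both torch.jit.trace and torch.jit.script accept:

```python
import torch
import torch.nn as nn

# Toy module (not DPT): sizes are read via x.shape and passed to reshape,
# so both tracing and scripting handle variable input sizes correctly.
class Toy(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[0], x.shape[1]
        return x.reshape(b, c, -1)

m = Toy().eval()
traced = torch.jit.trace(m, torch.randn(1, 3, 4, 4))
scripted = torch.jit.script(m)

x = torch.randn(2, 3, 4, 4)  # a different batch size than the trace example
print(traced(x).shape, scripted(x).shape)  # both (2, 3, 16)
```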

phamdat09 commented 3 years ago

@ranftlr Thanks for your work. This code does not work with torch.onnx. Can you look into it? Thanks

3togo commented 3 years ago

@ranftlr, I tried to trace your "dpt_hybrid-midas-d889a10e.pt" using torch.jit.trace but it failed.

Below is the error message:

  File "/usr/local/lib/python3.9/dist-packages/torch/_tensor.py", line 867, in unflatten
    return super(Tensor, self).unflatten(dim, sizes, names)
RuntimeError: NYI: Named tensors are not supported with the tracer

errors.txt

AbdouSarr commented 3 years ago

Is there a fix for this yet, @ranftlr? Thank you

Wing100 commented 3 years ago

> I try to trace "dpt_hybrid_midas" by calling torch.jit.trace(model, example_input). However, it failed with error messages below. Any pointer on how to do it properly? […]

Hello, I also encountered the same problem. Has it been solved?

guillesanbri commented 3 years ago

Hi, I have been trying to export DPT-Hybrid to onnx today using the dpt_scriptable branch and also encountered RuntimeError: NYI: Named tensors are not supported with the tracer. I found this pytorch issue which looks like the same problem. The cause is the usage of unflatten. I have successfully exported the onnx model by removing these two unflatten calls (vit.py, lines ~320)

layer_3 = self.act_postprocess3(layer_3.unflatten(2, out_size))
layer_4 = self.act_postprocess4(layer_4.unflatten(2, out_size))

and using view instead

x3, y3, z3 = layer_3.shape
layer_3 = self.act_postprocess3(layer_3.view(x3, y3, *out_size))
x4, y4, z4 = layer_4.shape
layer_4 = self.act_postprocess4(layer_4.view(x4, y4, *out_size))

The hybrid model doesn't need to convert layer1 and layer2, but the same solution probably applies.
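The substitution can be sanity-checked in isolation (made-up shapes, not the actual DPT tensors): view(b, c, *out_size) matches unflatten(2, out_size) on a contiguous tensor.

```python
import torch

# (batch, channels, tokens) -> (batch, channels, H, W), two equivalent ways.
layer = torch.randn(2, 8, 6 * 6)
out_size = (6, 6)

a = layer.unflatten(2, out_size)
b, c, _ = layer.shape
v = layer.view(b, c, *out_size)

print(torch.equal(a, v), v.shape)  # True torch.Size([2, 8, 6, 6])
```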

I will test further and comment back soon.

Wing100 commented 3 years ago

@guillesanbri If the network contains a pre-trained model trained by others, such as ResNet50, can DPT models still be traced?

RuntimeError: Error(s) in loading state_dict for net: Missing key(s) in state_dict: Unexpected key(s) in state_dict:

guillesanbri commented 3 years ago

@Wing100 I'm not sure what you are referring to; I traced the Hybrid model, which has a ResNet50 inside. The error you got seems to be related to loading model parameters from another model without setting strict=False, but afaik that is not related to tracing the model.
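The strict=False behavior can be seen with two toy modules (hypothetical layers, nothing to do with DPT): matching keys load, and the rest are reported instead of raising.

```python
import torch
import torch.nn as nn

# Toy example: dst has one extra layer that the checkpoint does not cover.
src = nn.Sequential(nn.Linear(4, 4))
dst = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# strict=True would raise the "Missing key(s) / Unexpected key(s)" error;
# strict=False loads what matches and reports the rest.
result = dst.load_state_dict(src.state_dict(), strict=False)
print(result.missing_keys)  # ['1.weight', '1.bias']
```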

romil611 commented 2 years ago

@guillesanbri Hi, after making the change from unflatten to view, I get the following error:

RuntimeError: Unsupported: ONNX export of transpose for tensor of unknown rank.

Did you encounter this, and/or do you know how to solve it? Thanks in advance!

guillesanbri commented 2 years ago

@romil611 I think I got that error when playing with the dynamic axes of the onnx export. My use case doesn't need dynamic axes, so I have kept the sizes static for now. Will ping you if I get back to that.

romil611 commented 2 years ago

@guillesanbri I also need static sizes and didn't add the dynamic axes option in the torch.onnx.export call. My guess is that dynamic axes are being used somewhere internally, which is causing the issue. If you remember anything related to it, do tell. Anyway, thanks for the reply!

ghost commented 2 years ago

> @guillesanbri Hi, after making the change from unflatten to view, I get the following error:
>
> RuntimeError: Unsupported: ONNX export of transpose for tensor of unknown rank.
>
> Did you encounter this, and/or do you know how to solve it? Thanks in advance!

@romil611 I saw this error when I was calling torch.onnx.export on the scripted version of the model. Make sure you don't have

model = torch.jit.script(model)

anywhere preceding your export call.

For me, the other secret to a successful export (in addition to the edits @guillesanbri already suggested) was to keep everything on the CPU. According to this comment, the device the model was running on when exported does not affect the resulting onnx model.

romil611 commented 2 years ago

For me, the torch export worked with the main branch itself when I changed unflatten to view.

3togo commented 2 years ago

Thank you all for your efforts. The problem is fixed by using the latest version of PyTorch.

jucic commented 2 years ago

> Hi, I have been trying to export DPT-Hybrid to onnx today using the dpt_scriptable branch and also encountered RuntimeError: NYI: Named tensors are not supported with the tracer. […] I have successfully exported the onnx model removing these two unflatten calls (vit.py, lines ~320) and using view instead. […]

I tried to export DPT-Hybrid to onnx today using the dpt_scriptable branch, but encountered the issue shown in the attached screenshot. Do you know why? It seems to be a bug in the model returned by timm.create_model("vit_base_resnet50_384", pretrained=pretrained). I tried changing x = self.model.patch_embed.backbone(x) to x = self.model.patch_embed.backbone(x.contiguous()), but it doesn't work. Do you know what the problem is? Thanks ahead!

I solved the above problem by downgrading timm, but encountered another problem: Exporting the operator std_mean to ONNX opset version 12 is not supported. Please open a bug to request ONNX export support for the missing operator. Does anyone know how to solve it?

Tord-Zhang commented 2 years ago

@guillesanbri @ranftlr It seems that the converted onnx model can only support inputs with a static size? The patch size cannot be changed once the model is converted to onnx.

3togo commented 1 year ago

I got the following errors when I tried to trace "dpt_beit_large_384.pt".

Any help?

Traceback (most recent call last):
  File "/work/gitee/MiDaS-cpp/python/export_model.py", line 162, in <module>
    convert(in_model_type, in_model_path, out_model_path)
  File "/work/gitee/MiDaS-cpp/python/export_model.py", line 84, in convert
    sm = torch.jit.trace(model, sample, strict=False)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/.local/lib/python3.11/site-packages/torch/jit/_trace.py", line 794, in trace
    return trace_module(
           ^^^^^^^^^^^^^
  File "/home/eli/.local/lib/python3.11/site-packages/torch/jit/_trace.py", line 1084, in trace_module
    _check_trace(
  File "/home/eli/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/.local/lib/python3.11/site-packages/torch/jit/_trace.py", line 562, in _check_trace
    raise TracingCheckError(*diag_info)
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
    Graph diff:
          graph(%self.1 : __torch__.midas.dpt_depth.DPTDepthModel,
                %x.1 : Tensor):
            %scratch : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %output_conv : __torch__.torch.nn.modules.container.Sequential = prim::GetAttr[name="output_conv"](%scratch)
            %scratch.15 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %refinenet1 : __torch__.midas.blocks.FeatureFusionBlock_custom = prim::GetAttr[name="refinenet1"](%scratch.15)
            %scratch.13 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %refinenet2 : __torch__.midas.blocks.FeatureFusionBlock_custom = prim::GetAttr[name="refinenet2"](%scratch.13)
            %scratch.11 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %refinenet3 : __torch__.midas.blocks.FeatureFusionBlock_custom = prim::GetAttr[name="refinenet3"](%scratch.11)
            %scratch.9 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %refinenet4 : __torch__.midas.blocks.FeatureFusionBlock_custom = prim::GetAttr[name="refinenet4"](%scratch.9)
            %scratch.7 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %layer4_rn : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="layer4_rn"](%scratch.7)
            %scratch.5 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %layer3_rn : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="layer3_rn"](%scratch.5)
            %scratch.3 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %layer2_rn : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="layer2_rn"](%scratch.3)
            %scratch.1 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="scratch"](%self.1)
            %layer1_rn : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="layer1_rn"](%scratch.1)
            %pretrained : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="pretrained"](%self.1)
            %act_postprocess4 : __torch__.torch.nn.modules.container.Sequential = prim::GetAttr[name="act_postprocess4"](%pretrained)
            %_4.7 : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="4"](%act_postprocess4)
            %pretrained.83 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="pretrained"](%self.1)
            %act_postprocess4.5 : __torch__.torch.nn.modules.container.Sequential = prim::GetAttr[name="act_postprocess4"](%pretrained.83)
            %_3.9 : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="3"](%act_postprocess4.5)
            %pretrained.81 : __torch__.torch.nn.modules.module.Module = prim::GetAttr[name="pretrained"](%self.1)
            %act_postprocess3 : __torch__.torch.nn.modules.container.Sequential = prim::GetAttr[name="act_postprocess3"](%pretrained.81)
foemre commented 1 year ago

https://github.com/isl-org/MiDaS/issues/189 I can verify that dpt_large_384.pt in MiDaS v3.1 can be traced using torch.jit.trace, but I cannot export the model to ONNX. I'm receiving RuntimeError: Input type (float) and bias type (c10::Half) should be the same. Has anyone had any experience exporting the latest models to ONNX?
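That error usually means the checkpoint stores half-precision (fp16) weights while the example input is float32. A common workaround is casting the model to float32 before export; a toy illustration (not the MiDaS code):

```python
import torch
import torch.nn as nn

# Toy repro: fp16 weights with a float32 input raise the same class of
# dtype-mismatch RuntimeError; casting with .float() resolves it.
conv = nn.Conv2d(3, 8, 3).half()
x = torch.randn(1, 3, 16, 16)

mismatch_raised = False
try:
    conv(x)  # float32 input vs fp16 parameters
except RuntimeError:
    mismatch_raised = True

out = conv.float()(x)  # cast parameters to float32 before tracing/export
print(mismatch_raised, out.dtype)
```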