isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

torch.jit.trace failing on DPT_hybrid #189

Closed nrakltx closed 1 year ago

nrakltx commented 1 year ago

Hello, I am trying to export MiDaS 3.0 hybrid to ONNX, starting by tracing it via torch.jit.trace. The following snippet fails:

import torch

device = torch.device("cuda")
model: torch.nn.Module = torch.hub.load("intel-isl/MiDaS", "DPT_Hybrid").to(device).eval()
torch.jit.trace(model, torch.rand(1, 3, 384, 384, device=device), strict=False).save("midas_hybrid.pt")

with the following traceback:

Traceback (most recent call last):
  File "infer.py", line 228, in <module>
    main(args.trace, args.export, args.half, args.batch, args.verbose)
  File "infer.py", line 155, in main
    torch.jit.trace(model, torch.rand(1, 3, 384, 384, device=device), strict=False).save("midas_hybrid.pt")  # type: ignore
  File "/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/torch/jit/_trace.py", line 759, in trace
    return trace_module(
  File "/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/torch/jit/_trace.py", line 976, in trace_module
    module._c._create_method_from_trace(
  File "/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1178, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/nrak/midas_optimization/MiDaS/midas/dpt_depth.py", line 108, in forward
    return super().forward(x).squeeze(dim=1)
  File "/home/nrak/midas_optimization/MiDaS/midas/dpt_depth.py", line 71, in forward
    layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)
  File "/home/nrak/midas_optimization/MiDaS/midas/vit.py", line 72, in forward_vit
    nn.Unflatten(
  File "/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/torch/nn/modules/flatten.py", line 110, in __init__
    self._require_tuple_int(unflattened_size)
  File "/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/torch/nn/modules/flatten.py", line 133, in _require_tuple_int
    raise TypeError("unflattened_size must be tuple of ints, " +
TypeError: unflattened_size must be tuple of ints, but found element of type Tensor at pos 0

Before this error, the following warnings appear, which may or may not be related:

/home/nrak/midas_optimization/MiDaS/midas/vit.py:106: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  gs_old = int(math.sqrt(len(posemb_grid)))
/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/timm/models/layers/padding.py:19: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/timm/models/layers/padding.py:19: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
/home/nrak/miniconda3/envs/midas/lib/python3.8/site-packages/timm/models/layers/padding.py:31: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_h > 0 or pad_w > 0:
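These warnings flag a general tracing hazard: any Python value computed from a tensor (via len(), int(), float(), or bool()) is frozen as a constant in the trace. A minimal sketch, unrelated to MiDaS itself, of why that matters:

```python
import torch

def add_width(x):
    # int() pulls the width out of the graph, so the tracer records it
    # as a constant -- this is exactly what triggers the TracerWarning
    return x + int(x.shape[-1])

traced = torch.jit.trace(add_width, torch.rand(2, 4))

# The width 4 was baked in at trace time: feeding a (2, 8) input
# still adds 4, not 8 -- the trace silently fails to generalize.
out = traced(torch.zeros(2, 8))
print(out[0, 0].item())  # 4.0
```

This is why the padding warnings above mean the exported model may only be correct for the exact input shape used during tracing.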

There are scattered mentions of people managing to convert MiDaS to ONNX, but I cannot work out how they did it (most do not supply the actual conversion code). Furthermore, the exported model seems to vary: some export DPTDepthModel and others MidasNet. What is the difference between them?

I'd very much appreciate some help with this.

Thanks in advance,

N

nrakltx commented 1 year ago

This was solved by wrapping the size expressions on L76-77 of midas/vit.py in int().
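For anyone landing here, the failure and the fix can be reproduced in isolation: nn.Unflatten validates its sizes in __init__ and rejects 0-dim Tensors, which is what shape values become under tracing. A sketch with hypothetical stand-in values for the traced h and w (the real change goes in midas/vit.py):

```python
import torch
import torch.nn as nn

# Stand-ins for the traced shape values h and w
h, w = torch.tensor(384), torch.tensor(384)
patch_size = (16, 16)

# Reproduces the error: the size elements are 0-dim Tensors, not ints
try:
    nn.Unflatten(2, [h // patch_size[1], w // patch_size[0]])
except TypeError as e:
    print(e)  # unflattened_size must be tuple of ints, ...

# The fix: cast each dimension to a plain Python int
unflatten = nn.Unflatten(2, (int(h // patch_size[1]), int(w // patch_size[0])))
tokens = torch.rand(1, 768, 24 * 24)  # (N, C, H*W) transformer tokens
print(unflatten(tokens).shape)        # torch.Size([1, 768, 24, 24])
```

Note the trade-off: int() converts a traced value to a constant, so the resulting trace is fixed to the tracing resolution (see the TracerWarnings above).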

shanek16 commented 1 year ago

Just for other users, here is how I solved this issue:

    # Original module, which fails under tracing because nn.Unflatten
    # validates its sizes in __init__:
    #
    # unflatten = nn.Sequential(
    #     nn.Unflatten(
    #         2,
    #         torch.Size(
    #             [
    #                 h // pretrained.model.patch_size[1],
    #                 w // pretrained.model.patch_size[0],
    #             ]
    #         ),
    #     )
    # )

    # Replacement: the functional torch.unflatten performs the same reshape
    # without the construction-time type check.
    size = (
        h // pretrained.model.patch_size[1],
        w // pretrained.model.patch_size[0],
    )
    if layer_1.ndim == 3:
        layer_1 = torch.unflatten(layer_1, 2, size)
    if layer_2.ndim == 3:
        layer_2 = torch.unflatten(layer_2, 2, size)
    if layer_3.ndim == 3:
        layer_3 = torch.unflatten(layer_3, 2, size)
    if layer_4.ndim == 3:
        layer_4 = torch.unflatten(layer_4, 2, size)
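As a sanity check, with assumed sizes for a 384x384 input and 16-pixel patches, the functional call reshapes the token sequence exactly as the original nn.Unflatten module did:

```python
import torch
import torch.nn as nn

h = w = 384
patch = 16
tokens = torch.rand(1, 768, (h // patch) * (w // patch))  # (N, C, H*W)

module_out = nn.Unflatten(2, (h // patch, w // patch))(tokens)
functional_out = torch.unflatten(tokens, 2, (h // patch, w // patch))

print(functional_out.shape)                     # torch.Size([1, 768, 24, 24])
print(torch.equal(module_out, functional_out))  # True
```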