isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

Unflatten & ONNX export #182

Open · carsonswope opened this issue 2 years ago

carsonswope commented 2 years ago

I am able to export the MiDaS model with the DPT-Large backbone to ONNX using torch.onnx.export almost without modifying the model at all. The model constructor looks like:

model = DPTDepthModel(path='dpt_large-midas-2f21e586.pt', backbone='vitl16_384', non_negative=True)
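For context, the export call looks roughly like this (the opset version, file names, and axis names are illustrative, and the dynamic axes only pay off once the Unflatten issue below is fixed):

import torch
from midas.dpt_depth import DPTDepthModel

model = DPTDepthModel(path='dpt_large-midas-2f21e586.pt', backbone='vitl16_384', non_negative=True)
model.eval()

# trace with a square dummy input; height and width are declared dynamic below
dummy = torch.randn(1, 3, 384, 384)

torch.onnx.export(
    model,
    dummy,
    'dpt_large.onnx',
    opset_version=14,
    input_names=['input'],
    output_names=['depth'],
    dynamic_axes={
        'input': {2: 'height', 3: 'width'},  # allow variable H/W at inference
        'depth': {1: 'height', 2: 'width'},  # DPTDepthModel squeezes the channel dim
    },
)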

The one issue I have found comes from this block in vit.py:

# reshape the flattened patch tokens (dim 2) back into a (h/p, w/p) spatial grid
unflatten = nn.Sequential(
    nn.Unflatten(
        2,
        torch.Size(
            [
                h // pretrained.model.patch_size[1],
                w // pretrained.model.patch_size[0],
            ]
        ),
    )
)

When attempting to export to ONNX, this raises the following error from flatten.py in PyTorch (I'm running torch 1.11.0):

TypeError: unflattened_size must be tuple of ints, but found element of type Tensor at pos 0

The cause is that during tracing h and w are Tensors, so h // pretrained.model.patch_size[1] is a Tensor as well, and nn.Unflatten only accepts plain ints for unflattened_size. We don't want to simply cast h and w to int, because then the spatial dimensions would be hard-coded into the exported graph, and I would like the ONNX model to support dynamic height and width.

So, why not do something like this instead:

# b, h, and w are captured from the enclosing scope; layer is (b, c, n)
unflatten = lambda layer: layer.view((
    b,
    layer.shape[1],
    h // pretrained.model.patch_size[1],
    w // pretrained.model.patch_size[0]
))

This seems to me to be functionally equivalent, if not as elegant, because b, h, and w now have to be captured from the enclosing scope (the channel count comes from layer.shape[1]).
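The lambda is then used exactly like the nn.Sequential version it replaces; roughly, in forward_vit in vit.py:

# each reassembled layer is (b, c, n) at this point
if layer_1.ndim == 3:
    layer_1 = unflatten(layer_1)
if layer_2.ndim == 3:
    layer_2 = unflatten(layer_2)

During tracing, h // pretrained.model.patch_size[1] is recorded as an operation on Tensors, so the exported graph ends up with a Reshape whose target shape is computed at runtime rather than baked in as constants.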

aswanthkrishna commented 2 years ago

@carsonswope were you able to export the new models to ONNX?

carsonswope commented 2 years ago

@aswanthkrishna Yes, with the above modification I was able to export to ONNX.
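For anyone else landing here, a quick sanity check that the exported model really accepts dynamic shapes (file name and resolution are just examples; sizes should stay multiples of the ViT patch size):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('dpt_large.onnx')

# a resolution different from the 384x384 used at export time
x = np.random.rand(1, 3, 480, 640).astype(np.float32)
(depth,) = sess.run(None, {'input': x})
print(depth.shape)  # expected: (1, 480, 640)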

rajanadkat commented 1 year ago

Thank you! That worked!