[torchlib] Reimplement upsample as trace_only

microsoft / onnxscript

ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.

https://onnxscript.ai/

MIT License

280 stars 53 forks source link

[torchlib] Reimplement upsample as trace_only #1816

Closed justinchuby closed 13 hours ago

justinchuby commented 2 months ago

Reimplement upsample as trace_only and improve accuracy. In particular, check aten.upsample_trilinear3d.vec is accurate.

justinchuby commented 2 months ago

To test:

import torch
import torch_onnx
from monai.networks.nets import SegResNet

torch_onnx.patch_torch(
    report=True, profile=True, verify=True, dump_exported_program=False, fallback=False
)

def main():
    model = SegResNet(
        blocks_down=(1, 2, 2, 4),
        blocks_up=(1, 1, 1),
        init_filters=16,
        in_channels=4,
        out_channels=3,
        dropout_prob=0.2,
    ).eval()
    data = torch.randn(1, 4, 224, 224, 128)

    # Check that the upsample op is not decomposed
    torch.onnx.export(model, (data,), "upsample.onnx")

if __name__ == "__main__":
    main()

titaiwangms commented 2 months ago

Is there an example showing the accuracy is off? I don't find any difference between torch-onnx 1.0.11 and 1.0.12 on SegResNet?

justinchuby commented 2 months ago

0.1.12 should preserve the upsample op. If you look at the accuracy section of the report you should find the two values being different.

justinchuby commented 2 months ago

Accuracy was correct. I was mistaken:

Of the call_function nodes, the counts of operators used are:

aten.convolution.default: 32
aten.group_norm.default: 25
aten.relu.default: 25
aten.add.Tensor: 15
aten.upsample_trilinear3d.vec: 3

ONNX Conversion Information

All operators in the model have registered ONNX decompositions.

Decomposition comparison

Ops exist only in the ExportedProgram before decomposition: ['aten.conv3d.default']

Ops exist only in the ExportedProgram after decomposition: ['aten.convolution.default']

Verification results

convolution_31: abs_diff=1.551151e-03, rel_diff=1.636976e+03

justinchuby commented 2 months ago

@titaiwangms please feel free to decide how you want to handle this issue. I think creating a trace version is still helpful, as we are doing some unnecessary operations on static values.

titaiwangms commented 1 day ago

@justinchuby Sorry for postponing this so long. I just checked whole upsample family, anf they are all trace_only. Is this done?

justinchuby commented 1 day ago

Let me check and update this issue

justinchuby commented 13 hours ago

That's done. Thanks