pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

SDXL Accuracy Investigation #2531

Open gs-olive opened 11 months ago

gs-olive commented 10 months ago

Analysis Findings

The above was then narrowed to the following simple matrix multiply. When run in FP16 with the dimensions (8192 x 640) and (640 x 640), as used in our SDXL configuration, it produces a maximum element-wise difference of 10 between the TRT output and the Torch output. The mean difference was also high, at around 0.5.

    import torch

    class TestModule(torch.nn.Module):
        def forward(self, q, k):
            # FP16 inputs of shape (8192, 640) and (640, 640) in the SDXL configuration
            return q @ k
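As background for the numbers above, here is a minimal standard-library sketch of how the accumulation precision alone can change an FP16 dot product over a reduction length of 640, and of how a max/mean absolute difference can be computed. It is not a diagnosis of the TRT discrepancy: whether TRT and Torch actually differ in accumulation precision for this case is an assumption that the issue does not confirm. The helper names (`to_fp16`, `dot_fp16_accumulate`, `dot_wide_accumulate`, `diff_stats`), the input range, and the reduced row count are all hypothetical and chosen for illustration.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product where every product and every partial sum is rounded to FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_wide_accumulate(a, b):
    """Dot product accumulated in double precision, rounded to FP16 once at the end."""
    return to_fp16(sum(x * y for x, y in zip(a, b)))


def diff_stats(xs, ys):
    """Return (max, mean) of the element-wise absolute differences."""
    diffs = [abs(x - y) for x, y in zip(xs, ys)]
    return max(diffs), sum(diffs) / len(diffs)


random.seed(0)
K = 640    # reduction length from the issue
ROWS = 64  # far fewer rows than 8192, to keep the pure-Python loop fast

# Hypothetical inputs: FP16-representable values drawn uniformly from [-1, 1].
k_col = [to_fp16(random.uniform(-1.0, 1.0)) for _ in range(K)]
q_rows = [[to_fp16(random.uniform(-1.0, 1.0)) for _ in range(K)] for _ in range(ROWS)]

narrow = [dot_fp16_accumulate(row, k_col) for row in q_rows]
wide = [dot_wide_accumulate(row, k_col) for row in q_rows]

max_diff, mean_diff = diff_stats(narrow, wide)
print(f"max abs diff:  {max_diff}")
print(f"mean abs diff: {mean_diff}")
```

With inputs in this small range the differences stay small; the point is only that the two accumulation strategies do not agree bit-for-bit, and that the gap grows with the magnitude of the values and the length of the reduction.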

Additionally, the native_layer_norm operator may be contributing to the error, since excluding it also improves accuracy. This is also under investigation.

gs-olive commented 10 months ago

Update

We have further narrowed the matmul cases to make the issue easier to reproduce.

Next Steps