pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

Should "model" be "trt_model" here? #2932

Closed · choosehappy closed this issue 3 months ago

choosehappy commented 3 months ago

This might be a copy-paste error:

https://github.com/pytorch/TensorRT/blob/52ba6f1e3ff5b905e15a12b500af2d4abf847e21/examples/dynamo/vgg16_fp8_ptq.py#L239

The linked example has this code:

        trt_model = torchtrt.dynamo.compile(
            exp_program,
            inputs=[input_tensor],
            enabled_precisions={torch.float8_e4m3fn},
            min_block_size=1,
            debug=False,
        )

and then a comment saying

        # Inference compiled Torch-TensorRT model over the testing dataset

but then it looks like inference is done with the original model?

            out = model(data)
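
For reference, a minimal sketch of what the evaluation loop presumably intends, calling the compiled module instead of the eager one (the `testing_dataloader` name and the accuracy bookkeeping are assumptions based on the example's context):

    import torch

    # Inference with the compiled Torch-TensorRT model over the testing dataset
    total = 0
    correct = 0
    with torch.no_grad():
        for data, labels in testing_dataloader:
            data, labels = data.cuda(), labels.cuda()
            out = trt_model(data)  # was: out = model(data)
            preds = out.argmax(dim=1)
            total += labels.size(0)
            correct += (preds == labels).sum().item()
    print(f"Test accuracy: {correct / total:.4f}")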
choosehappy commented 3 months ago

A quick follow-up: with v2.4.0a0, it never actually reaches that line; there is an error earlier, when creating trt_model:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[17], line 6
      4 input_tensor = images.cuda()
      5 exp_program = torch.export.export(model, (input_tensor,))
----> 6 trt_model = torchtrt.dynamo.compile(
      7     exp_program,
      8     inputs=[input_tensor],
      9     enabled_precisions={torch.float8_e4m3fn},
     10     min_block_size=1,
     11     debug=False,
     12 )
     14 # Inference compiled Torch-TensorRT model over the testing dataset
     15 total = 0

--snip--

TypeError: Provided an unsupported data type as an input data type (support: bool, int32, long, half, float), got: torch.float8_e4m3fn
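
One way to check whether the failure is specific to float8 (a sketch, not a fix: it keeps the failing cell as-is and only swaps in a precision the error message lists as supported; `model` and `images` are the same objects from the cell above):

    import torch
    import torch_tensorrt as torchtrt

    input_tensor = images.cuda()
    exp_program = torch.export.export(model, (input_tensor,))
    trt_model = torchtrt.dynamo.compile(
        exp_program,
        inputs=[input_tensor],
        enabled_precisions={torch.half},  # was {torch.float8_e4m3fn}
        min_block_size=1,
        debug=False,
    )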
peri044 commented 3 months ago

@zewenli98 can you take a look?