lix19937 / tensorrt-insight

Deep insight tensorrt, including but not limited to qat, ptq, plugin, triton_inference, cuda

nonzero layer test #8

Open lix19937 opened 6 months ago

lix19937 commented 6 months ago

Tensor mask op, for example: x = x[x.sum(dim=-1) > 0]
torch.where(condition), for example: torch.where(x > 0)
torch.nonzero(x)
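
What the three ops above have in common is that the output size depends on the input values, not only on the input shape. A minimal sketch of that behaviour, using plain Python lists as a stand-in for tensors (the helpers `nonzero_1d` and `mask_rows` are hypothetical names for illustration, not torch or TensorRT APIs):

```python
# Pure-Python stand-ins, for illustration only (not torch code).

def nonzero_1d(values):
    """Indices of the non-zero entries, similar in spirit to torch.nonzero on 1-D input."""
    return [i for i, v in enumerate(values) if v != 0]


def mask_rows(rows):
    """Mimics x[x.sum(dim=-1) > 0]: keep only rows whose sum is positive."""
    return [row for row in rows if sum(row) > 0]


# Two inputs with the same length (4) give outputs of different lengths.
a = [0, 3, 0, 5]
b = [1, 2, 3, 4]
idx_a = nonzero_1d(a)
idx_b = nonzero_1d(b)
print(len(idx_a), len(idx_b))

# The number of surviving rows also depends on the data.
rows = [[0, 0], [1, 2], [0, -1]]
kept = mask_rows(rows)
print(kept)
```

So even with a fixed input shape, the shape after such an op is only known at run time.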

https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html#sampleNonZeroPlugin

lix19937 commented 6 months ago

When the input has a fixed shape but a nonzero op appears inside the network, the network is generally regarded as locally dynamic; currently skipped for dynamic shapes

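My understanding:
The locally dynamic region is optimized as dynamic shape, while the remaining fixed-shape regions get the regular optimizations. To be verified.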

CUDA Graph support may require modifying the init logic

lix19937 commented 4 months ago

Look into the model to check whether the NonZero can be replaced. Sometimes the ONNX exporter is only trying to generate indices for an embedding layer and uses NonZero to do so; in that case it can be replaced with Range.

https://github.com/NVIDIA/TensorRT/issues/527#issuecomment-789359802
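
A rough sketch of why that replacement is safe, assuming the NonZero input is an all-ones tensor whose length is the sequence length (that pattern is my assumption here; check the actual graph). Plain Python lists stand in for tensors, and all function names are hypothetical:

```python
# Pure-Python illustration only (not ONNX or TensorRT code).

def nonzero_1d(values):
    """Indices of the non-zero entries of a 1-D list."""
    return [i for i, v in enumerate(values) if v != 0]


def position_ids_via_nonzero(seq_len):
    # Assumed exporter pattern: NonZero applied to a constant all-ones tensor.
    ones = [1] * seq_len
    return nonzero_1d(ones)


def position_ids_via_range(seq_len):
    # Replacement: the same indices generated directly, as Range(0, seq_len, 1) would.
    return list(range(seq_len))


for n in (0, 1, 5):
    assert position_ids_via_nonzero(n) == position_ids_via_range(n)
```

Because every entry is non-zero, the result is always 0..seq_len-1 and its length depends only on the shape, so no data-dependent output shape is needed. If the NonZero input can actually contain zeros, this replacement does not apply.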

lix19937 commented 4 months ago

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-853/api/c_api/classnvinfer1_1_1_i_network_definition.html#ae4a4d21b214bb82d5d441ba03590f726
https://github.com/NVIDIA/TensorRT/issues/2285