Open lix19937 opened 6 months ago
When the input has a fixed shape but a NonZero operation appears inside the network, the model is generally considered locally dynamic
, currently skipped for dynamic shapes
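My understanding:
The local region is optimized as dynamic shape, while the other fixed-shape regions get the regular optimizations (to be verified).
CUDA Graph support may require changes to the init logic.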
Look into the model to check whether the NonZero can be replaced. Sometimes the ONNX exporter is just trying to generate indices for an embedding layer and uses NonZero to do so; in that case it can be replaced with Range.
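To illustrate why that replacement works: if the tensor fed to NonZero is known to contain no zeros (for example, a ones tensor used only to build position indices), the result is just 0..n-1, which is what Range produces, and its length depends only on n. A minimal plain-Python sketch, where `nonzero_1d` is a hypothetical helper standing in for a 1-D NonZero and not an ONNX or TensorRT API:

```python
def nonzero_1d(values):
    """Hypothetical stand-in for a 1-D NonZero: indices of the non-zero entries."""
    return [i for i, v in enumerate(values) if v != 0]


n = 6
ones = [1] * n  # assumption: the NonZero input is all ones

indices_via_nonzero = nonzero_1d(ones)  # data-dependent op
indices_via_range = list(range(n))      # static op, length known from n alone

# Both give the same indices, so the NonZero can be swapped for a Range.
print(indices_via_nonzero == indices_via_range)
```

This only holds when the input is guaranteed to have no zeros; a real mask needs the data-dependent op.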
https://github.com/NVIDIA/TensorRT/issues/527#issuecomment-789359802
Tensor mask op, for example: x = x[x.sum(dim=-1) > 0]
torch.where(condition), for example: torch.where(x > 0)
torch.nonzero(x)
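What these three patterns share is that the output size depends on the values in the tensor, not only on its shape. A small plain-Python sketch of that behaviour, where `nonzero_1d` and `mask_rows` are hypothetical helpers that mimic `torch.nonzero` and the mask indexing above on nested lists and are not PyTorch APIs:

```python
def nonzero_1d(values):
    """Hypothetical analogue of torch.nonzero on a 1-D input."""
    return [i for i, v in enumerate(values) if v != 0]


def mask_rows(rows):
    """Hypothetical analogue of x[x.sum(dim=-1) > 0] on a list of rows."""
    return [row for row in rows if sum(row) > 0]


# Two inputs with the same shape (4,) give outputs of different lengths.
a = [0, 3, 0, 5]
b = [1, 2, 3, 4]
print(len(nonzero_1d(a)), len(nonzero_1d(b)))

# Two inputs with the same shape (3, 2) keep a different number of rows.
rows_a = [[0, 0], [1, 2], [0, 0]]
rows_b = [[1, 0], [1, 2], [0, 3]]
print(len(mask_rows(rows_a)), len(mask_rows(rows_b)))
```

Because the output length is only known at run time, the network becomes locally dynamic even when its inputs have fixed shapes.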
https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html#sampleNonZeroPlugin