hyln9 opened 3 months ago
Hi, I'm a little curious why the plan cache became 30x smaller. From the docs, it seems that the refitter is used to change engine weights dynamically. Thanks.
The kSTRIP_PLAN flag enables weight stripping and works well with refitting at runtime.
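To illustrate why stripping shrinks the cache, here is a toy model in Python. It does not use TensorRT, and `ToyPlan`, `strip_weights`, `refit`, `serialize` and `deserialize` are hypothetical names. It models the idea that the cached plan keeps only the per-layer kernel choices, while the bulky weights are supplied again from the original model when the engine is loaded. In the real TensorRT 10 API, this corresponds to building with kSTRIP_PLAN and then filling in the weights through an `IRefitter` (for example `setNamedWeights` followed by `refitCudaEngine`).

```python
from __future__ import annotations

import pickle
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPlan:
    """Hypothetical stand-in for an engine plan (not a TensorRT type)."""
    kernels: dict[str, str]                      # layer name -> chosen kernel: small metadata
    weights: Optional[dict[str, bytes]] = None   # layer name -> weight blob: most of the size


def strip_weights(plan: ToyPlan) -> ToyPlan:
    """Keep the kernel choices and drop the weights (the spirit of kSTRIP_PLAN)."""
    return ToyPlan(kernels=dict(plan.kernels), weights=None)


def refit(stripped: ToyPlan, model_weights: dict[str, bytes]) -> ToyPlan:
    """Re-supply every layer's weights from the original model at load time."""
    missing = sorted(set(stripped.kernels) - set(model_weights))
    if missing:
        raise KeyError(f"no weights for layers: {missing}")
    return ToyPlan(
        kernels=dict(stripped.kernels),
        weights={name: model_weights[name] for name in stripped.kernels},
    )


def serialize(plan: ToyPlan) -> bytes:
    """Bytes that would be written to the on-disk plan cache."""
    return pickle.dumps(plan)


def deserialize(blob: bytes) -> ToyPlan:
    """Read a plan back from the cache."""
    return pickle.loads(blob)
```

In this toy, the cached blob holds only the small kernel table, so its size no longer grows with the weights. The tradeoff is that the original weights must still be available whenever the engine is loaded.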
Thanks for your work. I ran into a problem when compiling it with TensorRT 10.1.0. The CMakeLists.txt cannot read the version number in NvInferVersion.h because that file's encoding changed to UTF-16LE. Should I modify the CMakeLists.txt or do something else?
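If patching CMakeLists.txt, CMake's `file(STRINGS ...)` command accepts an `ENCODING UTF-16LE` option, which may be the most direct fix. The same idea, decoding the header as UTF-16LE before matching the version macros, is sketched below in Python as a standalone check. The macro names `NV_TENSORRT_MAJOR`, `NV_TENSORRT_MINOR` and `NV_TENSORRT_PATCH` are assumed from TensorRT 10.x headers, and `decode_header` / `parse_trt_version` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Assumed macro names from TensorRT 10.x NvInferVersion.h; adjust if your header differs.
VERSION_MACROS = ("NV_TENSORRT_MAJOR", "NV_TENSORRT_MINOR", "NV_TENSORRT_PATCH")
UTF16LE_BOM = b"\xff\xfe"


def decode_header(raw: bytes) -> str:
    """Decode header bytes, treating a BOM or embedded NUL bytes as a sign of UTF-16LE."""
    if raw.startswith(UTF16LE_BOM) or b"\x00" in raw:
        # utf-16-le keeps the BOM as U+FEFF, so strip it explicitly.
        return raw.decode("utf-16-le").lstrip("\ufeff")
    return raw.decode("utf-8")


def parse_trt_version(raw: bytes) -> tuple[int, int, int]:
    """Return (major, minor, patch) parsed from the raw bytes of NvInferVersion.h."""
    text = decode_header(raw)
    parts = []
    for name in VERSION_MACROS:
        # Matches lines such as "#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version."
        match = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if match is None:
            raise ValueError(f"{name} not found as a numeric #define")
        parts.append(int(match.group(1)))
    return (parts[0], parts[1], parts[2])
```

For example, `parse_trt_version(pathlib.Path(header_path).read_bytes())` works for both the old ASCII header and the UTF-16LE one. If a header defines a macro indirectly (for example in terms of another macro rather than a number), the regex will not match and a `ValueError` is raised instead of returning a wrong version.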
It should be fixed now.
Hello!
As NVIDIA has finally released TensorRT 10.0 and made it publicly available on their website, I did some research on the now-improved engine refitting API.
The result is very promising: the plan cache is ~30x smaller on my laptop. Support for the newer CUDA 12.x has been added as well.