Drastic reduction in trt plan cache size

lightvector / KataGo

GTP engine and self-play learning in Go

https://katagotraining.org/

Drastic reduction in trt plan cache size #946

Open hyln9 opened 3 months ago

hyln9 commented 3 months ago

Hello!

As NVIDIA has finally released TensorRT 10.0 and made it publicly available on their website, I did some research on the now improved engine refitting API.

The result is very promising and the size of the plan cache is reduced by ~30x on my laptop. Support for the newer CUDA 12.x has been added as well.

inisis commented 3 months ago

Hi, I'm a little curious why the plan cache became 30x smaller. According to the docs, the refitter seems to be meant for changing engine weights dynamically. Thanks.

hyln9 commented 3 months ago

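The kSTRIP_PLAN builder flag enables weight stripping, so the serialized plan is stored without the refittable weights. It works well with refitting: the weights are put back into the engine at runtime.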
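To make the build/refit split concrete, here is a minimal sketch using the TensorRT 10 Python bindings. It is not the code from this PR (KataGo's backend is C++). The helper names `build_stripped_plan` and `refit_engine` and the `weights_by_name` dictionary are hypothetical, and the sketch assumes every refittable weight in the network has a name.

```python
def build_stripped_plan(builder, network, config):
    """Build a serialized plan with the refittable weights stripped out.

    Assumes `builder`, `network` and `config` are already-populated
    TensorRT objects (trt.Builder, trt.INetworkDefinition, trt.IBuilderConfig).
    """
    import tensorrt as trt  # imported lazily; requires TensorRT >= 10.0

    # STRIP_PLAN tells the builder to leave refittable weights out of the plan.
    config.set_flag(trt.BuilderFlag.STRIP_PLAN)
    return builder.build_serialized_network(network, config)


def refit_engine(refitter, weights_by_name):
    """Feed the original weights back into a deserialized, stripped engine.

    `refitter` is expected to behave like trt.Refitter(engine, logger).
    `weights_by_name` (hypothetical) maps weight names to weight arrays.
    """
    for name in refitter.get_all_weights():
        if name not in weights_by_name:
            raise KeyError(f"no weights supplied for {name!r}")
        if not refitter.set_named_weights(name, weights_by_name[name]):
            raise RuntimeError(f"TensorRT rejected the weights for {name!r}")
    # Apply all staged weights to the engine in one step.
    if not refitter.refit_cuda_engine():
        raise RuntimeError("engine refit failed")
```

With this split, only the stripped plan needs to live in the cache. The weights come from the model file each time the engine is loaded, which is presumably where the size reduction comes from.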

ActiveIce commented 3 months ago

Thanks for your work. I ran into a problem when compiling it with TensorRT 10.1.0. The CMakeLists.txt cannot read the version number in NvInferVersion.h because the file's encoding changed to UTF-16 LE. Should I modify the CMakeLists.txt or do something else?

hyln9 commented 3 months ago

It should be fixed now.
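
For anyone who hits the same encoding problem in their own tooling, here is a minimal sketch of reading the version macros from a header that may be UTF-8 or UTF-16 LE. It is not the change made in this PR, which lives in CMakeLists.txt. The function names are hypothetical, and the sketch assumes the header defines NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR and NV_TENSORRT_PATCH as plain integer macros.

```python
import codecs
import re

_MACROS = ("NV_TENSORRT_MAJOR", "NV_TENSORRT_MINOR", "NV_TENSORRT_PATCH")


def read_header_text(raw: bytes) -> str:
    """Decode header bytes, detecting UTF-8 / UTF-16 by BOM or NUL pattern."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8", errors="replace")
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le", errors="replace")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be", errors="replace")
    # No BOM: ASCII text stored as UTF-16 LE has a NUL in every odd position.
    odd_bytes = raw[1::2]
    if odd_bytes.count(0) * 2 > len(odd_bytes):
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("utf-8", errors="replace")


def parse_tensorrt_version(raw: bytes) -> tuple:
    """Return (major, minor, patch) parsed from the header's #define lines."""
    text = read_header_text(raw)
    parts = []
    for macro in _MACROS:
        match = re.search(
            rf"^\s*#\s*define\s+{macro}\s+(\d+)", text, re.MULTILINE
        )
        if match is None:
            raise ValueError(f"{macro} not found in header")
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Reading the file in binary mode and passing the bytes to `parse_tensorrt_version` keeps the encoding decision in one place, so the same code works for older UTF-8 headers as well.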