replicate / cog-triton

A Cog implementation of NVIDIA's Triton Inference Server
Apache License 2.0

tensorrt-llm: 0.9 -> 0.10, triton: 2.42.0 -> 2.44.0 #50

Open yorickvP opened 1 month ago

yorickvP commented 1 month ago

Open questions:

technillogue commented 4 weeks ago

enable_trt_overlap is set to false in a lot of places; we will probably need to change that.

We should also review the new configuration options.
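To help locate those places, here is a minimal Python sketch that scans a model repository for Triton config.pbtxt files that disable the option. It assumes the TensorRT-LLM backend exposes enable_trt_overlap as a `parameters` entry with a `string_value`. That layout is an assumption and has not been checked against the 0.10 backend. The helper names overlap_disabled and find_disabled_configs are hypothetical.

```python
import re
from pathlib import Path

# Assumed config.pbtxt layout (hypothetical, not verified for TensorRT-LLM 0.10):
#   parameters: {
#     key: "enable_trt_overlap"
#     value: { string_value: "false" }
#   }
# The colon after `value` is optional in protobuf text format, so it is optional here too.
_OVERLAP_FALSE = re.compile(
    r'key:\s*"enable_trt_overlap"\s*value:?\s*\{\s*string_value:\s*"false"',
    re.IGNORECASE,
)


def overlap_disabled(pbtxt_text: str) -> bool:
    """Return True if the config text sets enable_trt_overlap to "false"."""
    return bool(_OVERLAP_FALSE.search(pbtxt_text))


def find_disabled_configs(root: str) -> list[Path]:
    """List every config.pbtxt under `root` that sets enable_trt_overlap to "false"."""
    return sorted(
        path
        for path in Path(root).rglob("config.pbtxt")
        if overlap_disabled(path.read_text(encoding="utf-8"))
    )


if __name__ == "__main__":
    # Example: scan the current directory tree and print the matching files.
    for path in find_disabled_configs("."):
        print(path)
```

A plain regex keeps the sketch dependency-free. Templated values such as `${enable_trt_overlap}` are deliberately not matched, because their effective setting is decided elsewhere.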