triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.
BSD 3-Clause "New" or "Revised" License

build: Add WAR for CUDA 12.5 build issue #257

Closed: rmccorm4 closed this 5 months ago

rmccorm4 commented 5 months ago

Adding the ORT patch from https://github.com/microsoft/onnxruntime/pull/20770, applied with the build patch approach we used in a previous release: https://github.com/triton-inference-server/onnxruntime_backend/commit/05d098d89f0455588fb85f0a22d97dd8ecc6cac9

This is a fallback in case the ORT release containing the fix does not line up with our timeline for getting r24.06 out.

rmccorm4 commented 5 months ago

@mc-nv If the changes are accepted, let me know whether this should target main and then be cherry-picked to r24.06, or target r24.06 directly.

tanmayv25 commented 5 months ago

Targeting r24.06 will be good enough. We can bring the fix to main once we start building main branches with CUDA 12.5.