triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.
BSD 3-Clause "New" or "Revised" License

Allow Usage of Intel oneDNN EP For ONNX Backend #133

Open narolski opened 2 years ago

narolski commented 2 years ago

Is your feature request related to a problem? Please describe.
I would like to use the Intel oneDNN Execution Provider (EP) in the ONNX Runtime build used by the Triton Inference Server ONNX backend.

Describe the solution you'd like
Ideally, the oneDNN EP should be enabled the same way the OpenVINO EP can be enabled in the model configuration:

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [ {
      name : "openvino"
    } ]
  }
}

Describe alternatives you've considered
I've tried passing dnnl under cpu_execution_accelerator (see the sketch below), but this is not supported.
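
For reference, this is roughly the configuration that was attempted; the dnnl accelerator name is not currently recognized by the backend, so this is only a sketch of the requested behaviour, mirroring the OpenVINO example above:

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [ {
      name : "dnnl"
    } ]
  }
}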

oneDNN might yield greater performance improvements for CPU inference than OpenVINO, which is why it would be great to be able to use it within Triton Inference Server.

Update: Furthermore, it seems that oneDNN is prioritized by default over the standard ONNX Runtime CPU Execution Provider when the ONNX Runtime wheel is built with oneDNN:

When using the python wheel from the ONNX Runtime built with DNNL execution provider, it will be automatically prioritized over the CPU execution provider. Python APIs details are here.
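
To illustrate the quoted behaviour, here is a minimal sketch (not from the issue) of selecting the DNNL EP through the ONNX Runtime Python API; "model.onnx" is a placeholder path, and the explicit providers list simply makes the priority over the CPU EP visible:

import onnxruntime as ort

# Providers available in this wheel; a DNNL-enabled build lists "DnnlExecutionProvider".
print(ort.get_available_providers())

# Request DNNL first, falling back to the default CPU execution provider.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DnnlExecutionProvider", "CPUExecutionProvider"],
)

# Confirm which providers the session is actually using.
print(session.get_providers())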

Additional context
ONNX Runtime documentation: https://fs-eire.github.io/onnxruntime/docs/execution-providers/oneDNN-ExecutionProvider.html

narolski commented 2 years ago

@pranavsharma Do you think it will be possible to implement this configuration option?

pranavsharma commented 2 years ago

We've not planned for it yet. Would you like to contribute?