triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.

onnx disabled optimizers for dropout #158

Open zhaozhiming37 opened 1 year ago

zhaozhiming37 commented 1 year ago

My model includes a Dropout module that runs at inference time. When I run the model with onnxruntime locally, I set disabled_optimizers=["EliminateDropout"]. How can I do the same thing with Triton server? My code looks like this:

import onnxruntime as ort

# onnxFile is the path to the ONNX model; EliminateDropout is disabled so the
# Dropout nodes are preserved in the optimized graph.
session = ort.InferenceSession(
    onnxFile,
    disabled_optimizers=["EliminateDropout"],
    providers=[
        'TensorrtExecutionProvider',
        # 'CUDAExecutionProvider',
        # 'CPUExecutionProvider'
    ]
)
Tabrizian commented 1 year ago

I don't know whether we expose this parameter in the model configuration. @pranavsharma Do you know whether it is possible to use this option when serving models with the onnxruntime backend?

zhaozhiming37 commented 1 year ago

@pranavsharma Hi, is there an update on my question?

zhaozhiming37 commented 1 year ago

@Tabrizian Hi, I wonder if there is an answer to this question, or if you have plans to support it?

Tabrizian commented 1 year ago

Hi @zhaozhiming37, sorry for the delayed response. The ONNX Runtime backend is maintained by the Microsoft team, so they should be able to provide more info.

pranavsharma commented 1 year ago

This has not been exposed yet. The best way to do this is to create the session offline, serialize the optimized model, and then use that serialized ONNX model in Triton.
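
A minimal sketch of that offline step, assuming onnxruntime's SessionOptions.optimized_model_filepath mechanism and hypothetical file names (model.onnx, model_optimized.onnx); the optimization level and provider list should match what the Triton deployment will use:

import onnxruntime as ort

# Session options that write the optimized graph back to disk. BASIC keeps the
# saved model portable; higher levels may embed hardware-specific fused ops.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
sess_options.optimized_model_filepath = "model_optimized.onnx"  # hypothetical output path

# Creating the session runs the graph optimizations (with EliminateDropout
# disabled, so Dropout nodes survive) and serializes the result to the path above.
session = ort.InferenceSession(
    "model.onnx",                                   # hypothetical input path
    sess_options,
    disabled_optimizers=["EliminateDropout"],
    providers=["CPUExecutionProvider"],
)

The resulting model_optimized.onnx could then be placed in the Triton model repository (e.g. under <model_name>/1/model.onnx) in place of the original file, so Triton loads a graph that already has the desired optimizations applied and the Dropout nodes preserved.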