ONNXRunTime


ONNXRunTime provides unofficial Julia bindings for onnxruntime. It exposes both a low-level interface that mirrors the official C API and a high-level interface.

Contributions are welcome.

Usage

The high-level API works as follows:


julia> import ONNXRunTime as ORT

julia> path = ORT.testdatapath("increment2x3.onnx"); # path to a toy model

julia> model = ORT.load_inference(path);

julia> input = Dict("input" => randn(Float32,2,3))
Dict{String, Matrix{Float32}} with 1 entry:
  "input" => [1.68127 1.18192 -0.474021; -1.13518 1.02199 2.75168]

julia> model(input)
Dict{String, Matrix{Float32}} with 1 entry:
  "output" => [2.68127 2.18192 0.525979; -0.135185 2.02199 3.75168]

For GPU usage, the CUDA and cuDNN packages are required, and the CUDA runtime needs to be set to 11.8 or a later 11.x version. To set this up, do

pkg> add CUDA cuDNN

julia> import CUDA

julia> CUDA.set_runtime_version!(v"11.8")

Then GPU inference is simply

julia> import CUDA, cuDNN

julia> ORT.load_inference(path, execution_provider=:cuda)

CUDA provider options can be specified

julia> ORT.load_inference(path, execution_provider=:cuda,
                          provider_options=(;cudnn_conv_algo_search=:HEURISTIC))

Memory allocated by a model is eventually released automatically once the model goes out of scope and is collected by the garbage collector. It can also be released immediately with release(model).
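For example, continuing the high-level example above, the model's native memory can be freed right away:

julia> ORT.release(model)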

The low-level API mirrors the official C API. The above example looks like this:

using ONNXRunTime.CAPI
using ONNXRunTime: testdatapath

# Set up the API handle, environment, and a session for the toy model
api = GetApi();
env = CreateEnv(api, name="myenv");
so = CreateSessionOptions(api);
path = testdatapath("increment2x3.onnx");
session = CreateSession(api, env, path, so);

# Wrap the input array in an OrtValue tensor backed by CPU memory
mem = CreateCpuMemoryInfo(api);
input_array = randn(Float32, 2, 3);
input_tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(input_array), size(input_array));

# Run the session and read the output tensor back as a Julia array
run_options = CreateRunOptions(api);
input_names = ["input"];
output_names = ["output"];
inputs = [input_tensor];
outputs = Run(api, session, run_options, input_names, inputs, output_names);
output_tensor = only(outputs);
output_array = GetTensorMutableData(api, output_tensor);

Alternatives

Complements

Breaking Changes in version 0.4

Setting the CUDA Runtime Version in Tests

For GPU tests using ONNXRunTime, the tests naturally must depend on and import CUDA and cuDNN. Additionally, a supported CUDA runtime version needs to be used, which can be somewhat tricky to set up for the tests.

First some background. What CUDA.set_runtime_version!(v"11.8") effectively does is to

  1. Add a LocalPreferences.toml file containing

     [CUDA_Runtime_jll]
     version = "11.8"

  2. In Project.toml, add

     [extras]
     CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"

If your test environment is defined by a test target in the top Project.toml, you need to do the following (a sketch follows the list):

  1. Add a LocalPreferences.toml in your top directory with the same contents as above.

  2. Add CUDA_Runtime_jll to the extras section of Project.toml.

  3. Add CUDA_Runtime_jll to the test target of Project.toml.
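As a rough sketch (assuming Test is also part of your test target), the relevant pieces of the top-level files would look like this:

# Project.toml (top directory), in addition to the existing sections
[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["CUDA_Runtime_jll", "Test"]

# LocalPreferences.toml (top directory)
[CUDA_Runtime_jll]
version = "11.8"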

If your test environment is defined by a Project.toml in the test directory, you instead need to do the following (again, a sketch follows the list):

  1. Add a test/LocalPreferences.toml file with the same contents as above.

  2. Add CUDA_Runtime_jll to the extras section of test/Project.toml.
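As a sketch, with the usual test dependencies (CUDA, cuDNN, Test, ...) kept in [deps], the test directory would contain:

# test/Project.toml, in addition to the existing [deps] section
[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"

# test/LocalPreferences.toml
[CUDA_Runtime_jll]
version = "11.8"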