jw3126 / ONNXRunTime.jl

Julia bindings for onnxruntime
MIT License

multi-thread friendly? #13

Closed · Moelf closed this 2 years ago

Moelf commented 2 years ago
julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           Threads.@threads for _ = 1:10000
               model(input)
           end
       end
ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:334 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:97 [inlined]
 [4] macro expansion
   @ ./REPL[9]:2 [inlined]
 [5] top-level scope
   @ ./timing.jl:220 [inlined]
 [6] top-level scope
   @ ./REPL[9]:0

    nested task error: ArgumentError: array must be non-empty
    Stacktrace:
     [1] pop!

Moelf commented 2 years ago

If we remove all the TimerOutputs timing stuff it seems to work, so I suggest removing it for production.
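
For reference, the failure above is what a shared TimerOutput does under Threads.@threads; a minimal sketch that reproduces the same class of error (this is my reading of the stack trace, with the internals hedged):

using TimerOutputs

const to = TimerOutput()

Threads.@threads for _ in 1:10_000
    # @timeit pushes a section onto the timer's internal stack and pops
    # it when the block exits; with many tasks mutating that one shared
    # stack, pop! can land on an already-empty array, which matches the
    # "ArgumentError: array must be non-empty" in the trace above.
    @timeit to "inference" sum(abs2, rand(Float32, 64))
end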

jw3126 commented 2 years ago

Thanks for bringing this up. I agree we should just get rid of TimerOutputs. Would be awesome if you could make a PR!

Moelf commented 2 years ago

done

Moelf commented 2 years ago
julia> const model_cpu = load_inference("onnx_model_test.onnx");

julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           for _ = 1:10^5
               model_cpu(input)
           end
       end
  1.294845 seconds (5.10 M allocations: 1.068 GiB, 10.40% gc time)

julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           Threads.@threads for _ = 1:10^5
               model_cpu(input)
           end
       end
  1.035058 seconds (5.12 M allocations: 1.069 GiB, 14.82% gc time, 1.43% compilation time)

julia> Threads.nthreads()
10

actually, it doesn't scale anyway: with 10 threads the loop is barely faster than the serial one
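
Back-of-the-envelope from those timings (my arithmetic, not from the original thread):

julia> 1.294845 / 10^5   # serial: ~13 µs per call
1.294845e-5

julia> 1.035058 / 10^5   # 10 threads: ~10 µs per call, nowhere near a 10x speedup
1.035058e-5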

Moelf commented 2 years ago

but at least this trick works:

julia> const models = [load_inference("./onnx_model_test.onnx") for _=1:Threads.nthreads()]

julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           Threads.@threads for _ = 1:10^5
               models[Threads.threadid()](input)
           end
       end
  0.367188 seconds (5.12 M allocations: 1.069 GiB, 38.03% compilation time)

somewhat: roughly 3.5x over the serial loop with 10 sessions, not the 10x you'd hope for
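
One caveat with the threadid() indexing, for anyone copying this: on newer Julia versions tasks can migrate between threads mid-run, so two tasks may end up hitting the same session. A Channel-based pool sidesteps that; a sketch, where pooled_infer is a hypothetical helper name:

using ONNXRunTime: load_inference

# Hand sessions out through a Channel: each task checks one out
# explicitly, so correctness no longer depends on which thread it runs on.
const pool = Channel{Any}(Threads.nthreads())
foreach(_ -> put!(pool, load_inference("./onnx_model_test.onnx")), 1:Threads.nthreads())

function pooled_infer(input)
    m = take!(pool)        # blocks until some session is free
    try
        return m(input)
    finally
        put!(pool, m)      # always hand the session back
    end
end

Usage is the same loop as above, with models[Threads.threadid()](input) replaced by pooled_infer(input).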

jw3126 commented 2 years ago

My guess is that onnxruntime already does multithreading internally on CPU, so stacking Julia threads on top of it doesn't buy much?
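
If that's the cause, one crude way to test it (hedged: this knob only applies to OpenMP builds of onnxruntime; other builds size their own internal thread pool via session options instead):

# In a fresh Julia session, pin onnxruntime's internal pool before the
# library loads. If the serial benchmark then slows down noticeably,
# the "serial" loop really was being parallelized inside onnxruntime.
ENV["OMP_NUM_THREADS"] = "1"

using ONNXRunTime: load_inference
const model_cpu = load_inference("onnx_model_test.onnx")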