If we remove all the @timed stuff, it seems to work. I suggest removing all the timing instrumentation for production.
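For illustration only, a minimal sketch of the kind of change I mean (the timer `to` and the wrapper functions are made up, not the package's actual internals):

using TimerOutputs

const to = TimerOutput()  # a single shared, mutable timer object

# before: every inference call records into the shared timer on the hot path
infer_timed(model, input) = @timeit to "run" model(input)

# after: call the model directly, no shared mutable timing state
infer(model, input) = model(input)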
Thanks for bringing this up. I agree we should just get rid of TimerOutputs. Would be awesome if you could make a PR!
done
julia> using ONNXRunTime: load_inference

julia> const model_cpu = load_inference("onnx_model_test.onnx");
julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           for _ = 1:10^5
               model_cpu(input)
           end
       end
  1.294845 seconds (5.10 M allocations: 1.068 GiB, 10.40% gc time)
julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           Threads.@threads for _ = 1:10^5
               model_cpu(input)
           end
       end
  1.035058 seconds (5.12 M allocations: 1.069 GiB, 14.82% gc time, 1.43% compilation time)
julia> Threads.nthreads()
10
Actually, it doesn't scale anyway. But at least this trick, one model instance per thread, works:
julia> const models = [load_inference("./onnx_model_test.onnx") for _=1:Threads.nthreads()]
julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           Threads.@threads for _ = 1:10^5
               models[Threads.threadid()](input)
           end
       end
  0.367188 seconds (5.12 M allocations: 1.069 GiB, 38.03% compilation time)
somewhat
My guess is that onnxruntime does multithreading internally on CPU?
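One way to probe that guess, with a big caveat: this sketch assumes the bundled onnxruntime binary honors OMP_NUM_THREADS (older OpenMP builds do; newer builds configure intra-op threads via session options instead, in which case setting the variable has no effect):

# hedged sketch: pin onnxruntime's internal OpenMP threads to 1 (if it uses OpenMP at all)
ENV["OMP_NUM_THREADS"] = "1"  # must be set before the onnxruntime library is loaded

using ONNXRunTime: load_inference

const model_1t = load_inference("onnx_model_test.onnx")

let input = Dict("dense_input" => randn(Float32, 1, 78))
    # compare against the 1.29 s serial number above; if this run is much slower,
    # the library was indeed parallelizing internally
    @time for _ = 1:10^5
        model_1t(input)
    end
end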