Closed by kallebysantos 3 weeks ago
It's also a base foundation for an owned onnx runtime that can be integrated directly with the https://github.com/huggingface/transformers.js/pull/947 library, which will allow better inference without needing to couple models like gte-small into the edge-runtime image.
If so, are you also preparing another PR for after this one has been merged? Overall, the PR looks good.
Anyway, I'll be testing this locally with k6 soon. If there are any issues, I'll let you know.
What kind of change does this PR introduce?
Feature, Enhancement
What is the current behavior?
Model sessions are eagerly evaluated and do not survive across worker life-cycles.
What is the new behavior?
This PR introduces shared-session logic along with other ort improvements, such as GPU support and optimizations.
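The shared-session idea can be sketched in Rust roughly as follows. This is a simplified illustration, not the actual edge-runtime code: `Session` and `SessionPool` are stand-ins for the real `ort` types, but the mechanism shown (a lazily-populated map of `Arc`-wrapped sessions, with cleanup dropping entries no consumer still holds) matches the lifecycle described below.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in for an `ort` inference session; the real type lives in the ort crate.
#[allow(dead_code)]
struct Session {
    model: String,
}

// Lazily-populated map of shared sessions, keyed by model name.
// Each session is created once and then shared (via `Arc`) across worker cycles.
struct SessionPool {
    sessions: Mutex<HashMap<String, Arc<Session>>>,
}

impl SessionPool {
    fn new() -> Self {
        Self { sessions: Mutex::new(HashMap::new()) }
    }

    // Return the existing session for `model`, or create it on first use.
    fn get_or_create(&self, model: &str) -> Arc<Session> {
        let mut map = self.sessions.lock().unwrap();
        map.entry(model.to_string())
            .or_insert_with(|| Arc::new(Session { model: model.to_string() }))
            .clone()
    }

    // Drop sessions that no consumer currently holds: if the pool owns the
    // only `Arc`, `strong_count == 1` and the entry can be removed.
    // Returns how many sessions were dropped.
    fn cleanup_unused(&self) -> usize {
        let mut map = self.sessions.lock().unwrap();
        let before = map.len();
        map.retain(|_, s| Arc::strong_count(s) > 1);
        before - map.len()
    }
}

fn main() {
    let pool = SessionPool::new();
    let a = pool.get_or_create("gte-small");
    let b = pool.get_or_create("gte-small");
    // Both handles point at the same shared session.
    assert!(Arc::ptr_eq(&a, &b));

    // While consumers hold the session, cleanup removes nothing.
    assert_eq!(pool.cleanup_unused(), 0);

    drop(a);
    drop(b);
    // With no consumers left, the session is dropped.
    assert_eq!(pool.cleanup_unused(), 1);
    println!("ok");
}
```

The explicit cleanup call mirrors the `EdgeRuntime.ai.tryCleanupUnusedSession()` method described below: keeping cleanup explicit lets hot models stay resident across requests instead of being torn down with each worker.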
Tester docker image:
You can get a Docker image of this PR from Docker Hub:
Session lifecycle:
This PR introduces a Lazy map of `ort:Session`s, which means that sessions are loaded once and then shared between worker cycles.

Cleaning up sessions: Each `ort:Session` is attached to an `Arc` smart pointer and will only be dropped when no consumer is attached to it. To trigger this, users must explicitly call the `EdgeRuntime.ai.tryCleanupUnusedSession()` method.

GPU Support:
The `gpu` support allows `Session` inference on specialized hardware and is backed by CUDA. There is no configuration needed from the final user; just call the `Session` for `gte-small`. But in order to enable `gpu` inference, the `Dockerfile` now has two main build stages (which should be specified during `docker build`):

- edge-runtime (CPU only): This stage builds the default `edge-runtime`, where `ort::Session`'s are loaded using the CPU.
- edge-runtime-cuda (GPU/CPU): This stage builds the default `edge-runtime` in a `nvidia/cuda` machine that allows loading using `GPU` or `CPU` (as fallback).

Each stage needs to install the appropriate `onnx-runtime`. To that end, `install_onnx.sh` has been updated with a 4th parameter flag `--gpu`, which will download a `cuda` version from the official `microsoft/onnxruntime` repository.

Using GPU image:
In order to use the `gpu` image, the `docker-compose` file must include the following properties for the `functions` service:

Final considerations:
As I described before, this is an adapted work from #368, where we split out only the core features that improve ort support for `edge-runtime`.

Finally, thanks to @nyannyacha, who helped me a lot 🙏
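As a closing reference: the compose properties mentioned under "Using GPU image" are not shown in this description. With Docker's standard GPU support, device reservations for a service typically look like the sketch below. The image tag is hypothetical; the `deploy.resources.reservations.devices` block is the standard Compose syntax for exposing NVIDIA GPUs to a container.

```yaml
services:
  functions:
    # Hypothetical tag; use the edge-runtime-cuda build produced by this PR.
    image: supabase/edge-runtime:cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

This is a sketch of the usual pattern, not the exact properties required by this PR; consult the PR's diff for the authoritative compose changes.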