roboflow / inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
https://inference.roboflow.com

Allows for custom providers to be passed #369

Open Qfc9 opened 2 months ago

Qfc9 commented 2 months ago

Description

There is an issue in the current codebase, highlighted by warnings many users are seeing. The default ONNX inference providers are incorrectly configured. This predominantly affects OnnxRoboflowInferenceModel, leading to a scenario where only the CPU is used for computation even when GPU or TensorRT packages are available.

Type of Change

Please remove options that are not applicable.

Testing and Validation

This modification has been tested via local installation and execution of the package. Post-implementation, the previous warnings have been eliminated, and the package performs efficiently, provided that custom providers are specified. Apart from this adjustment, the functionality of the code remains consistent with prior outputs.

CLAassistant commented 2 months ago

CLA assistant check
All committers have signed the CLA.

yeldarby commented 2 months ago

Hi @Qfc9, the default is to use the CUDA execution provider, then fallback to OpenVINO, then fallback to CPU if those are not available. This can be overridden by setting the ONNXRUNTIME_EXECUTION_PROVIDERS environment variable.
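
For reference, a minimal sketch of that override from Python, setting the variable before the package is imported so it is picked up at configuration time. The bracketed, comma-separated value format and the `get_model` import path are assumptions based on this thread and the quickstart docs, not definitive usage:

```python
import os

# Assumed format: a bracketed, comma-separated list of ONNX Runtime provider
# names, in priority order. Set it before importing the package so the value
# is read when inference configures its sessions.
os.environ["ONNXRUNTIME_EXECUTION_PROVIDERS"] = (
    "[CUDAExecutionProvider,CPUExecutionProvider]"
)

from inference import get_model  # import path assumed from the quickstart docs

# Placeholder model id / API key.
model = get_model(model_id="your-project/1", api_key="YOUR_API_KEY")
```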

highlighted by warnings generated for numerous users

What warning are you seeing?

leading to a scenario where only the CPU is utilized for computations even when GPU or TensorRT packages are available

We purposely do not default to the TensorRT EP because it has an extremely slow compilation step (upwards of 20 minutes on some machines) before you can get your first inference results back. This tradeoff of a much worse cold-start time for a bit faster inference is a non-starter for many real-world use-cases. But the default can be overridden using the environment variable.

Qfc9 commented 2 months ago

TLDR: The docs are bad. It's not obvious how to change providers. You have a method (env vars), but this PR can also help and doesn't remove the old method. We can update the docs if y'all need help making env vars more obvious.

The following warning is usually shown to users because OpenVINO is a part of the execution list even though they may not have it installed.

```
C:\Users\elija\AppData\Local\Programs\Python\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:65: UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.Available providers: 'TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider'
  warnings.warn(
```

Not a blocking issue, but it's confusing to new users. Additionally, editing ONNXRUNTIME_EXECUTION_PROVIDERS is not really a feasible or discoverable option for most people because there is little to no documentation on it. There is just the one reference on the Docker side of the docs (https://inference.roboflow.com/quickstart/docker/#advanced-build-a-docker-container-from-scratch). But that looks like the issue with all the env vars; none of them appear to be documented.

I didn't even realize that was a thing until now, tbh. That potentially negates the change in the PR I submitted. The only argument I would still make for the change is that the old env var method still works; this just also makes things more accessible and understandable to devs who are used to ONNX and passing providers via named args.
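
For context, this is the plain onnxruntime pattern being referred to, where providers are passed as a named argument when building the session (the model path below is just a placeholder):

```python
import onnxruntime as ort

# The caller picks the execution providers explicitly, in priority order,
# instead of relying on an environment variable.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually loaded
```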

At a minimum, we could look at updating the docs instead, if that is feasible from the repo side, and then send a PR.

We are not arguing for making TensorRT the default; you are 100% correct about the current setup. Sure, TensorRT takes 20 minutes to export on the first run, but a lot of our customers are fine with that since there is a significant performance boost from it. This can turn a model from near real time to truly real time.

Additionally, this allows customers to potentially run more models at once on the same hardware because of those optimizations.

Qfc9 commented 2 months ago

Below are supporting images for the current docs issue. Most individuals will not see an option; they may see the one function, but there isn't any immediately useful information on the page. I think it's unreasonable to expect most devs to do any more research than that. We are an example of that.

Again, we would be more than happy to do some updates to the docs if y'all would prefer that PR over the one we just submitted.

[Three screenshots of the documentation pages referenced above]

yeldarby commented 2 months ago

Definitely agree our docs are lacking here, and we would definitely accept a PR to make this clearer.

Having two competing ways to set these values seems not ideal. Environment variables are easier when configuring via Docker without having to have different code for different environments, but I can definitely see how, if one is using the package directly in Python, it would be useful to be able to do it in code.

@PawelPeczek-Roboflow @grzegorz-roboflow what do you think?

grzegorz-roboflow commented 2 months ago

I'm a proponent of the verbose approach of passing all required args so they are all visible in the top-level method signature; reading the signature should then give the user all the information needed to reason about the method. I would parse envs in the top layer and pass them down as args rather than read envs somewhere deep in the n-th layer.
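
A minimal sketch of that layering idea; the names and the env parsing here are illustrative, not the actual inference internals:

```python
import os
from typing import List, Optional

DEFAULT_PROVIDERS = [
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]


def providers_from_env() -> List[str]:
    """Read the env var once, at the top layer only."""
    raw = os.getenv("ONNXRUNTIME_EXECUTION_PROVIDERS")
    if not raw:
        return DEFAULT_PROVIDERS
    return [p.strip() for p in raw.strip("[]").split(",") if p.strip()]


def load_model(weights_path: str, providers: Optional[List[str]] = None):
    """Deeper layers only see an explicit argument, never the environment."""
    providers = providers or DEFAULT_PROVIDERS
    ...  # build the onnxruntime session with `providers`


# Entry point: the env is resolved here and passed down explicitly.
model = load_model("weights.onnx", providers=providers_from_env())
```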

DominiquePaul commented 2 weeks ago

What's the status of this? :)

These warnings are very annoying when running the code in a loop. I also don't want to set environment variables, when I'm not using any others, just to be able to call the function without warnings. Wouldn't the easiest thing be to allow setting this as a function parameter when calling model.inference?
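
For illustration only, this is roughly the kind of call being asked for; a providers argument like this is hypothetical and not an existing API in the package (imports and the exact class signature are omitted because they would be assumptions):

```python
# Hypothetical usage sketch, not current inference API: providers chosen
# in code, per model, rather than via an environment variable.
image = "path/to/image.jpg"  # placeholder input

model = OnnxRoboflowInferenceModel(
    model_id="some-project/1",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # hypothetical arg
)
results = model.infer(image)
```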