Hi @mszuyx ,
Generally, VPU/MYRIAD/NCS2 should work as long as you fulfill the OpenVINO system requirements (software and hardware) and the model and its topology are officially supported. In that case the VPU plugin should work as expected.
However, bear in mind that the model IR must be in FP16 format in order to run inference on VPU. For testing purposes, you can convert the model into FP16 IR using the Model Optimizer by passing the parameter --data_type {FP16,FP32,half,float}. For example: python3 mo_tf.py --input_model alexnet.pb --output_dir <output_dir> --data_type FP16
Next, infer it with the Benchmark Tool using the -d MYRIAD parameter: python3 benchmark_app.py -m <path_to_model.xml> -i <path_to_input> -d MYRIAD
Another option to test your model is the Intel DevCloud.
The computing power of a VPU is not as high as a CPU's, but in your use case this device would definitely help offload some of the burden from the CPU. You can consider using the OpenVINO Multi-Device plugin, which automatically assigns inference requests to the available computational devices and executes them in parallel. This causes the devices to share the inference burden and results in more consistent throughput.
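For reference, a minimal sketch of this with the Inference Engine Python API (the IR paths, device list, and request count here are placeholders, not something specific to your model):

```python
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR paths

# "MULTI" distributes parallel infer requests across the listed devices;
# creating several requests lets both devices stay busy at the same time.
exec_net = ie.load_network(network=net, device_name="MULTI:MYRIAD,CPU", num_requests=4)
```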
Meanwhile, for deployment purposes, you can use the OpenVINO Deployment Manager. It creates a runtime package (deployment package) for your target device by assembling the model IR files, your application, and the associated dependencies.
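If it helps, the invocation looks roughly like this (a sketch; the target, paths, and archive name are placeholders, and the script ships under deployment_tools/tools/deployment_manager in the OpenVINO installation): python3 deployment_manager.py --targets cpu --user_data /path/to/app --output_dir /path/to/package --archive_name my_package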
@Iffa-Meah Thank you! These are all very helpful!
So the takeaway messages are (correct me if I got any of them wrong):
The other thing I noticed is, if I select GPU as the inference device (I am on Ubuntu 20.04 and python 3.8 API), it seems to take a much longer time for the model to initialize (I am guessing the CPU is busy loading the model into GPU memory? Because I see the CPU usage during this period is almost 100%). Is this expected?
Also, I noticed that if I build a naive benchmark test (using a for loop to measure the average inference FPS), the result is inconsistent on the single-board computer: the larger the iteration count I choose, the lower the FPS. When I test with 30 iterations, the FPS is 44, but setting it to 120 will sometimes give me 20 Hz. This is not an issue on the desktop, which has better single-core processing power and far better multi-core power. I suspect the longer the test runs, the more likely the CPU is to get distracted by other tasks? Would changing my code from the Python API to C++ give the model higher priority, or is this theory simply not true?
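Roughly, the loop I mean looks like this (a simplified sketch with placeholder model paths, using the synchronous Python API):

```python
import time

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder paths
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
dummy = np.zeros(net.input_info[input_name].input_data.shape, dtype=np.float32)

iters = 30  # 30 vs. 120 is where I see the FPS difference
start = time.perf_counter()
for _ in range(iters):
    exec_net.infer({input_name: dummy})  # synchronous inference
elapsed = time.perf_counter() - start
print(f"average FPS: {iters / elapsed:.1f}")
```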
Thanks!
[...] The other thing I noticed is, if I select GPU as the inference device (I am on Ubuntu 20.04 and python 3.8 API), it seems to take a much longer time for the model to initialize (I am guessing the CPU is busy loading the model into GPU memory? Because I see the CPU usage during this period is almost 100%). Is this expected? [...]
The delay is because, while loading/importing the model into the plugin, OpenVINO compiles OpenCL kernels for the contained tensors & operations on GPU (and some VPUs). By setting an environment variable such as export cl_cache_dir=/my/path/to/a/folder/ the compiled OpenCL kernel files can be stored (temporarily or permanently) and loaded the next time instead of being compiled again and again.
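For example, from Python the variable just has to be set before the GPU plugin initializes (a minimal sketch; the model paths are placeholders):

```python
import os

# Set the cache directory before the GPU plugin loads.
os.environ["cl_cache_dir"] = "/my/path/to/a/folder"

from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR paths
# First run: kernels are compiled and written into cl_cache_dir.
# Later runs: the cached binaries are loaded, so this call returns much faster.
exec_net = ie.load_network(network=net, device_name="GPU")
```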
Sounds great! Is there somewhere I can read more about it?
Yu
I found some info here:
https://github.com/intel/compute-runtime/blob/master/opencl/doc/FAQ.md
I think the loading is much faster now!
Thx
Hi,
First of all, great project! Big fan!
I am currently working on deploying a custom model on a single-board computer called UP Xtreme: https://up-board.org/up-xtreme/ (the i7 version). The model itself can be converted into Intel IR from ONNX and executed via the OpenVINO runtime on our desktop.
Our questions are:
Thanks! : )