Closed fj-y-saito closed 7 months ago
@fj-y-saito On the ARM platform, OpenVINO currently supports only TBB as the default threading library.
Could you remove the OpenMP modification steps and retry the test?
Thank you for your reply. I'm sorry for my unclear description. I should not have mentioned OpenMP. It is not related to this issue.
I modified the TBB path in the oneDNN source code, but I did not use any OpenMP directives, and I did not specify the OpenMP build option. I referred to the OpenMP implementation only to learn how to write the TBB path in oneDNN. This problem occurs whether or not my modification is applied.
@fj-y-saito When you build OpenVINO from source, the OpenVINO cmake sets the "THREADING" parameter to TBB or OMP and passes this setting to the oneDNN cmake.
oneDNN already supports TBB and OMP. Therefore, no additional modifications to the TBB path are required.
Yes, I knew about the THREADING parameter. I also confirmed that its value was set to TBB by default in my environment.
If OpenVINO sets the THREADING parameter to "TBB", the DNNL_RUNTIME_TBB macro should be defined in oneDNN. However, I could not find any TBB code at the location below. https://github.com/openvinotoolkit/oneDNN/blob/1c7bfabf1b26e6fb95fea1613e1d3d2bef1f6b54/src/cpu/cpu_engine.hpp#L170 In my understanding, this code changes the number of threads only for the OMP or POSIX threading libraries.
So, I wrote code for TBB that I expected to be called from the Python API compile_model.
My modification works, and at first it succeeds in resizing the number of ACL threads to the specified number.
However, the number of threads unexpectedly changed to the number of CPU cores in subsequent processing.
Also, I found that the static variable _scheduler in the ACL source code was reinitialized.
I think this may be a clue to understanding this issue.
@wangleis Is there any update on this?
@fj-y-saito OpenVINO works with TBB on Linux on the ARM platform by default.
Since your question is about the configuration in oneDNN and ACL, not the configuration in OpenVINO, we need more time to investigate and confirm.
@fj-y-saito The original issue, "Even though I specified INFERENCE_NUM_THREADS to 1, OpenVINO creates as many threads as the number of CPU cores. This problem does not occur on x86 CPU. It happens only on ARM CPU.", has been fixed. Could you try the latest release?
@wangleis I confirmed that this problem does not occur in the new version. Thank you very much for your support.
This issue is solved, so I am closing it.
OpenVINO Version
2023.0.1
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
CPU
Framework
PyTorch
Model used
https://docs.openvino.ai/2023.2/omz_models_model_person_reidentification_retail_0277.html
Issue description
Even though I set INFERENCE_NUM_THREADS to 1, OpenVINO creates as many threads as there are CPU cores. This problem does not occur on x86 CPUs. It happens only on ARM CPUs. On x86, I could change the number of threads in the same way. I found that there is no TBB path under the ARM compile switch in the oneDNN source, so I modified the source code with reference to the OpenMP path [*1]. I think the default threading library of OpenVINO is TBB, so this path is necessary. But the behavior is not what I expected. It behaves as described below.
In gdb, I watched the _schedulers value [*3], which manages the number of threads of the Arm Compute Library. At the end of the model compilation process, the _schedulers value seems to disappear and can no longer be accessed. This is why the threads were created again, but I don't know why it happened.
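For readers who want to observe this from Python rather than gdb, here is a minimal sketch that prints the number of OS threads in the process before and after compile_model. It is not the script used in this issue. The helper native_thread_count and the function report_threads are hypothetical names introduced for illustration, the model path is supplied by the caller, and the OpenVINO part assumes the openvino.runtime Python package is installed. The thread count is read from /proc/self/status, which is Linux-specific, with a fallback that only sees Python-level threads.

```python
import sys
import threading


def native_thread_count() -> int:
    """Return the number of OS threads in the current process.

    On Linux this reads the "Threads:" field of /proc/self/status, which
    also counts threads created by native libraries. Elsewhere it falls
    back to threading.active_count(), which only sees Python threads.
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return threading.active_count()


def report_threads(model_path: str, num_threads: int = 1) -> None:
    """Compile a model on CPU and print thread counts around the call."""
    try:
        from openvino.runtime import Core  # assumption: OpenVINO is installed
    except ImportError:
        print("openvino is not installed; skipping")
        return

    core = Core()
    print("threads before compile_model:", native_thread_count())
    compiled = core.compile_model(
        model_path, "CPU", {"INFERENCE_NUM_THREADS": str(num_threads)}
    )
    print("threads after compile_model: ", native_thread_count())
    print("INFERENCE_NUM_THREADS:", compiled.get_property("INFERENCE_NUM_THREADS"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py /path/to/model.xml
    report_threads(sys.argv[1], num_threads=1)
```

Calling native_thread_count again after running an inference request would show whether additional threads appear at execution time, which is the behavior described in this issue.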
Machines Used
Other Environment
The Python version is 3.10.12. The docker image is nvcr.io/nvidia/pytorch:23.06-py3.
OpenVINO Build Options
-DENABLE_INTEL_GPU=OFF -DENABLE_PYTHON=ON -DENABLE_WHEEL=ON -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-fvisibility=default -DDNNL_USE_ACL=ON -DARM_COMPUTE_TARGET_ARCH=armv8.2-a
Step-by-step reproduction
Relevant log output
When _schedulers is initialized, as many threads as there are CPU cores are created.
At this point, the _schedulers value is set as expected.
After that, the number of threads was reset to what I specified. But somehow _schedulers disappeared.
This happened during the compile_model process.
Then threads are created again in the execution process, because _schedulers is empty and is initialized again.
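To tell which stage creates the extra threads without attaching a debugger, one option is to group the process's threads by name at each stage and compare the results. The sketch below is an illustration only: thread_names is a hypothetical helper, it relies on the Linux-specific /proc/self/task directory, and it makes no assumption about what names TBB or ACL worker threads actually carry (many native threads simply inherit the process name).

```python
import os
from collections import Counter


def thread_names() -> Counter:
    """Count the OS threads of this process, grouped by thread name.

    Reads /proc/self/task/<tid>/comm (Linux only). Returns an empty
    Counter on systems that do not provide that directory.
    """
    names = Counter()
    task_dir = "/proc/self/task"
    if not os.path.isdir(task_dir):
        return names
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as f:
                names[f.read().strip()] += 1
        except OSError:
            # The thread exited between listing and reading; skip it.
            continue
    return names


if __name__ == "__main__":
    # Take one snapshot per stage (after import, after compile_model,
    # after the first inference) and compare them.
    print(dict(thread_names()))
```

Subtracting two snapshots, for example `after - before`, gives the threads that appeared between the two stages.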