Open alexanderguzhva opened 2 months ago
seems to be a good suggestion. I would suggestion to remove any dependency on openblas if possible in the future.
/assign @liliu-z /unassign
We previously agreed on the following: The current stopgap measure is implemented in https://github.com/zilliztech/knowhere/pull/741. This PR sets all OpenMP parallelism to 1, shifting concurrency management from BLAS to Knowhere. The problem is that Knowhere only allocates a single thread from its pool for each search request on a single segment. With this temporary fix in place, if we perform sequential searches on a small number of segments, we end up with no/small parallelism in Knowhere, leading to bad performance. That said, this setup works fine when dealing with numerous segments or handling multiple concurrent search requests.
While modifying the library would be the ideal long-term solution, we still need to address some edge cases. For instance, how to protect the non-Docker users who might have an wrong OpenBLAS library in their environment and haven't run the install script beforehand.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
Is there an existing issue for this?
Environment
Current Behavior
By default,
libopenblas
usespthreads
for multithreading. We useopenmp
in our search engine. Combined, this may lead toN^2
threads being spawned (each ofN
openmp
threads spawnsN
pthread
threads). For example, this issue https://github.com/zilliztech/knowhere/issues/739.In order to mitigate this, we need to use an
openmp
-compatible version ofopenblas
in Milvus Docker containers andinstall_deps
scripts. For ubuntu, it islibopenblas0-openmp
/libopenblas-openmp-dev
instead oflibopenblas
/libopenblas-dev
. Also, make sure thatlibopenblas0-pthread
/libopenblas-pthread-dev
are not installed.I insist that issues like https://github.com/zilliztech/knowhere/issues/739 were resolved incorrectly.
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response