milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.04k stars 2.88k forks source link

[Bug]: Use `libopenblas-openmp` instead of `libopenblas` #35526

Open alexanderguzhva opened 2 months ago

alexanderguzhva commented 2 months ago

Is there an existing issue for this?

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

By default, libopenblas uses pthreads for multithreading. We use openmp in our search engine. Combined, this may lead to N^2 threads being spawned (each of N openmp threads spawns N pthread threads). For example, this issue https://github.com/zilliztech/knowhere/issues/739.

In order to mitigate this, we need to use an openmp-compatible version of openblas in Milvus Docker containers and install_deps scripts. For ubuntu, it is libopenblas0-openmp / libopenblas-openmp-dev instead of libopenblas / libopenblas-dev. Also, make sure that libopenblas0-pthread / libopenblas-pthread-dev are not installed.

I insist that issues like https://github.com/zilliztech/knowhere/issues/739 were resolved incorrectly.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 2 months ago

seems to be a good suggestion. I would suggestion to remove any dependency on openblas if possible in the future.

yanliang567 commented 2 months ago

/assign @liliu-z /unassign

liliu-z commented 2 months ago

We previously agreed on the following: The current stopgap measure is implemented in https://github.com/zilliztech/knowhere/pull/741. This PR sets all OpenMP parallelism to 1, shifting concurrency management from BLAS to Knowhere. The problem is that Knowhere only allocates a single thread from its pool for each search request on a single segment. With this temporary fix in place, if we perform sequential searches on a small number of segments, we end up with no/small parallelism in Knowhere, leading to bad performance. That said, this setup works fine when dealing with numerous segments or handling multiple concurrent search requests.

While modifying the library would be the ideal long-term solution, we still need to address some edge cases. For instance, how to protect the non-Docker users who might have an wrong OpenBLAS library in their environment and haven't run the install script beforehand.

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.