milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: replace std::regex with an alternative regex engine #31758

Open alexanderguzhva opened 5 months ago

alexanderguzhva commented 5 months ago

Is your feature request related to a problem? Please describe.

std::regex is notoriously slow. It is part of the C++ standard library because it serves as a stable reference engine, not because of its performance.

Describe the solution you'd like.

Consider replacing std::regex with an alternative regex engine, such as boost::regex, Intel Hyperscan or RE2, if appropriate.

Check the benchmark numbers, for example https://github.com/HFTrader/regex-performance

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 5 months ago

@alexanderguzhva Agreed, I would prefer to use RE2. How much work is it?

alexanderguzhva commented 5 months ago

@xiaofan-luan In the best-case scenario, the work is to mechanically replace std::regex with RE2 facilities and make sure that all unit tests pass. Also, it seems that Milvus uses a Rust library called 'tantivy', which also has some regex facilities, but I'm not familiar with that library.
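To make the "replace and re-run the tests" step concrete, here is a minimal sketch of what such a swap could look like. It is not taken from the Milvus code base: `RegexMatcher`, `check_matcher`, and the `EXAMPLE_USE_RE2` build flag are hypothetical names used only for illustration. Without the flag, the sketch builds against the standard library alone; with it, the same call sites go through `RE2::FullMatch` and `RE2::PartialMatch`.

```cpp
#include <cassert>
#include <regex>
#include <string>

#ifdef EXAMPLE_USE_RE2  // hypothetical build flag, not a Milvus option
#include <re2/re2.h>
#endif

// Hypothetical wrapper: call sites depend on this class rather than on a
// specific regex engine, so the engine can be swapped in one place.
class RegexMatcher {
public:
    // Compile the pattern once; both engines are constructible from a string.
    explicit RegexMatcher(const std::string& pattern) : re_(pattern) {}

    // True if the whole input matches the pattern.
    bool full_match(const std::string& text) const {
#ifdef EXAMPLE_USE_RE2
        return re2::RE2::FullMatch(text, re_);
#else
        return std::regex_match(text, re_);
#endif
    }

    // True if any substring of the input matches the pattern.
    bool partial_match(const std::string& text) const {
#ifdef EXAMPLE_USE_RE2
        return re2::RE2::PartialMatch(text, re_);
#else
        return std::regex_search(text, re_);
#endif
    }

private:
#ifdef EXAMPLE_USE_RE2
    re2::RE2 re_;
#else
    std::regex re_;
#endif
};

// A tiny stand-in for "make sure that all unit tests pass": the same
// assertions should hold regardless of which engine is compiled in.
inline void check_matcher() {
    RegexMatcher m("ab+c");
    assert(m.full_match("abbbc"));
    assert(!m.full_match("xabbbc"));
    assert(m.partial_match("xabbbcx"));
    assert(!m.partial_match("ac"));
}
```

One caveat for the "best case": RE2 does not support backreferences or lookaround assertions, which the default ECMAScript grammar of std::regex accepts, so any pattern relying on those would need to be rewritten rather than swapped mechanically.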

xiaofan-luan commented 5 months ago

So currently, if the user builds a tantivy index, we use tantivy to accelerate the query. Otherwise it is a brute-force search and we use RE2.
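The dispatch described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Milvus implementation: `IndexLookup`, `brute_force_match`, and `regex_query` are hypothetical names, the index is modeled as an optional callback, and std::regex stands in for RE2 on the brute-force path so that the sketch compiles with the standard library alone.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for an index lookup: returns the offsets of matching
// rows, or std::nullopt when the index cannot answer the query.
using IndexLookup = std::function<std::optional<std::vector<std::size_t>>(
    const std::string& pattern)>;

// Brute-force path: test every row against the compiled pattern.
// std::regex is used here as a placeholder for the real engine.
std::vector<std::size_t> brute_force_match(const std::vector<std::string>& rows,
                                           const std::string& pattern) {
    const std::regex re(pattern);  // compile once, reuse for every row
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (std::regex_match(rows[i], re)) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Prefer the index when one is available; otherwise fall back to the scan.
std::vector<std::size_t> regex_query(const std::vector<std::string>& rows,
                                     const std::string& pattern,
                                     const IndexLookup& index) {
    if (index) {  // an empty std::function means no index was built
        if (auto hits = index(pattern)) {
            return *hits;
        }
    }
    return brute_force_match(rows, pattern);
}
```

Passing an empty `IndexLookup` (for example `nullptr`) exercises the brute-force path, which is the part whose cost depends directly on the choice of regex engine.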