manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
9.08k stars 509 forks

POC: integrate Rust ML library with Manticore Search C++ code #2074

Closed sanikolaev closed 1 month ago

sanikolaev commented 7 months ago

As discussed in the dev call on April 18, 2024, we'd like to integrate an ML library written in Rust with our Manticore Search code. Before proceeding, we'd like to experiment with using a Rust library in C++ code in principle. This task is to conduct the experiment.


Checklist

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [x] Task estimated
- [x] Bug reproduced
- [x] Specification created, reviewed and approved
- [x] Implementation completed
- [x] Tests developed
- [x] Documentation updated
- [x] Documentation proofread
- [x] Changelog updated
- [x] OpenAPI YAML updated and issue created to rebuild clients

AbstractiveNord commented 7 months ago

Any help required?

donhardman commented 7 months ago

In short, we want to build a Rust library that can be used from the Manticore daemon source code. This library should have one simple function - converting text to a vector. As a proof of concept, it can be a basic implementation that exposes just one function. This function should accept a list of characters or some native C++ data type and return a vector of floats. Everything can be hardcoded for now.

The base code and libraries to use can be taken from this PHP extension: https://github.com/manticoresoftware/php-ext-model

If you can help with a simple implementation to validate this concept and benchmark it with the Manticore daemon, that would be great. We appreciate your assistance.
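To make the shape of the POC concrete, here is a minimal sketch of such a library (the function name and the byte-sum "model" are made up for illustration, not the real implementation): a Rust cdylib exposing one C-ABI function that takes a char pointer plus a length and fills a caller-provided buffer with a hardcoded number of floats.

```rust
// Sketch only: a hardcoded stand-in for real model inference.
use std::os::raw::c_char;
use std::slice;

pub const EMBEDDING_DIM: usize = 4; // hardcoded for the POC

/// Fill `out` (EMBEDDING_DIM floats, allocated by the caller) with a
/// deterministic "embedding" of the UTF-8 bytes at `text`/`len`.
#[no_mangle]
pub extern "C" fn text_to_embedding(text: *const c_char, len: usize, out: *mut f32) {
    let bytes = unsafe { slice::from_raw_parts(text as *const u8, len) };
    // Placeholder "inference": derive floats from the byte sum.
    let sum: u32 = bytes.iter().map(|&b| u32::from(b)).sum();
    let out = unsafe { slice::from_raw_parts_mut(out, EMBEDDING_DIM) };
    for (i, v) in out.iter_mut().enumerate() {
        *v = sum as f32 / (i as f32 + 1.0);
    }
}
```

Built with `crate-type = ["cdylib"]` in Cargo.toml and `cargo build --lib --release`, this yields a shared library whose symbol could be declared on the C++ side as `extern "C" void text_to_embedding(const char*, size_t, float*);`.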

AbstractiveNord commented 7 months ago

> In short, we want to build a Rust library that can be used from the Manticore daemon source code. This library should have one simple function - converting text to a vector. As a proof of concept, it can be a basic implementation that exposes just one function. This function should accept a list of characters or some native C++ data type and return a vector of floats. Everything can be hardcoded for now.
>
> The base code and libraries to use can be taken from this PHP extension: https://github.com/manticoresoftware/php-ext-model
>
> If you can help with a simple implementation to validate this concept and benchmark it with the Manticore daemon, that would be great. We appreciate your assistance.

Just want to make it as clear as possible. You want a DLL that exposes one function. What is the text argument of that function? Is the input text a std::string or not?

donhardman commented 7 months ago

> > In short, we want to build a Rust library that can be used from the Manticore daemon source code. This library should have one simple function - converting text to a vector. As a proof of concept, it can be a basic implementation that exposes just one function. This function should accept a list of characters or some native C++ data type and return a vector of floats. Everything can be hardcoded for now. The base code and libraries to use can be taken from this PHP extension: https://github.com/manticoresoftware/php-ext-model If you can help with a simple implementation to validate this concept and benchmark it with the Manticore daemon, that would be great. We appreciate your assistance.
>
> Just want to make it as clear as possible. You want a DLL that exposes one function. What is the text argument of that function? Is the input text a std::string or not?

Absolutely, a shared library (a DLL, or an .so on Linux). This library should be usable within C++ code. The interface can accept std::string or just a pointer to a list of chars, whichever is most efficient.

AbstractiveNord commented 7 months ago

OK, then std::string into a vector of 32-bit floats. I am not sure about dynamic linking, since all my experiments used static linking.

tomatolog commented 7 months ago

It could be better to pass a const char * and a length, or a string_view, as we do not use any std containers, and passing a std::string means an allocation and a copy from a plain const char *.

AbstractiveNord commented 7 months ago

> It could be better to pass a const char * and a length, or a string_view, as we do not use any std containers, and passing a std::string means an allocation and a copy from a plain const char *.

Are your strings not null-terminated? Are they not valid UTF-8?
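On the Rust side a pointer-plus-length buffer needs no NUL terminator, and UTF-8 validity can be checked explicitly rather than assumed. A small sketch (the function name is hypothetical):

```rust
use std::slice;

/// Returns 1 if the `len` bytes at `text` are valid UTF-8, 0 otherwise.
/// No NUL terminator is required; the length is explicit.
#[no_mangle]
pub extern "C" fn is_valid_utf8(text: *const u8, len: usize) -> i32 {
    let bytes = unsafe { slice::from_raw_parts(text, len) };
    std::str::from_utf8(bytes).is_ok() as i32
}
```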

donhardman commented 6 months ago

Hey @AbstractiveNord, thanks a bunch for this pull request! We actually already built a separate library, and your contribution will definitely help us streamline the integration process.

Here's the library we're looking to use in Manticore: https://github.com/manticoresoftware/manticoresearch-text-embeddings

There are a few limitations we'd like to address:

Take a look at the examples folder – it has some samples demonstrating how to use C to call the library, along with benchmarks. The headers also include build instructions.

Building the library is simple: just install Rust and use cargo:

cargo build --lib --release

The dynamic library will be located in the target/release folder after building.

Regarding the library itself, we're seeing almost no overhead in terms of time, but this still needs further validation. I've added some tests for this purpose.

Trying to integrate it now sounds like a solid plan. Let's also do some profiling to get a better grasp of the overhead, not just in terms of time but also memory usage. I tried sticking with native code as much as possible, but since we're using an external library, we'll need to convert input to internal types. Otherwise, we'd have to rebuild the external libraries, which isn't ideal. The good news is that benchmarking shows minimal overhead.

So, we end up returning a *const f32, which is a native pointer. Keep in mind that Rust handles memory differently: once this pointer is handed over, it's on the C side to make sure it gets cleaned up; otherwise, we might run into memory leaks.
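One safe pattern for that hand-off (function names here are illustrative, not the actual library API) is to pair the allocation with an exported free function, so the buffer is returned to the same allocator that created it; freeing a Rust allocation with C's free() is generally undefined behavior.

```rust
use std::os::raw::c_char;
use std::slice;

/// Allocate the embedding in Rust and hand back a raw pointer plus its length.
#[no_mangle]
pub extern "C" fn make_embedding(text: *const c_char, len: usize, out_len: *mut usize) -> *const f32 {
    let bytes = unsafe { slice::from_raw_parts(text as *const u8, len) };
    // Stand-in for model inference: one float per input byte.
    let v: Vec<f32> = bytes.iter().map(|&b| b as f32 / 255.0).collect();
    unsafe { *out_len = v.len() };
    Box::into_raw(v.into_boxed_slice()) as *const f32
}

/// Return the buffer to Rust so it is freed by the allocator that made it.
#[no_mangle]
pub extern "C" fn free_embedding(ptr: *const f32, len: usize) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr as *mut f32, len)));
    }
}
```

The C++ side then calls `free_embedding` instead of `free` when it is done with the vector.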

donhardman commented 6 months ago

As discussed before, here are points for consideration:

  1. Memory Allocation: Allocate the vector in C and pass it as a parameter to Rust.
  2. Model Initialization: Implement an initialization method to download the model (currently using lazy loading). This method should accept a path as an optional parameter.
  3. Interface Update: Modify the interface to utilize ModelPtr as a void pointer instead of the TextEmbeddings wrapper.
  4. Memory Management: Introduce methods to call the black-boxed model and subsequently free up all memory it uses.
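The four points above can be sketched as an opaque-handle interface (all names and the stub model are hypothetical, not the real manticoresearch-text-embeddings API):

```rust
use std::os::raw::{c_char, c_void};
use std::slice;

struct Model {
    dims: usize, // stands in for the loaded black-box model (point 3)
}

/// Point 2: an explicit init step; real code would load the model from `path`.
#[no_mangle]
pub extern "C" fn model_create(_path: *const c_char, _path_len: usize) -> *mut c_void {
    Box::into_raw(Box::new(Model { dims: 4 })) as *mut c_void
}

#[no_mangle]
pub extern "C" fn model_dims(model: *mut c_void) -> usize {
    unsafe { &*(model as *const Model) }.dims
}

/// Point 1: the caller allocates `out` (model_dims floats) and passes it in.
#[no_mangle]
pub extern "C" fn model_embed(model: *mut c_void, text: *const c_char, len: usize, out: *mut f32) {
    let m = unsafe { &*(model as *const Model) };
    let bytes = unsafe { slice::from_raw_parts(text as *const u8, len) };
    let out = unsafe { slice::from_raw_parts_mut(out, m.dims) };
    for (i, v) in out.iter_mut().enumerate() {
        *v = bytes.get(i).copied().unwrap_or(0) as f32; // stand-in inference
    }
}

/// Point 4: release everything the model holds.
#[no_mangle]
pub extern "C" fn model_destroy(model: *mut c_void) {
    if !model.is_null() {
        unsafe { drop(Box::from_raw(model as *mut Model)) };
    }
}
```

Since the handle crosses the boundary as a plain void pointer, the C++ side needs no Rust types at all, just the four extern declarations.
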
sanikolaev commented 6 months ago

To summarize, what @donhardman has done proves the Rust library can work in C, but it doesn't have anything to do with Manticore yet. As discussed on yesterday's call, the next step is to implement some DEBUG embedding command in Manticore Search to prove the concept to the point where we understand that it works with Manticore. What @AbstractiveNord has done here https://github.com/manticoresoftware/manticoresearch/pull/1148/files may be helpful.

tomatolog commented 3 months ago

We could try to use llama.cpp to skip the Rust marshaling. It supports a huge list of different models, and there is an example of embedding in C++.

klirichek commented 1 month ago

I've pushed the branch https://github.com/manticoresoftware/manticoresearch-text-embeddings/tree/cpp_bind to the Rust lib. I've also pushed the branch 'embeddings' into the Manticore source tree. Both branches should work together.

On rust side - 'cargo build --lib', or 'cargo build --lib --release'. On manticore side:

mysql> debug load embeddings '/opt/work/manticoresearch-text-embeddings/target/release/libmanticoresearch_text_embeddings.dylib';
+-----------------------+--------+
| command               | result |
+-----------------------+--------+
| debug load embeddings | Ok     |
+-----------------------+--------+
1 row in set (0,01 sec)

mysql> debug load model 'sentence-transformers/multi-qa-MiniLM-L6-cos-v1';
+------------------+-------------------------------------+
| command          | result                              |
+------------------+-------------------------------------+
| debug load model | hidden_size=384, max_input_size=512 |
+------------------+-------------------------------------+
1 row in set (0,07 sec)

mysql> debug embeddings 'This is a sample text.';
+-----------+-------------+
| embedding | value       |
+-----------+-------------+
| 0         | -0.01043452 |
| 1         | 0.06928102  |
| 2         | -0.04854530 |
| 3         | -0.00591148 |
| 4         | 0.03279826  |
| 5         | 0.02965242  |
| 6         | 0.07039508  |
| 7         | 0.00784686  |
...
| 380       | 0.12422077  |
| 381       | 0.10335055  |
| 382       | 0.11339451  |
| 383       | -0.00519720 |
+-----------+-------------+
384 rows in set (0,03 sec)

The names of the commands are arbitrary for the POC; they are just for experimenting. Also, the header file is inlined into the C++ code; see the beginning of 'src/embeddings/embeddings.cpp'. The path to the lib for 'debug load embeddings' should be adjusted for your instance.

sanikolaev commented 1 month ago

@donhardman pls test it and prepare a further plan of action.

AbstractiveNord commented 1 month ago

Do you plan to optionally utilize the GPU in manticoresearch-text-embeddings? Just curious about the throughput of CPU inference.

sanikolaev commented 1 month ago

> Do you plan to optionally utilize the GPU?

It depends on the library we'll be using. So far we are going to use https://github.com/huggingface/candle . They say:

> Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support)

so it should be possible to use GPU.

donhardman commented 1 month ago

To move forward, we should discuss and approve the interface for configuring fields that will have auto-embedding.

Currently, I suggest starting with the following specification. It may look like this:

CREATE TABLE test (
    title TEXT,
    image_vector FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='4' HNSW_SIMILARITY='l2'
    MODEL_NAME = "..."
    MODEL_CACHE_PATH = "..."
);

I've also refactored the code and updated the interface. It's subject to review and discussion on whether we should proceed with it or not: https://github.com/manticoresoftware/manticoresearch-text-embeddings

Things to consider:

  1. We should discuss how to handle errors from the library on the C++ side, so we can adjust the code and implement best practices.
  2. Multi-threading: Currently, we use ALL CPUs when generating embeddings, which may be heavy. We need to investigate if we can control this.
  3. Batching: Do we need a method in the library to get multiple embeddings?
  4. How to handle long texts: Currently, we calculate the mean vector, but this may not be the best default. Should we have options for this?
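For point 4, the current mean-vector default amounts to embedding fixed-size chunks and averaging them component-wise. A pure-Rust sketch of just that pooling step (the chunk embeddings would come from the model; this helper is illustrative):

```rust
/// Component-wise mean of several equal-length chunk embeddings.
/// Assumes at least one chunk, all with the same dimensionality.
fn mean_pool(chunks: &[Vec<f32>]) -> Vec<f32> {
    let dims = chunks[0].len();
    let mut acc = vec![0f32; dims];
    for chunk in chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += v;
        }
    }
    let n = chunks.len() as f32;
    acc.iter_mut().for_each(|a| *a /= n);
    acc
}
```

Alternatives worth listing as options are max pooling, first-chunk-only, or weighting chunks by token count.
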
donhardman commented 1 month ago

@klirichek please review my changes in the interface and let me know if it's all OK

tomatolog commented 1 month ago

Maybe, if api_key and model_name are set for a remote model, additional validation is needed at CREATE TABLE time that the parameters are good and the remote API accepts them - to fail the CREATE TABLE if the user wants an OpenAI model but the internet is not available or the api_key is wrong.

Maybe such a check is also needed at daemon start, or on daemon restart after a crash, to disable such an index or put it into read-only mode.

AbstractiveNord commented 1 month ago

Also, please note that api_key is secret and dynamic information; it can change frequently.

donhardman commented 1 month ago

As discussed before, I have split this task into multiple ones:

Closing this issue as the Proof of Concept (POC) is done.