Open sanikolaev opened 8 months ago
So much this. This is what I'm working on right now.
This would be very cool to have built-in.
Though I'd just like to put in a good word for txtai, a Python library/application for all sorts of NLP, vector database work, etc.: https://neuml.github.io/txtai/
It's very easy to use and quite flexible, whereas some of its "competitors" are much more opinionated about how things need to be done.
I'll likely set it up as an NLP/embeddings processor/server, and its output will then be stored in Manticore and some other data stores.
You might want to look at how Typesense (a similar search engine provider) has integrated embedding generation [1] with its search, including model selection etc.
@donhardman as discussed on today's call, pls write down the suggested architecture of this functionality.
One idea: I think a good approach is to build a .so library in Rust, reusing what we learned when we introduced it in our GitHub issue search demo, and to call its embedding function from the C code.
In that case, the function would be implemented in Rust and shipped the same way we ship columnar, and the C code of the daemon would call it whenever auto embeddings need to be generated.
It sounds easy to implement since we already have everything we need.
The goal is to adapt the Candle ML framework from Hugging Face, making it flexible and customizable so users can choose the best model for their needs.
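To make the "Rust library called from the daemon's C code" idea above a bit more concrete, here is a rough sketch of what the exported entry point could look like. Everything in it is an assumption for illustration only: the function name `generate_embedding`, the error codes, and `EMBEDDING_DIM` are hypothetical, and the `embed` helper is a byte-hashing placeholder standing in for real model inference (which in the proposal would go through Candle). It is not the actual Manticore interface.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::slice;

/// Hypothetical embedding width; a real model defines its own dimension.
pub const EMBEDDING_DIM: usize = 8;

/// Placeholder for model inference (NOT a real embedding model):
/// buckets the input bytes and L2-normalizes the result.
fn embed(text: &str) -> [f32; EMBEDDING_DIM] {
    let mut v = [0f32; EMBEDDING_DIM];
    for (i, b) in text.bytes().enumerate() {
        v[(b as usize + i) % EMBEDDING_DIM] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

/// Hypothetical C ABI entry point the daemon could call.
/// Writes EMBEDDING_DIM floats into `out`.
/// Returns 0 on success, -1 on a null pointer, -2 if the output
/// buffer is too small, -3 if the text is not valid UTF-8.
///
/// Safety: `text` must point to a NUL-terminated string and `out`
/// must point to at least `out_len` writable floats.
#[no_mangle]
pub unsafe extern "C" fn generate_embedding(
    text: *const c_char,
    out: *mut f32,
    out_len: usize,
) -> i32 {
    if text.is_null() || out.is_null() {
        return -1;
    }
    if out_len < EMBEDDING_DIM {
        return -2;
    }
    let s = match unsafe { CStr::from_ptr(text) }.to_str() {
        Ok(s) => s,
        Err(_) => return -3,
    };
    let v = embed(s);
    // Copy the vector into the caller-owned buffer.
    unsafe { slice::from_raw_parts_mut(out, EMBEDDING_DIM) }.copy_from_slice(&v);
    0
}
```

Built as a `cdylib`, this would produce a shared library that the C side could declare as something like `int generate_embedding(const char *text, float *out, size_t out_len);`. Having the caller own the output buffer keeps memory management on one side of the boundary, but that is just one possible design choice.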
The next sub-task is to prepare a syntax specification for the task.
Also, the related issue is https://github.com/manticoresoftware/manticoresearch/issues/2074
As discussed in https://forum.manticoresearch.com/t/search-for-similar-documents/1799/2 , it's quite complicated to generate embeddings outside of Manticore Search. It would be great if Manticore could do it automatically. It's worth checking if Manticore Search can be integrated with https://github.com/microsoft/onnxruntime/ or another similar library.
Checklist
To be completed by the assignee. Check off tasks that have been completed or are not applicable.