manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Implement Manticore as a db backend for Jina DocArray #926

Open regstuff opened 1 year ago

regstuff commented 1 year ago

Is your feature request related to a problem? Please describe.

Jina AI is one of the leading providers of neural search solutions. They have created DocArray, which helps users manage multi-modal data, where each item has sub-items of multiple types. For example, a website would have text, images, video, etc. This multi-modal search capability is very helpful for applying neural search in many real-world applications.

Describe the solution you'd like

Manticore has many text search capabilities. It would be great to integrate these capabilities with DocArray's neural search engine. Jina has implemented support for db backends such as ANNLite, Qdrant & Elastic. They have also provided guides for integrating other backends. Manticore could leverage this, which would be very helpful for both Manticore & Jina AI users.

tomatolog commented 1 year ago

Could you provide concrete examples of how you would use the features you described?

regstuff commented 1 year ago

My use case, for example, is that we have a database of videos of various kinds. Say one video shows a person riding a bike while talking about various famous places in America. This video will have multiple sub-attributes: the video footage itself, the audio of what the speaker said, and the text transcript of that audio. There will be other metadata as well, such as the date of the footage, the location, etc.

We need to be able to search in various ways. For example: search for the video where a person is riding a Honda bike while talking about New York City, shot between June and August 2019. Right now, I have to search for New York City in Manticore, get all the footage that has those words, then manually check whether the footage has a Honda in it. With the Jina AI integration, I would be able to search for New York City in the text and also ask it to filter by Honda bike in the video. I would also be able to search for videos where the person refers to New York as "Big Apple", because of semantic similarity.

We deal with thousands of videos a month, so manually tagging transcripts (say, "this transcript is related to Honda") is not feasible. Semantic search would be very helpful in this situation. A basic example of this is given here.
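To make the request concrete, the query described above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither Manticore nor DocArray, the `Video` record, the `hybrid_search` helper, and the three-dimensional "embeddings" are all hypothetical, and the substring check merely stands in for a real full-text match.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Video:
    title: str
    transcript: str
    shot_on: date
    frame_embedding: list  # toy stand-in for a real video/image embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(videos, phrase, start, end, query_embedding, k=3):
    # Step 1: text match on the transcript plus a date-range filter
    # (the part a full-text engine already handles).
    candidates = [
        v for v in videos
        if phrase.lower() in v.transcript.lower() and start <= v.shot_on <= end
    ]
    # Step 2: rank the survivors by visual similarity to the query embedding
    # (the part the requested integration would add).
    ranked = sorted(
        candidates,
        key=lambda v: cosine(v.frame_embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample data; the vectors are invented for illustration only.
VIDEOS = [
    Video("honda-nyc", "Riding through New York City today", date(2019, 7, 4), [0.9, 0.1, 0.0]),
    Video("car-nyc", "Driving around New York City", date(2019, 6, 20), [0.1, 0.9, 0.0]),
    Video("honda-la", "A ride around Los Angeles", date(2019, 7, 10), [0.9, 0.1, 0.0]),
    Video("honda-nyc-old", "New York City by bike", date(2018, 7, 4), [0.9, 0.1, 0.0]),
]

HONDA_BIKE = [1.0, 0.0, 0.0]  # hypothetical embedding of the query "Honda bike"

results = hybrid_search(
    VIDEOS, "new york city", date(2019, 6, 1), date(2019, 8, 31), HONDA_BIKE
)
print([v.title for v in results])  # ['honda-nyc', 'car-nyc']
```

Matching "Big Apple" to New York would additionally need embeddings of the transcript text, ranked the same way as the frame embeddings here.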

tomatolog commented 1 year ago

We already got a knn proto in #839, so I'm not sure why we need to add a specialized version of embedding.