Hybrid search engine combining the best features of text and semantic search
https://www.nixiesearch.ai/
Apache License 2.0

Nixiesearch: neural search engine for the rest of us

What is Nixiesearch?

Nixiesearch is a hybrid search engine that fine-tunes to your data.

Want to learn more? Go straight to the quickstart.

Why Nixiesearch?

Unlike some of the other vector search engines:

The project is in active development and not intended for production use just yet. Stay tuned and reach out if you want to try it!

Why NOT Nixiesearch?

Nixiesearch has the following design limitations:

Usage

Get the sample MS MARCO dataset:

curl -L -O http://nixiesearch.ai/data/msmarco.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0   3636      0 --:--:-- --:--:-- --:--:--  3681
100 32085  100 32085    0     0   226k      0 --:--:-- --:--:-- --:--:--  226k

Run the Nixiesearch docker container:

docker run -i -t -p 8080:8080 nixiesearch/nixiesearch:latest standalone
12:40:47.325 INFO  ai.nixiesearch.main.Main$ - Staring Nixiesearch
12:40:47.460 INFO  ai.nixiesearch.config.Config$ - No config file given, using defaults
12:40:47.466 INFO  ai.nixiesearch.config.Config$ - Store: LocalStoreConfig(LocalStoreUrl(/))
12:40:47.557 INFO  ai.nixiesearch.index.IndexRegistry$ - Index registry initialized: 0 indices, config: LocalStoreConfig(LocalStoreUrl(/))
12:40:48.253 INFO  o.h.blaze.server.BlazeServerBuilder - 
███╗   ██╗██╗██╗  ██╗██╗███████╗███████╗███████╗ █████╗ ██████╗  ██████╗██╗  ██╗
████╗  ██║██║╚██╗██╔╝██║██╔════╝██╔════╝██╔════╝██╔══██╗██╔══██╗██╔════╝██║  ██║
██╔██╗ ██║██║ ╚███╔╝ ██║█████╗  ███████╗█████╗  ███████║██████╔╝██║     ███████║
██║╚██╗██║██║ ██╔██╗ ██║██╔══╝  ╚════██║██╔══╝  ██╔══██║██╔══██╗██║     ██╔══██║
██║ ╚████║██║██╔╝ ██╗██║███████╗███████║███████╗██║  ██║██║  ██║╚██████╗██║  ██║
╚═╝  ╚═══╝╚═╝╚═╝  ╚═╝╚═╝╚══════╝╚══════╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝

12:40:48.267 INFO  o.h.blaze.server.BlazeServerBuilder - http4s v1.0.0-M38 on blaze v1.0.0-M38 started at http://0.0.0.0:8080/

Build an index for a hybrid search:

curl -XPUT -d @msmarco.json http://localhost:8080/msmarco/_index
{"result":"created","took":8256}

Send the search query:

curl -XPOST -d '{"query": {"match": {"text":"new york"}},"fields": ["text"]}'\
    http://localhost:8080/msmarco/_search
{
  "took": 13,
  "hits": [
    {
      "_id": "8035959",
      "text": "Climate & Weather Averages in New York, New York, USA.",
      "_score": 0.016666668
    },
    {
      "_id": "2384898",
      "text": "Consulate General of the Republic of Korea in New York.",
      "_score": 0.016393442
    },
    {
      "_id": "2241745",
      "text": "This is a list of the tallest buildings in New York City.",
      "_score": 0.016129032
    }
  ]
}
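
The same query can also be sent from code. The following is a minimal Python sketch, assuming the container above is listening on localhost:8080 and the msmarco index has been built. The helper names `build_search_body` and `search` are hypothetical and are not part of Nixiesearch; only the `/msmarco/_search` endpoint, the request body, and the `hits` field come from the curl example.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_search_body(field: str, text: str, fields: List[str]) -> bytes:
    """Serialize the same JSON body that the curl example sends."""
    payload = {"query": {"match": {field: text}}, "fields": fields}
    return json.dumps(payload).encode("utf-8")


def search(base_url: str, index: str, field: str, text: str) -> List[Dict[str, Any]]:
    """POST a match query to /<index>/_search and return the list of hits."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=build_search_body(field, text, [field]),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response is expected to carry a "hits" array, as in the sample above.
        return json.load(response)["hits"]


# Usage (requires the running container from the previous steps):
# for hit in search("http://localhost:8080", "msmarco", "text", "new york"):
#     print(hit["_id"], hit["_score"], hit["text"])
```

Like the curl call, this sketch sets no extra headers and does no error handling; a real client would likely want timeouts and a check of the HTTP status.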

You can also open http://localhost:8080/_ui in your web browser for a basic web UI:

[Screenshot: Nixiesearch web UI]

For more details, see the complete Quickstart guide.

Design

Nixiesearch is inspired by the Amazon search engine design described in the talk "E-Commerce search at scale on Apache Lucene":

[Figure: Nixiesearch design diagram]

Compared to traditional search engines like Elasticsearch/Solr:

Nixiesearch uses Reciprocal Rank Fusion (RRF) to combine text and neural search results.
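
To illustrate the idea behind RRF, here is a minimal Python sketch of the textbook formulation: every document receives 1 / (k + rank) from each ranked list it appears in, and the contributions are summed. The constant k = 60, the 1-based ranks, and the function name `reciprocal_rank_fusion` are assumptions taken from the common description of the method, not from the Nixiesearch source. The scores in the sample search response in the Usage section are close to 1/60, 1/61 and 1/62, which is consistent with reciprocal-rank scoring, but the exact constant and rank offset used by Nixiesearch are not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document gets 1 / (k + rank) from every list it appears in,
    with rank starting at 1; the contributions are summed.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a text (lexical) and a neural (vector) retriever.
text_hits = ["a", "b", "c"]
neural_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([text_hits, neural_hits])
```

Because RRF only looks at ranks, it does not need the raw text and neural scores to be on a comparable scale, which is what makes it convenient for hybrid search. In this toy input, documents "a" and "c" appear in both lists and therefore end up ahead of "b" and "d".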

Limitations

Nixiesearch is not a general-purpose search engine like Elasticsearch:

Current status

At the moment, Nixiesearch is under active development, so please reach out to us via the contact form if you want to try it!

License

This project is released under the Apache 2.0 license, as specified in the License file.