valeriansaliou / sonic

๐Ÿฆ” Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
https://crates.io/crates/sonic-server
Mozilla Public License 2.0
19.66k stars 562 forks source link
backend database graph index infrastructure rust search search-engine search-server server

Sonic

Test and Build Build and Release dependency status Buy Me A Coffee

Sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against in a microsecond's time.

Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use-cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query. Sonic is an identifier index, rather than a document index; when queried, it returns IDs that can then be used to refer to the matched documents in an external database.

A strong attention to performance and code cleanliness has been given when designing Sonic. It aims at being crash-free, super-fast and puts minimum strain on server resources (our measurements have shown that Sonic - when under load - responds to search queries in the ฮผs range, eats ~30MB RAM and has a low CPU footprint; see our benchmarks).

Tested at Rust version: rustc 1.74.1 (a28077b28 2023-12-04)

๐Ÿ‡ซ๐Ÿ‡ท Crafted in Nantes, France.

:newspaper: The Sonic project was initially announced in a post on my personal journal.

Sonic

ยซ Sonic ยป is the mascot of the Sonic project. I drew it to look like a psychedelic hipster hedgehog.

Who uses it?

Crisp Scrumpy

๐Ÿ‘‹ You use Sonic and you want to be listed there? Contact me.

Demo

Sonic is integrated in all Crisp search products on the Crisp platform. It is used to index half a billion objects on a $5/mth 1-vCPU SSD cloud server (as of 2019). Crisp users use it to search in their messages, conversations, contacts, helpdesk articles and more.

You can test Sonic live on: Crisp Helpdesk, and get an idea of the speed and relevance of Sonic search results. You can also test search suggestions from there: start typing at least 2 characters for a word, and get suggested a full word (press the tab key to expand suggestion). Both search and suggestions are powered by Sonic.

Demo on Crisp Helpdesk search

Sonic fuzzy search in helpdesk articles at its best. Lookup for any word or group of terms, get results instantly.

Features

How to use it?

Installation

Sonic is built in Rust. To install it, either download a version from the Sonic releases page, use cargo install or pull the source code from master.

๐Ÿ‘‰ Each release binary comes with an .asc signature file, which can be verified using @valeriansaliou GPG public key: :key:valeriansaliou.gpg.pub.asc.

๐Ÿ‘‰ Install from packages:

Sonic provides pre-built packages for Debian-based systems (Debian, Ubuntu, etc.).

Important: Sonic only provides 64 bits packages targeting Debian 12 for now (codename: bookworm). You might still be able to use them on other Debian versions, as well as Ubuntu (although they rely on a specific glibc version that might not be available on older or newer systems).

First, add the Sonic APT repository (eg. for Debian bookworm):

echo "deb [signed-by=/usr/share/keyrings/valeriansaliou_sonic.gpg] https://packagecloud.io/valeriansaliou/sonic/debian/ bookworm main" > /etc/apt/sources.list.d/valeriansaliou_sonic.list
curl -fsSL https://packagecloud.io/valeriansaliou/sonic/gpgkey | gpg --dearmor -o /usr/share/keyrings/valeriansaliou_sonic.gpg
apt-get update

Then, install the Sonic package:

apt-get install sonic

Then, edit the pre-filled Sonic configuration file:

nano /etc/sonic.cfg

Finally, restart Sonic:

service sonic restart

๐Ÿ‘‰ Install from source:

If you pulled the source code from Git, you can build it using cargo:

cargo build --release

You can find the built binaries in the ./target/release directory.

Install build-essential, clang, libclang-dev, libc6-dev, g++ and llvm-dev to be able to compile the required RocksDB dependency.

Note that the following optional features can be enabled upon building Sonic: allocator-jemalloc, tokenizer-chinese and tokenizer-japanese (some might be already enabled by default).

๐Ÿ‘‰ Install from Cargo:

You can install Sonic directly with cargo install:

cargo install sonic-server

Ensure that your $PATH is properly configured to source the Crates binaries, and then run Sonic using the sonic command.

Install build-essential, clang, libclang-dev, libc6-dev, g++ and llvm-dev to be able to compile the required RocksDB dependency.

๐Ÿ‘‰ Install from Docker Hub:

You might find it convenient to run Sonic via Docker. You can find the pre-built Sonic image on Docker Hub as valeriansaliou/sonic.

First, pull the valeriansaliou/sonic image:

docker pull valeriansaliou/sonic:v1.4.9

Then, seed it a configuration file and run it (replace /path/to/your/sonic/config.cfg with the path to your configuration file):

docker run -p 1491:1491 -v /path/to/your/sonic/config.cfg:/etc/sonic.cfg -v /path/to/your/sonic/store/:/var/lib/sonic/store/ valeriansaliou/sonic:v1.4.9

In the configuration file, ensure that:

Sonic will be reachable from tcp://localhost:1491.

๐Ÿ‘‰ Install from another source (non-official):

Other installation sources are available:

Note that those sources are non-official, meaning that they are not owned nor maintained by the Sonic project owners. The latest Sonic version available on those sources might be outdated, in comparison to the latest version available through the Sonic project.

Configuration

Use the sample config.cfg configuration file and adjust it to your own environment.

If you are looking to fine-tune your configuration, you may read our detailed configuration documentation.

Run Sonic

Sonic can be run as such:

./sonic -c /path/to/config.cfg

Perform searches and manage objects

Both searches and object management (i.e. data ingestion) is handled via the Sonic Channel protocol only. As we want to keep things simple with Sonic (similarly to how Redis does it), Sonic does not offer a HTTP endpoint or similar; connecting via Sonic Channel is the way to go when you need to interact with the Sonic search database.

Sonic distributes official libraries, that let you integrate Sonic to your apps easily. Click on a library below to see library integration documentation and code.

If you are looking for details on the raw Sonic Channel TCP-based protocol, you can read our detailed protocol documentation. It can prove handy if you are looking to code your own Sonic Channel library.

๐Ÿ“ฆ Sonic Channel Libraries

1๏ธโƒฃ Official Libraries

Sonic distributes official Sonic integration libraries for your programming language (official means that those libraries have been reviewed and validated by a core maintainer):

2๏ธโƒฃ Community Libraries

You can find below a list of Sonic integrations provided by the community (many thanks to them!):

โ„น๏ธ Cannot find the library for your programming language? Build your own and be referenced here! (contact me)

Which text languages are supported?

Sonic supports a wide range of languages in its lexing system. If a language is not in this list, you will still be able to push this language to the search index, but stop-words will not be eluded, which could lead to lower-quality search results.

The languages supported by the lexing system are:

How fast & lightweight is it?

Sonic was built for Crisp from the start. As Crisp was growing and indexing more and more search data into a full-text search SQL database, we decided it was time to switch to a proper search backend system. When reviewing Elasticsearch (ELS) and others, we found those were full-featured heavyweight systems that did not scale well with Crisp's freemium-based cost structure.

At the end, we decided to build our own search backend, designed to be simple and lightweight on resources.

You can run function-level benchmarks with the command: cargo bench --features benchmark

๐Ÿ‘ฉโ€๐Ÿ”ฌ Benchmark #1

โžก๏ธ Scenario

We performed an extract of all messages from the Crisp team used for Crisp own customer support.

We want to import all those messages into a clean Sonic instance, and then perform searches on the index we built. We will measure the time that Sonic spent executing each operation (ie. each PUSH and QUERY commands over Sonic Channel), and group results per 1,000 operations (this outputs a mean time per 1,000 operations).

โžก๏ธ Context

Our benchmark is ran on the following computer:

Sonic is compiled as following:

Our dataset is as such:

โžก๏ธ Scripts

The scripts we used to perform the benchmark are:

  1. PUSH script: sonic-benchmark_batch-push.js
  2. QUERY script: sonic-benchmark_batch-query.js

โฌ Results

Our findings:

Compared results per operation (on a single object):

We took a sample of 8 results from our batched operations, which produced a total of 1,000 results (1,000,000 items, with 1,000 items batched per measurement report).

This is not very scientific, but it should give you a clear idea of Sonic performances.

Time spent per operation:

Operation Average Best Worst
PUSH 275ฮผs 190ฮผs 363ฮผs
QUERY 880ฮผs 852ฮผs 1ms

Batch PUSH results as seen from our terminal (from initial index of: 0 objects):

Batch PUSH benchmark

Batch QUERY results as seen from our terminal (on index of: 1,000,000 objects):

Batch QUERY benchmark

Limitations

:fire: Report A Vulnerability

If you find a vulnerability in Sonic, you are more than welcome to report it directly to @valeriansaliou by sending an encrypted email to valerian@valeriansaliou.name. Do not report vulnerabilities in public GitHub issues, as they may be exploited by malicious people to target production servers running an unpatched Sonic instance.

:warning: You must encrypt your email using @valeriansaliou GPG public key: :key:valeriansaliou.gpg.pub.asc.