meilisearch / devrel

Anything Developer Relations at Meili

Performance benchmark #334

Open ferdi05 opened 2 years ago

ferdi05 commented 2 years ago

We could create a performance benchmark, first comparing our different versions, but also comparing us with our competitors, covering various use cases.

ferdi05 commented 2 years ago

2022-05-25 - Meeting with @Kerollmops and @ferdi05

Meilisearch v0.27 introduces a significant performance improvement. Unfortunately, this wasn't really advertised.

Since many potential users refrain from using Meilisearch because of its perceived lack of performance, there's an opportunity to advertise our improvements.

We saw this performance improvement in one experiment on an 80M-document index: starting from this full database, we index a few batches of 10 documents each, without using autobatching.
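
For reference, here is a minimal sketch of what that experiment looks like against the REST API. The host, key, and index name are placeholders, and the task fields assume a v0.28+ instance (older versions named them differently):

```python
# Minimal sketch: index a few batches of 10 documents into an existing
# large index, without autobatching, and time each indexing task.
# HOST, MASTER_KEY, and the index name are hypothetical placeholders.
import time
import requests

HOST = "http://127.0.0.1:7700"
HEADERS = {"Authorization": "Bearer MASTER_KEY"}
INDEX = "big_index"  # hypothetical index already holding the large dataset

def wait_for_task(task_uid: int) -> dict:
    """Poll the tasks route until the indexing task finishes."""
    while True:
        task = requests.get(f"{HOST}/tasks/{task_uid}", headers=HEADERS).json()
        if task["status"] in ("succeeded", "failed"):
            return task
        time.sleep(0.05)

# Add a few batches of 10 documents each and time every batch.
for batch_id in range(5):
    docs = [{"id": f"bench-{batch_id}-{i}", "title": f"doc {i}"} for i in range(10)]
    start = time.monotonic()
    resp = requests.post(f"{HOST}/indexes/{INDEX}/documents", json=docs, headers=HEADERS)
    task = wait_for_task(resp.json()["taskUid"])
    print(f"batch {batch_id}: {task['status']} in {time.monotonic() - start:.3f}s")
```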

We could check the performance of other use cases, like:

A few months ago, @ManyTheFish led some experiments on performance benchmarking. There's also a benchmark from one of our most active Slack users.

Some thoughts:

Next step: we've scheduled a meeting with @Kerollmops @ferdi05 @curquiza @qdequele on 2022-05-31

ferdi05 commented 2 years ago

2022-05-31 - Meeting with @curquiza @qdequele @Kerollmops @ferdi05

Some use cases we thought of:

We should ask our PM to help us understand what the datasets of our users look like: number of documents, size (or range of sizes), update rate, and which fields/documents change with each update.

NEXT STEP: @ferdi05 to ask @gmourier & @davelarkan to help us create use cases.

qdequele commented 2 years ago

@ferdi05 have you already discussed a timeline for this?

gmourier commented 2 years ago

Thinking about it, we should select a first benchmark to integrate; that would allow us to move faster and get some measurements instead of waiting for all the benchmarks to be done at the same time. WDYT?

ferdi05 commented 2 years ago

Sorry @qdequele, I have written down a transcript of the meeting @davelarkan, @gmourier and I had, but I still need to finish the work on the use cases. I put it on hold as I had other priorities to take care of lately, but I really hope to resume working on this during this week. We could indeed start with only one use case @gmourier.

ferdi05 commented 2 years ago

2022-06-07 - Meeting with @gmourier @davelarkan

E-commerce (8 GB RAM)

A. Big-company eCommerce

- 300k documents - unsure about this (let's look at the eCommerce space)
- the size of a document depends on the item description, especially if it's translated into different languages
- Facets/Filters/Sort/every feature, but possibly less geosearch
- ultra-fast updates on the stock: 100 updates every second, usually with one field updated (# stocks, likes, views...) - see the sketch after this section
- the dataset used in the eCommerce demo is quite accurate; it's just super large since it's an Amazon dataset

B. Smaller eCommerce

- fewer documents: 1K-2K
- document updates every day (e.g. the number of likes), no need for real-time updates
- a few updates per minute
- a lot of searches
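
To make the "one field updated" scenario concrete: the documents route distinguishes add-or-replace (POST) from add-or-update (PUT), so a stock change only needs the primary key plus the changed field. A minimal sketch, with the same placeholder host/key and a hypothetical `products` index:

```python
# One-field stock update against a hypothetical `products` index whose
# primary key is `id`. PUT performs an add-or-update: fields absent from
# the payload are kept from the existing document. A POST on the same
# route would replace the whole document instead.
import requests

HOST = "http://127.0.0.1:7700"
HEADERS = {"Authorization": "Bearer MASTER_KEY"}

payload = [{"id": "sku-42", "stock": 17}]  # primary key + the changed field only
resp = requests.put(f"{HOST}/indexes/products/documents", json=payload, headers=HEADERS)
print(resp.json())  # enqueued task descriptor
```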

SaaS app (16 GB RAM)

Slack/Crisp-like

- 30M+ documents
- multi-tenant: groups of 1M documents (tenant-token sketch below)
- 2-3 filters
- date sorting
- small documents (1 KB)
- UGC documents -> need for real-time indexing
- not that many concurrent searches
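
On the multi-tenant point: tenant tokens are JWTs embedding per-tenant search rules, signed with a search API key (v0.28+ uses an `apiKeyUid` claim). A sketch with PyJWT, where the key, its uid, the index name, and the `tenant_id` filter are all hypothetical:

```python
# Per-tenant search token for the "group of 1M documents per tenant"
# scenario. The API key, its uid, and the `tenant_id` filterable
# attribute are placeholders; the searchRules claim restricts what
# the resulting token is allowed to search.
import datetime
import jwt  # PyJWT

SEARCH_API_KEY = "hypothetical-search-api-key"
SEARCH_API_KEY_UID = "hypothetical-key-uid"

payload = {
    "searchRules": {"messages": {"filter": "tenant_id = acme"}},
    "apiKeyUid": SEARCH_API_KEY_UID,
    "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
}
tenant_token = jwt.encode(payload, SEARCH_API_KEY, algorithm="HS256")
# Clients then search with `Authorization: Bearer <tenant_token>`.
```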

B2C app (16 GB RAM / highest available configuration)

Twitter-like

- they only index the last 24h of tweets
- even more documents? this could help figure out the top-end limit of the benchmark => pushing the limit

Site search (1 GB RAM)

- 6k pages (your documentation) + 300 pages (your website)
- multi-index search
- sort by one criterion: the last update date
- large documents (10 KB, quite often larger; think of a Wikipedia article) - different types of documents
- Wikipedia is a good use case => maybe a subset

App search: multilingual (4 GB RAM)

- demo movies dataset (1.2M documents)
- one update per day, with everything that changed - possibly one field per document, like the movie rating
- all languages (60)
- nested fields
- lengthy documents: that's not true, even with a lot of languages
- filters/multi-sort
- documents are not UGC

Whether each operation is a new document addition or an update of existing documents has to be stated for every use case.

We'll have to find some datasets.

ferdi05 commented 2 years ago

A mere 50 days after our last meeting, I have been able to translate this into a table, see below. A couple of days ago, I discussed it with @ManyTheFish. Feel free to help me make this more accurate, @gmourier @davelarkan.

| Use case | Suggested instance specs | Type of documents (fields per document) | Average size of documents | Number of documents | Language | Needed features (aka settings) | Update frequency | Update nature (which fields are updated; new document or existing one) | How the update is performed (batch size) | Search requests (number and frequency) | Dataset idea |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| e-commerce - big company | 8 GB RAM & 2 vCPU | item description - 15 fields | small: item desc., possibly translated into multiple languages | 300k | 10 languages | Facets/Filters/Sort/geosearch and possibly every feature | 100 updates/s | updating a document, changing one field (number of stocks, views, likes...) | unknown | 1,000/s (possibly more, but we may hit the limit already) | Amazon dataset used in the e-commerce demo |
| e-commerce - smaller company | 2 GB RAM & 0.5/1 vCPU | item description - 15 fields | small: item desc. | 1,000 | English | Facets/Filters/Sort/geosearch and possibly every feature | 5 updates/min | updating a document, changing one field (number of stocks, views, likes...) | | 10/s | Amazon dataset used in the e-commerce demo |
| SaaS app (like a company Slack) | 16 GB RAM & 4 vCPU | short messages - 10 fields | 1 KB | 1M | English | 2 filters + date sorting + tenant tokens (200k documents per tenant) | 10 updates/min | adding a new document | real time | 5/min | |
| B2C app (like Twitter) | 32 GB RAM & 8 vCPU (highest available) | short messages - 10 fields | 1 KB | 24M | English | date sorting | 12 updates/min | adding a new document | 2K new documents every 5 sec | 10k/s | |
| Site search | 1 GB RAM & 0.5 vCPU (economy option) | 1,000 pages (documentation, translated into 5 languages) + 200 pages (website, translated into 5 languages) | 10 KB, different kinds of documents (like Wikipedia) | 1k | 5 languages | multi-index search + sort by one criterion (the last update date) | very rare | adding a new document or updating one | 1 per day | 10/min | a subset of Wikipedia |
| App search | 4 GB RAM & 1 vCPU | movie description, nested fields | | 800k | 50 languages | nested fields + filters/multi-sort | once per day | adding new documents or updating a field of an existing one | 1 per day | 50/min | Where2Watch demo dataset (private link) |
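
To turn the "Search requests" column into actual measurements, a load script can fire queries at the target rate and record latency percentiles. A minimal single-threaded sketch (host, key, index, and queries are placeholders; the higher rates in the table would need many concurrent clients):

```python
# Fixed-rate search load generator for the table's "search requests"
# column. HOST, MASTER_KEY, the index name, and the queries are
# hypothetical placeholders.
import statistics
import time
import requests

HOST = "http://127.0.0.1:7700"
HEADERS = {"Authorization": "Bearer MASTER_KEY"}
TARGET_QPS = 10          # e.g. the "10/s" of the smaller e-commerce row
DURATION_S = 30
QUERIES = ["shoes", "red dress", "laptop 15 inch"]

latencies = []
interval = 1.0 / TARGET_QPS
deadline = time.monotonic() + DURATION_S
i = 0
while time.monotonic() < deadline:
    start = time.monotonic()
    requests.post(
        f"{HOST}/indexes/products/search",
        json={"q": QUERIES[i % len(QUERIES)]},
        headers=HEADERS,
    )
    latencies.append(time.monotonic() - start)
    i += 1
    # Sleep off the remainder of the interval to hold the target rate.
    time.sleep(max(0.0, interval - (time.monotonic() - start)))

latencies.sort()
print(f"p50 {statistics.median(latencies) * 1000:.1f} ms")
print(f"p99 {latencies[int(0.99 * (len(latencies) - 1))] * 1000:.1f} ms")
```
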
curquiza commented 2 years ago

Hello @ferdi05 thanks a lot for sharing this! 🙏 🔥 I see "missing CPU", is it planned to have this number one day?

gmourier commented 2 years ago

Hello everyone,

We started working on it in the core team. There are still some questions to solve on our side, notably the metrics we need to measure, the datasets to choose from, the flow of indexing and search operations, and the choices to make for the technical stack.

At first, we will focus only on the SaaS use case.

You can follow our discussion here.

ferdi05 commented 2 years ago

> Hello @ferdi05 thanks a lot for sharing this! 🙏 🔥 I see "missing CPU", is it planned to have this number one day?

Hey @curquiza I'm not super aware of the details of our Cloud offers, but maybe @davelarkan can help. Is the Pricing page (private link) accurate for CPU?

davelarkan commented 2 years ago

> Is the Pricing page (private link) accurate for CPU?

Hi @curquiza and @ferdi05 👋

This link is slightly outdated in that we have more plans now. ~I'll look into updating the content at that link~ (EDIT: Now updated).

In the meantime here are the 5 plans we offer, their price and CPU core count:

[Screenshot: table of the five plans with their prices and CPU core counts]

ferdi05 commented 2 years ago

Thanks so much @davelarkan. @curquiza and @gmourier, I updated the table accordingly.

gmourier commented 2 years ago

Thanks @ferdi05 and @davelarkan; we might want to test a use case on different cloud machine specifications to better locate the limits and facilitate the choice of a plan on the customer success side in the future.