ferdi05 opened 2 years ago
Meilisearch v0.27 introduces a significant performance improvement. Unfortunately, it wasn't widely advertised.
Since many potential users refrain from using Meilisearch over performance concerns, there's an opportunity to advertise these improvements.
We observed this performance improvement in a single experiment on an 80M-document index: starting from a fully populated database, we indexed a few batches of 10 documents each, without autobatching.
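The indexing pattern used in that experiment can be sketched as follows. This is a hypothetical driver, not the actual experiment script: the batching helper is plain Python, and the real network call (e.g. `index.add_documents(batch)` with the official `meilisearch` Python client against a running instance) is abstracted behind a `send_batch` callable so the logic can run standalone.

```python
# Hypothetical sketch of the experiment described above: index a few
# batches of 10 documents each against an already-populated index.
# The HTTP call is abstracted behind `send_batch` so the batching
# logic is testable without a running Meilisearch instance.

def chunk(documents, batch_size=10):
    """Split a document list into fixed-size batches."""
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]

def index_in_batches(documents, send_batch, batch_size=10):
    """Send each batch as its own task (i.e. without autobatching)
    and return the number of batches sent."""
    batches = chunk(documents, batch_size)
    for batch in batches:
        send_batch(batch)  # e.g. index.add_documents(batch) with the meilisearch client
    return len(batches)

# Example: 35 synthetic documents split into batches of 10, 10, 10 and 5.
docs = [{"id": i, "title": f"doc {i}"} for i in range(35)]
sent = []
n = index_in_batches(docs, sent.append)
```

Because each batch is submitted separately, the server processes one indexing task per batch, which is what "without using autobatching" amounts to here.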
We could check the performance of other use cases, such as:
A few months ago, @ManyTheFish led some experiments on performance benchmarking. There's also a benchmark from one of our most active Slack users.
Some thoughts:
Next step: we set a meeting with @Kerollmops @ferdi05 @curquiza @qdequele on 2022-05-31
Users' main pain point is inserting new documents; indexing seems to be OK in most cases
We could run new benchmarks with each new commit
We need a better understanding of our users' use cases
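Running benchmarks on each new commit only needs a small timing harness at its core. A minimal sketch (the harness name and structure are hypothetical, not an existing tool): it times an operation several times and reports the median and worst sample, which is usually more stable than a single run.

```python
import statistics
import time

def benchmark(operation, runs=5):
    """Time `operation` several times and return (median, worst)
    wall-clock durations in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()  # e.g. an indexing or search call against Meilisearch
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples)

# Example: timing a no-op measures only the harness overhead.
median_s, worst_s = benchmark(lambda: None, runs=3)
```

In a per-commit CI job, `operation` would be the indexing or search call under test, and the two numbers could be posted to the commit for regression tracking.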
Some use cases we thought of:
E-commerce (8GB RAM)
SaaS app (16GB RAM)
B2C app (16GB RAM)
Site Search (1GB RAM)
App search: Multilingual (4GB RAM)
We should ask our PM to help us understand what our users' datasets look like: number of documents, size (or size range), update rate, and which fields/documents change with each update
NEXT STEP: @ferdi05 to ask @gmourier & @davelarkan to help us to create use cases.
@ferdi05 have you already discussed a timeline for this?
Thinking about it, we should select a first benchmark to integrate; that would allow us to move faster and get some measurements instead of waiting for all benchmarks to be done at the same time. WDYT?
Sorry @qdequele, I have written down a transcript of the meeting @davelarkan, @gmourier and I had, but I still need to finish the work on the use cases. I put it on hold as I had other priorities to take care of lately, but I really hope to resume working on this during the week. We could indeed start with only one use case @gmourier
E-commerce (8GB RAM)
A. Big-company e-commerce

- 300k documents - unsure about this (let's look at the e-commerce space)
- document size depends on whether the item description is translated into different languages
- Facets/Filters/Sort/every feature, but possibly less geosearch
- ultra-fast updates on the stock: 100 updates every second, usually with one field updated (# stocks, likes, views...)
- the dataset used in the e-commerce demo is quite accurate; it's just super large, as it's an Amazon dataset
B. Smaller e-commerce

- fewer documents: 1K-2K
- document updates every day (e.g. the number of likes); no need for real-time updates; a few updates per minute
- a lot of search
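The "one field update" pattern from the big-company case maps naturally to a partial document update: sending only the primary key and the changed field, which Meilisearch's document-update route merges into the existing document instead of replacing it. A sketch of building such a payload (the client call is shown as a comment, since it needs a running instance; field names are illustrative):

```python
def stock_update_payload(item_id, stock):
    """Partial update payload: only the primary key and the changed
    field. The document-update route merges this into the existing
    document rather than replacing it."""
    return {"id": item_id, "stock": stock}

# Example: item 42 now has 17 units in stock.
payload = [stock_update_payload(42, 17)]
# index.update_documents(payload)  # official meilisearch Python client
```

At 100 such updates per second (the big-company figure), keeping payloads down to two fields is what makes the write path cheap.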
SaaS app (16GB RAM)

- Slack/Crisp-like: 30M+ documents
- multi-tenant: groups of 1M documents
- 2/3 filters, date sorting
- small documents (1KB)
- UGC documents -> need for real-time indexing
- not that many concurrent searches
B2C app (16GB RAM / highest available configuration)

- Twitter-like -> they only index the last 24h of tweets
- even more documents? This could help figure out the top-end limit of the benchmark => pushing the limit
Site Search (1GB RAM)

- 6k pages (your documentation) + 300 pages (your website)
- multi-index search
- sortBy on one criterion: the last update date
- large documents (10KB, quite often larger; think a Wikipedia article) - different types of documents
- Wikipedia is a good use case => maybe a subset
App search: Multilingual (4GB RAM)

- demo movies dataset (1.2M documents)
- one update per day, with everything that changed - possibly one field per document, like the movie rating
- all languages (60)
- nested fields
- lengthy documents: that's not true, even with a lot of languages
- filters/multi-sort
- documents are not UGC
Whether an operation is a new document addition or an update of existing documents has to be stated everywhere.
we'll have to find some datasets
A mere 50 days after our last meeting, I have finally been able to translate this into a table, see below. A couple of days ago, I discussed it with @ManyTheFish. Feel free to help me make this more accurate @gmourier @davelarkan
Use case | Suggested instance specs | Type of documents (number of fields per document) | Average document size | Number of documents | Languages | Needed features (aka settings) | Update frequency | Update nature (which fields are updated; new document vs. document update) | How the update is performed (e.g. ten by ten) + batch size | Search requests (number and frequency) | Dataset idea |
---|---|---|---|---|---|---|---|---|---|---|---|
e-commerce - big company | 8GB RAM & 2 vCPU | item description - 15 fields | small: item desc., possibly translated into multiple languages | 300k | 10 languages | Facets/Filters/Sort/geoSearch and possibly every feature | 100 updates/s | updating a document, changing one field (number of stocks, views, likes...) | unknown | 1,000/s (possibly more, but we may hit the limit already) | Amazon dataset used in the e-commerce demo |
e-commerce - smaller company | 2GB RAM & 0.5/1 vCPU | item description - 15 fields | small: item desc. | 1,000 | English | Facets/Filters/Sort/geoSearch and possibly every feature | 5 updates/min | updating a document, changing one field (number of stocks, views, likes...) | | 10/s | Amazon dataset used in the e-commerce demo |
SaaS app (like a company Slack) | 16GB RAM & 4 vCPU | short messages - 10 fields | 1KB | 1M | English | 2 filters + date sorting + tenant tokens (200k documents per tenant) | 10 updates/min | adding a new document | real time | 5/min | |
B2C app (like Twitter) | 32GB RAM & 8 vCPU (highest available) | short messages - 10 fields | 1KB | 24M | English | date sorting | 12 updates/min | adding a new document | 2K new documents every 5 sec | 10k/s | |
Site search | 1GB RAM & 0.5 vCPU (economy option) | 1,000 pages (documentation, translated into 5 languages) + 200 pages (website, translated into 5 languages) | 10KB, different kinds of documents (like Wikipedia) | 1k | 5 languages | multi-index search + sortBy on one criterion (the last update date) | very rare | adding a new document, or updating one | 1 per day | 10/min | a subset of Wikipedia |
App search | 4GB RAM & 1 vCPU | movie description, nested fields | | 800k | 50 languages | nested fields + filters/multi-sort | once per day | adding new documents or updating a field of an existing one | 1 per day | 50/min | Where2Watch demo dataset (private link) |
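To make a benchmark reproducible, each table row could be reduced to a (batch size, interval) pair and replayed by a driver. A hypothetical sketch of the planning arithmetic, using the B2C row (24M documents, 2K new documents every 5 sec) as input:

```python
def workload_plan(total_docs, batch_size, interval_s):
    """Return (number of batches, total wall-clock seconds) needed to
    replay a workload of `total_docs` documents arriving `batch_size`
    at a time, one batch every `interval_s` seconds."""
    batches = -(-total_docs // batch_size)  # ceiling division
    return batches, batches * interval_s

# B2C row: 24M documents arriving 2,000 at a time every 5 seconds.
batches, seconds = workload_plan(24_000_000, 2_000, 5)
```

Replaying the full B2C stream at its stated rate takes 12,000 batches, roughly 17 hours of wall-clock time, which suggests per-commit runs would need a scaled-down subset of each use case.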
Hello @ferdi05 thanks a lot for sharing this! 🙏 🔥 I see "missing CPU", is it planned to have this number one day?
Hello everyone,
We started working on it in the core team. There are still some questions to solve on our side, notably the metrics we need to measure, the datasets to choose from, the flow of indexing and search operations, and the choices to make for the technical stack.
At first, we will focus only on the SaaS use case.
You can follow our discussion here.
> Hello @ferdi05 thanks a lot for sharing this! 🙏 🔥 I see "missing CPU", is it planned to have this number one day?
Hey @curquiza I'm not super aware of the details of our Cloud offers, but maybe @davelarkan can help. Is the Pricing page (private link) accurate for CPU?
> Is the Pricing page (private link) accurate for CPU?
Hi @curquiza and @ferdi05 👋
This link is slightly outdated in that we have more plans now. ~~I'll look into updating the content at that link~~ (EDIT: Now updated).
In the meantime here are the 5 plans we offer, their price and CPU core count:
thanks so much @davelarkan. @curquiza and @gmourier I updated the table accordingly
Thanks @ferdi05 and @davelarkan; we might want to test a use case on different cloud machine specifications to better locate the limits, and to make it easier for the customer success side to recommend a plan in the future.
We could create a performance benchmark, first across our different versions but also against our competitors, with various use cases.