*Closed: aaqibjavith closed this issue 2 years ago.*
Hello @aaqibjavith!
Thanks for your feedback and for trying MeiliSearch.
The current search engine in MeiliSearch is not able to handle such a high number of documents. This is a known issue, and the core team is working hard on a new search engine that might be able to handle this quantity of documents! 🙂
For the moment, we recommend pushing your documents in larger batches; don't forget to increase the payload limit size accordingly.
Rest assured, we will keep you informed when the new search engine is available 🙂
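For anyone landing on this thread later, here is a minimal shell sketch of what "pushing in batches" can look like, assuming the documents live in a single JSON array; the index name `movies`, the file name, and the batch size are illustrative, while the endpoint is the standard `POST /indexes/:index_uid/documents` route:

```sh
# Push documents.json (a JSON array) in batches of 1,000 documents.
# Requires jq; the index name and batch size are illustrative.
total=$(jq 'length' documents.json)
batch=1000
for ((i = 0; i < total; i += batch)); do
  jq -c ".[$i:$((i + batch))]" documents.json |
    curl -s -X POST 'http://localhost:7700/indexes/movies/documents' \
      -H 'Content-Type: application/json' \
      --data-binary @-
  echo # blank line between update acknowledgements
done
```

Each call returns an `updateId` that can be polled, so a whole dataset ends up as a short queue of updates instead of one update per document.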
@curquiza While we wait, how many documents are recommended for the current engine?
It depends on many parameters, like the size of your documents. The best approach is to try: add your documents in batches (you can increase the default payload size). But in any case, I would say fewer than 1M records.
@curquiza Alright, with that in mind we will wait until the new engine has been released. Do you have an estimate?
We hope we can release v0.21.0 (containing the new version of the search engine) by the end of June. At the very least, we should be able to provide a release candidate of v0.21.0 by the end of June 🙂
That's much earlier than I expected; I'll wait until then and try it out. I'm guessing it will be able to handle many millions of documents?
If I remember correctly, the biggest dataset we are currently using to test the new engine is ~120M documents.
> I'm guessing it will be able to handle many millions of documents?
Yes 🙂
That's amazing, thanks for the feedback!
Same here:
I injected 220k documents in 2k seconds. That is around 100 doc/sec, which is cool.
But indexing is currently running at 1.5 doc/sec, hogging 70% of a core of my machine.
This means I'll get my documents fully indexed after two and a half days.
Hello @setop! Thanks for the feedback!
Even if the current MeiliSearch has performance troubles, the indexing time you're seeing for your dataset is not expected. I personally just indexed 160k documents in less than 5 minutes on my local machine.
The huge amount of time might be explained by the way you push your documents. Do you push them in batches or one by one? Pushing your documents in batches drastically improves performance. I pushed my documents in batches of 1,800, which means I was able to index everything with fewer than 100 enqueued updates in MeiliSearch. You can also increase the default payload size of MeiliSearch: https://docs.meilisearch.com/reference/features/configuration.html#payload-limit-size
However, my documents are small (6 small attributes per document). Maybe yours are really big, which could explain this huge indexing time.
I indeed enqueue them one by one. It is much more convenient for me, first because... well, it is more convenient :), and also because an error can be raised for one document (during data preparation or during ingestion) and I don't want that error to propagate to its siblings.
Honestly, without digging into the implementation, it is not obvious to me why batching improves things. To me, indexing should cost much, much more than iterating over the input. But it is true that the API suggests it should be done in batches, since it only accepts a list of documents and not a single document.
So far I have managed to ingest 19k documents (even though ingestion keeps slowing down, to less than 0.5 doc/sec). I also ran some tests on search, and that part is terrific! Blazing fast, accurate, typo tolerant. Love it.
I'll try modifying my ingestion pipeline to work in batches.
I retried with batches of 1k documents. It ingested 25k docs in a few seconds 🚀 259MB of raw data, 1.8GB index size.
Awesome @setop! I'm going to open an issue in the docs, since pushing in batches is not obvious to everyone. To be honest, that's a tip I give multiple times a week; you're not the only one 🙂 Glad you succeeded in indexing your data! Edit: I added a comment to this already existing issue: https://github.com/meilisearch/documentation/issues/875#issuecomment-831115956
Hello @setop, @tmikaeld, @aaqibjavith and everyone following the issue!
The first RC of MeiliSearch v0.21.0 is out. We did our best to fix the indexing and crash issues. We managed to improve things, but not to fix them completely.
You can test this new release by downloading the binaries available in this release, or you can use it with Docker:

```sh
docker run -p 7700:7700 getmeili/meilisearch:v0.21.0rc1 ./meilisearch
```
We will keep improving this after the release of v0.21.0. We would rather ship a not-fully-optimized version than delay it and, at the same time, delay the release of new features. Rest assured we are doing our best to keep improving these indexing issues.
As a reminder:
- Despite the improvements, we still recommend pushing your documents to MeiliSearch in batches and not one by one. The maximum payload size of MeiliSearch is 100MB and can be optionally increased (see the sketch after this list), which means most datasets can be pushed with a single call to the API.
- If you still hit a memory crash, the RAM of your machine might not be suited to your dataset size. We recommend increasing the RAM of your machine.
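For reference, a hedged example of raising that limit at launch, reusing the Docker invocation above (the flag name follows the configuration documentation linked earlier in the thread; the 200MB value is arbitrary and expressed in bytes):

```sh
# Same image as above, with the payload limit raised to 200MB (in bytes).
docker run -p 7700:7700 getmeili/meilisearch:v0.21.0rc1 \
  ./meilisearch --http-payload-size-limit 209715200
```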
Thanks for your patience and your help with this! ❤️
I'm using v0.21.0 and trying to import 28M records. The import itself is very fast, but indexing is still in progress and only 85K records have been indexed so far. In my case, the server has 8GB of RAM and a 4-core CPU.
Do you have any recommendation about the max payload size? I've used the default value.
Thanks & best regards,
Hello @quangtam! Thank you for your feedback 🙂
Hi @curquiza
I've installed v0.22.0rc1 and I'm testing it. I can see it's faster than v0.21.0: after 167 min (2h47m) it has indexed 342K records (~2,000 records/min) and is slowing down. Around the first 100k records, the speed was about 9,000 records/min. My hardware:
Also, I've hit one crash (the process was killed by the system) while indexing was running, even though 6 of the 8GB of RAM were still free.
Thank you for your quick report 🙂
It seems really slow for the number of documents, but maybe each document is big. What is the total size of your dataset?
> Also, I've hit one crash (the process was killed by the system) while indexing was running, even though 6 of the 8GB of RAM were still free.
Weird... We had a similar issue (the process killed by the OS while the RAM was not fully used) because of a lack of space on the disk. Could that be your situation?
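(Side note for anyone debugging the same symptom: a quick `df -h` on the volume holding MeiliSearch's `data.ms` directory rules disk exhaustion in or out; the path below is illustrative.)

```sh
# Show free space on the filesystem that holds the data.ms directory.
df -h /var/lib/meilisearch/data.ms
```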
Hi @curquiza,
I have ~28M records in MySQL (5.9GB in the DB). I've read your documentation about virtual memory, but when MeiliSearch crashed I didn't check the hard disk. Currently, I have 270GB free on the disk (300GB total); should I increase it?
> I have ~28M records in MySQL (5.9GB in the DB)
Ok, I thought your dataset was 300K documents, not 28M.
> But currently, I have 270GB free on the disk (300GB total); should I increase it?
Hmm, since I don't have your dataset, it's not easy to guess. Increasing it might solve the problem, but I'm not sure at all.
One of our devs is investigating a similar issue right now.
Hi @curquiza,
Here is my table information
After 4 days, MeiliSearch is still indexing and has processed 2.8M records (version v0.22.0rc1):
=> The new version is faster, but still too slow.
As replied in #1098, I'm having similar issues with 0.23:
I'm having the same issue, using the following setup:
- Laravel Scout
- MeiliSearch 0.23
- Using Laravel Forge, MeiliSearch server, AWS m5.xlarge (4cpu, 16GB)
- ~50 indexes (1 index per tenant/client)
- ~1 Million documents in total (with 30 attributes, either integers or small strings (<250 chars))
- Initial import took around 3-4 hours to complete (in batches of 500 documents)
- Each additional update took around 15 seconds to complete (checking the status via the API)
This caused the backlog to never catch up. One of the 4 CPUs was at 100% the entire time. (I understand that 50% is reserved for search queries, but I expected 2 CPUs to be used for indexing.)
In #1256 it was stated that more than 1M documents wasn't supported pre-0.21, but should be now? The indexing time should also have improved, but according to my tests and a random person on the internet (https://twitter.com/msnegurski/status/1460705253033185285) it hasn't.

Are these numbers out of scope for MeiliSearch? Is the indexing time expected, or have I misconfigured something? (I haven't actually configured anything except spinning up the Forge Meili server.) I understand batching is preferred, but that's not currently how Scout works. Also, a delay of 1 minute really stretches usability (e.g. creating orders should result in the orders being visible in the orders index almost immediately).
I tested the same setup (same documents/indexes/attributes, also using single-document updates via Laravel Scout) with AWS OpenSearch (using the ElasticSearch Scout driver) on similar hardware (r6g.large.search, 2 CPU/16GB, same price as an m5.xlarge): the initial batch import took 15 minutes, and separate updates were indexed instantly (<1 sec) without any delays, under the same load as Meili. This makes me wonder whether: A. this workload is unrealistic for MeiliSearch, B. I'm doing something wrong, or C. this is a bug?
> I tested the same setup [...] with AWS OpenSearch (using the ElasticSearch Scout driver) [...] and the initial batch import took 15 minutes, and separate updates were indexed instantly (<1 sec) without any delays [...]
We have the same problem here: using Meilisearch with Laravel Scout and a huge amount of data didn't work. @barryvdh, which ElasticSearch Scout driver did you use?
Hello everyone reading this issue.
For clarity and simplicity, we decided to gather all the indexation issues and feedback in one place -> this discussion
Please leave any feedback in that discussion; it will be really appreciated 🙂
I am trying to migrate 40M Postgres records (user comments) to Meilisearch. Each time, I query 10K records from Postgres and write them to Meilisearch. The migration script has completed, but it has been more than 3 days and only 4.2 million records have been migrated. The count is increasing very slowly.
I also see the following error messages in the error log:
Also, I tried to get the update statuses using the following curl command:
It runs forever, without ever getting a result back from the server.
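For reference, on the v0.21-era API the status of a single update can also be fetched directly, which may still answer when listing every update hangs; a minimal sketch, with the index name and update id purely illustrative:

```sh
# Fetch the status of one enqueued update instead of listing them all.
curl 'http://localhost:7700/indexes/comments/updates/0'
```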