meilisearch / meilisearch

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
https://www.meilisearch.com
MIT License
47.25k stars 1.84k forks source link

Meilisearch crashes when using a lot of filter clauses #2163

Closed martijnderidder closed 2 years ago

martijnderidder commented 2 years ago

Describe the bug When using Meilisearch in production i want to exclude a lot of id's. But there is a limit when adding a lot of filters clauses. Is there maybe another way of fixing this? I have a large set of invoices in one index. And i want to select only the documents thats have a relation_id from the user. But in our production website, some users already have 3000 more relations.

To Reproduce When using a basic Laravel (PHP) envoirement i'm typing something like this. (Its a crazy example, but lets say i only want to exclude all id's except id 73)...

Route::get('/', function () {
    $ids = range(1, 2400);
    unset($ids[73]);

    $user = User::search('', function ($meilisearch, $query, $options) use ($ids) {
        $options['filter'] = '';
        foreach ($ids as $id) {
            $options['filter'] .= 'id != "'. $id .'" AND ';
        }

        $options['filter'] = substr($options['filter'], 0, -4);
        $options['sort'] = ['name:asc'];
        return $meilisearch->search($query, $options);
    })->paginate(20);

    dd($user);
});

I get an error;

cURL error 52: Empty reply from server (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://meilisearch:7700/indexes/users/search

Expected behavior I would expect that i can use a "unlimited" amount of filter clauses.

Meilisearch version: v0.25.2

irevoire commented 2 years ago

Ok, so there are two things. First, you’re not supposed to get an empty reply; that’s supposed to be fixed by https://github.com/meilisearch/milli/pull/421 in the v0.25.xx.

Second, since you’re supposed to be getting an error, we can consider this:

I would expect that I can use an "unlimited" amount of filter clauses.

As a feature request, you are currently limited to a certain amount of terms in your filters.

One fix we thought would be to create an IN operation that would allow you to write things like:

NOT id IN [0, 1, 2, 3, 4, ... ]

That would make your request appear as one operation and thus work.

You can read more about it here.


Now on another note, I’m wondering, isn't there a better way to do it? Because even with this operator, you would be sending something like 12.000 bytes per request (12ko). That’s really a lot and doesn’t seem like a good long term solution to me 🤔

martijnderidder commented 2 years ago

Ok i guess if there is no IN option, or the filters are limited to a max length, we can not use this search engine for our company in production. I really love Meilisearch, but it doesn't look quite finished yet (it isn't ofc, there is no v1.0). Keep me posted when you guys are ready with an IN function that can have a couple thousand of id's.

@irevoire

Now on another note, I’m wondering, isn't there a better way to do it? Because even with this operator, you would be sending something like 12.000 bytes per request (12ko). That’s really a lot and doesn’t seem like a good long term solution to me thinking

I don't know how; we use have a large database with milions of records; just look at this example:

Users table:

So John can see 3 invoices. But i dont want to include john user id in the invoices table. The whole system is based on the idea of something is owned by a relation (beteen 2 users). So i need a query like: `SELECT id, name FROM invoices WHERE relation = 1 OR relation = 2) ... but some users have thousands of relations.

We are now using Elasticsearch, and that is working. But its very complicated, and not supported by Laravel 9 (with the current package we are using).

irevoire commented 2 years ago

In reality the filter are limited to a certain depth. And each time you write a OR or AND the depth increase by one. So once we implement the IN filter you'll have a solution equivalent to how you do it in elastic search I guess 👍

martijnderidder commented 2 years ago

@irevoire In the time-beeing i have found a solution to our problem. Its is so easy that it is almost stupid i did not think about it haha. But i can ofcourse always add extra fields during the indexing of documents. I can just add a entrepreneur_id and employee_id in the index.

This problem can only be solved if a fix amount of users (in our case 2 users: the person who owns a relation, and who is connected) can be add in our index. If a document can be viewed by a dynamic group of lets say 50 users, its no good option to add extra user_id in the indexes.

gmourier commented 2 years ago

Hi @martijnderidder, indeed Meilisearch is not based on a relationship model.

It forces to denormalize the SQL structure in some way. The most direct solution I see would be to deport the many-to-many relationship on each invoice document by integrating an array of user ids that would have access to it.

e.g invoice 1 JSON document "user_ids": [1, 2]

this way, if user 1 is logged in and wants to be able to find the invoices he can access, a simple filter clause user_ids = 1 should suffice.

Does your many-to-many relationship only accept two users per invoice?

MarinPostma commented 2 years ago

@gmourier this is not a good idea, adding a new user requires re-indexing the database, and this is very inefficient storagewise

martijnderidder commented 2 years ago

@gmourier this is not a good idea, adding a new user requires re-indexing the database, and this is very inefficient storagewise

This is true. Thats why i already was warning that this solution can only be used with a fix amount of users that can be viewed. So when you add the document, you already know wich users this document can be viewed.

francoism90 commented 2 years ago

I'm also interested in an IN filter as I'm now forced to limit them (e.g. pick 500 IDs).

For now I'm using a hybrid solution; SQL for making lists and Meili when doing filtering with limited results.

curquiza commented 2 years ago

Hello everyone here.

I see two concerns

I close this issue since the bug will be fixed, and the ticket is open on the product side :)