vintasoftware / django-zombodb

Easy Django integration with Elasticsearch through ZomboDB Postgres Extension
https://django-zombodb.readthedocs.io
MIT License

Scalability of filtering operations #30

Open thclark opened 4 years ago

thclark commented 4 years ago

Description

Say we filter the queryset as shown in the docs:

Restaurant.objects.filter(
    name__startswith='Pizza'
).query_string_search(
    'name:Hut'
)

Is that a scalable solution? I suppose I'm asking whether:

I'm a bit more used to django-haystack, where in order to achieve this kind of filtering scalably you'd have to have a thing you want to filter against in the search index itself. Excited by the potential of zombodb but needed to check this!

Suggestion

Please could the documentation here include a slightly more in depth note about how those filters are achieved and if it's scalable?

fjsj commented 4 years ago

Hi @thclark, thanks for the issue.

"zombodb in effect evaluates that filtered queryset, then sends a list of ids to ElasticSearch in order to filter the potential results"

In fact, it's the opposite. The 'name:Hut' search is executed on the Elasticsearch side. Then the results are filtered with the additional SQL filters (WHERE). Note that results can be limited to avoid heavy ES searches.
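To illustrate the order of operations described above, here is a toy, in-memory sketch in plain Python. It does not use django-zombodb, Postgres, or Elasticsearch: the ROWS data and the es_search and sql_where helpers are hypothetical stand-ins for the Elasticsearch search and the SQL WHERE clause, nothing more.

```python
# Toy model of "search first, then filter".
# ROWS, es_search and sql_where are hypothetical stand-ins, not library code.
ROWS = [
    {"id": 1, "name": "Pizza Hut"},
    {"id": 2, "name": "Pizza Palace"},
    {"id": 3, "name": "Burger Hut"},
]


def es_search(rows, term):
    """Stand-in for the Elasticsearch step: ids of rows whose name contains the term as a word."""
    return {r["id"] for r in rows if term.lower() in r["name"].lower().split()}


def sql_where(rows, matching_ids, prefix):
    """Stand-in for the SQL step: keep only search hits that also pass the WHERE filter."""
    return [r for r in rows if r["id"] in matching_ids and r["name"].startswith(prefix)]


# Step 1: the search runs first and yields the matching ids.
hits = es_search(ROWS, "Hut")

# Step 2: the SQL filter is applied to those hits afterwards.
results = sql_where(ROWS, hits, "Pizza")

print([r["name"] for r in results])
```

The point of the sketch is only the ordering: the search produces the candidate set, and the WHERE filter narrows it afterwards.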

Haystack and basically every other search tool I've checked will suffer from similar problems: you need a list of ids to combine SQL filtering with searching (on a separate search engine).

But be aware that nothing prevents you from using only Elasticsearch for searches and completely avoiding SQL WHERE / .filter. In fact, that's recommended per the docs:

It’s fine to call filter/exclude/etc. before and after search. If possible, the best would be using only a Elasticsearch query.

For that, you just need to filter everything with the ES syntax. Try the dsl_search method.
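As a rough sketch of what "filter everything with the ES syntax" could look like for the example in this issue, the snippet below builds a single Elasticsearch bool query as a plain dict, moving the name__startswith='Pizza' condition into a prefix clause. The build_es_only_query helper is hypothetical, and so is the assumption that a prefix clause on the name field is an acceptable replacement for the SQL filter: on an analyzed text field, prefix matches individual (usually lowercased) tokens rather than the start of the whole string, so the results may differ. The commented-out calls at the end assume a Restaurant model set up as in the django-zombodb docs and are not executed here.

```python
import json


def build_es_only_query(prefix, search):
    """Hypothetical helper: combine a prefix condition and a query-string search
    into one Elasticsearch bool query, so no SQL WHERE filter is needed."""
    return {
        "bool": {
            # Scored part: the same search that query_string_search('name:Hut') expresses.
            "must": [{"query_string": {"query": search}}],
            # Non-scored part: assumed stand-in for name__startswith='Pizza'.
            "filter": [{"prefix": {"name": prefix}}],
        }
    }


query = build_es_only_query("Pizza", "name:Hut")
print(json.dumps(query, indent=2))

# Assumed usage, requiring a configured django-zombodb project (not run here):
#
#   from elasticsearch_dsl import Q as ElasticsearchQ
#   Restaurant.objects.dsl_search(ElasticsearchQ(query))
#
# ElasticsearchQ(dict) builds an elasticsearch-dsl query object from a dict
# with a single top-level key, which is the shape produced above.
```

Whether this is faster than mixing .filter with a search depends on the data and the index mapping, so it is worth measuring on a realistic dataset.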

However, I agree that's not clear enough, so I think we should move it into a new warning that better explains what's going on behind the scenes. I'll leave this issue open because of that.

I'd personally suggest trying django-zombodb if you already have the infrastructure to support it. Be aware of zombodb's limitations though: https://github.com/zombodb/zombodb/blob/master/THINGS-TO-KNOW.md