redis / redis-om-python

Object mapping, and more, for Redis and Python
MIT License
1.12k stars 112 forks source link

Finding first match not working as expected? #218

Closed simonprickett closed 2 years ago

simonprickett commented 2 years ago

Reported by @msarm initially as part of #207 :

I see 'ft.search' round trip got reduced now after defaulting the page size to 1000 and this helps other queries to run even faster but my results are not coming this time may be a different issue now.

When I perform the Person query:

Person.find().sort_by('-emp_no').first()

Here is the ft.search performed.

args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 0, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 1000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 2000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 3000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 4000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 5000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 6000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 7000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 8000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 9000, 1, 'SORTBY', 'emp_no', 'desc']
args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 10000, 1, 'SORTBY', 'emp_no', 'desc']  - Fired query hangs

When the last fired ft.search is performed it does not return any results it retries for every.

Since are looking just for the first record, can we just fire the first ft.search and return the results?

args: ['ft.search', ':__main__.Person:index', '*', 'LIMIT', 0, 1, 'SORTBY', 'emp_no', 'desc']

When I set explicitly the exhaust_results=False in the first() function, I see quick results with no extra query. Is that right the fix?

    def first(self):
        query = self.copy(offset=0, limit=1, sort_fields=self.sort_fields)
        results = query.execute(exhaust_results=False)
        if not results:
            raise NotFoundError()
        return results[0]

Original code:

import datetime
from typing import Optional

from redis_om import Field, HashModel, Migrator, get_redis_connection

# This Redis instance is tuned for durability.
REDIS_DATA_URL = "redis://localhost:6380"

class Person(HashModel):
    first_name: str = Field(index=True)
    last_name: str = Field(index=True)
    emp_no: int =  Field(index=True)

# set redis connection
Person.Meta.database = get_redis_connection(url=REDIS_DATA_URL,
                                                  decode_responses=True)
# apply migrations
Migrator().run()

for row_number in range(0,10000):
    person = Person(first_name="John" + str(row_number), last_name="Doe", emp_no=row_number)

    result = Person.find(Person.emp_no ==row_number).all()
    if (len(result) == 0):
        person.save()

    print(person.pk)

# very slow to query a single record (~12 seconds)
Person.find().sort_by('-emp_no').first()
simonprickett commented 2 years ago

Notes to self: When I run the above code (without setting exhaust_results) I see the following FT.SEARCH commands run:

1650879110.754734 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "0" "1" "SORTBY" "emp_no" "desc"
1650879110.763795 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "1000" "1" "SORTBY" "emp_no" "desc"
1650879110.772272 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "2000" "1" "SORTBY" "emp_no" "desc"
1650879110.782112 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "3000" "1" "SORTBY" "emp_no" "desc"
1650879110.790033 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "4000" "1" "SORTBY" "emp_no" "desc"
1650879110.798478 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "5000" "1" "SORTBY" "emp_no" "desc"
1650879110.807172 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "6000" "1" "SORTBY" "emp_no" "desc"
1650879110.815886 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "7000" "1" "SORTBY" "emp_no" "desc"
1650879110.825165 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "8000" "1" "SORTBY" "emp_no" "desc"
1650879110.834179 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "9000" "1" "SORTBY" "emp_no" "desc"
1650879110.842548 [0 172.17.0.1:59320] "ft.search" ":__main__.Person:index" "*" "LIMIT" "10000" "1" "SORTBY" "emp_no" "desc"

and get this response:

pk='01G1G0YXE8FT6N5M7TZQS2RGJ1' first_name='John9999' last_name='Doe' emp_no=9999

(Redis Stack Docker container, Redis OM 0.0.25). I'll look at how first works next.

simonprickett commented 2 years ago

Behaviour I see is that when using first, the correct limit query is run, then the whole dataset is paginated through too:

1650881834.736011 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "0" "1" "SORTBY" "emp_no" "desc"
1650881834.745572 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "1000" "1" "SORTBY" "emp_no" "desc"
1650881834.755145 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "2000" "1" "SORTBY" "emp_no" "desc"
1650881834.764395 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "3000" "1" "SORTBY" "emp_no" "desc"
1650881834.773193 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "4000" "1" "SORTBY" "emp_no" "desc"
1650881834.781712 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "5000" "1" "SORTBY" "emp_no" "desc"
1650881834.790438 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "6000" "1" "SORTBY" "emp_no" "desc"
1650881834.799501 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "7000" "1" "SORTBY" "emp_no" "desc"
1650881834.809064 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "8000" "1" "SORTBY" "emp_no" "desc"
1650881834.818580 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "9000" "1" "SORTBY" "emp_no" "desc"
1650881834.827354 [0 172.17.0.1:59328] "ft.search" ":__main__.Person:index" "*" "LIMIT" "10000" "1" "SORTBY" "emp_no" "desc"
simonprickett commented 2 years ago

@msarm I think you're right, and that first does need to set exhaust_results=False... when I do this and try again just get the one query that you'd expect:

1650886093.706949 [0 172.17.0.1:59332] "ft.search" ":__main__.Person:index" "*" "LIMIT" "0" "1" "SORTBY" "emp_no" "desc"

I'll look at a PR for this, and I suspect that both first and get_item may benefit from this.

Thanks for looking into this.

simonprickett commented 2 years ago

This will be fixed in the next release.

simonprickett commented 2 years ago

Fixed in 0.0.26.