pgvector / pgvector-python

pgvector support for Python
MIT License

Takes more time to get the value from Class django.db.models.query.QuerySet #87

Closed vence-andersen closed 2 months ago

vence-andersen commented 2 months ago

I'm using PGVector in Django. Below is the code I use to find the similarity match, but getting the values from the QuerySet is very slow: it takes more than 3 seconds.

embedding = self.get_input_embedding(inputText)
output = (
    table.objects.annotate(
        distance=CosineDistance("embedding", embedding)
    )
    .order_by(CosineDistance("embedding", embedding))
    .filter(distance__lt=1 - threshold)[:2]
)
for neigh in output:
    print(neigh.Link)

Building the cosine distance query (the output variable) takes only milliseconds, but reading the values from output in the for loop takes more than 3 seconds. What could be the reason for this?

PGVector Version - 0.2.5
Python Version - 3.10.12
Django Version - 5.0.3

ankane commented 2 months ago

Hi @vence-andersen, the query isn't executed until the loop starts (output is a QuerySet object, which is evaluated lazily), so that's likely what's taking the time. I'd recommend using EXPLAIN ANALYZE to debug query performance and adding an index if needed.
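
To illustrate why the timing looks lopsided, here is a minimal sketch of lazy evaluation in plain Python. LazyQuery, timed, the sample rows, and the 50 ms delay are all hypothetical stand-ins invented for this example; this is not Django's QuerySet implementation, only a toy that behaves similarly: building the query is nearly free, and the simulated database round trip happens on first iteration.

```python
import time


class LazyQuery:
    """Toy stand-in for a lazily evaluated query (NOT Django's QuerySet)."""

    def __init__(self, rows, predicates=(), delay=0.05):
        self._rows = rows
        self._predicates = tuple(predicates)
        self._delay = delay      # simulated database round trip, in seconds
        self._cache = None       # filled on first iteration
        self.executions = 0      # how many times the "database" was hit

    def filter(self, predicate):
        # Chaining only records the condition; nothing is executed yet.
        return LazyQuery(self._rows, self._predicates + (predicate,), self._delay)

    def __iter__(self):
        # The expensive work happens here, the first time results are needed.
        if self._cache is None:
            time.sleep(self._delay)
            self.executions += 1
            self._cache = [
                row for row in self._rows
                if all(pred(row) for pred in self._predicates)
            ]
        return iter(self._cache)


def timed(fn):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical sample data standing in for table rows.
rows = [{"link": "a", "distance": 0.1}, {"link": "b", "distance": 0.7}]

# Building the query: fast, because no work is done yet.
query, build_time = timed(lambda: LazyQuery(rows).filter(lambda r: r["distance"] < 0.5))

# Iterating: this is where the simulated query actually runs.
links, fetch_time = timed(lambda: [row["link"] for row in query])

print(f"build: {build_time:.4f}s, fetch: {fetch_time:.4f}s, links: {links}")
```

In Django itself, wrapping the QuerySet in list(output) forces evaluation at a known point, so the query time can be measured separately from the loop body. On PostgreSQL, output.explain(analyze=True) returns the EXPLAIN ANALYZE plan as a string without writing raw SQL.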