pgvector / pgvector-python

pgvector support for Python
MIT License

Psycopg2 error with example: #60

Status: Closed (pamelafox closed this issue 9 months ago)

pamelafox commented 9 months ago

I'm using the example code with some modifications for database connection:

import os

import psycopg2
from dotenv import load_dotenv
from pgvector.psycopg2 import register_vector

load_dotenv(".env", override=True)
DBUSER = os.environ["DBUSER"]
DBPASS = os.environ["DBPASS"]
DBHOST = os.environ["DBHOST"]
DBNAME = os.environ["DBNAME"]
# Use SSL if not connecting to localhost
DBSSL = "disable"
if DBHOST != "localhost":
    DBSSL = "require"

conn = psycopg2.connect(database=DBNAME, user=DBUSER, password=DBPASS, host=DBHOST, sslmode=DBSSL)
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("DROP TABLE IF EXISTS items")
cur.execute("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));")
register_vector(conn)

cur.execute("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")

cur.execute("INSERT INTO items (embedding) VALUES ('[1, 2, 3]'), ('[-1, 1, 3]'), ('[0, -1, -2]');")

embedding = [3, 1, 2]
cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
rows = cur.fetchall()
print(rows)

cur.close()

When I run it, I get the following error:

Traceback (most recent call last):
  File "/workspace/examples/psycopg_items.py", line 30, in <module>
    cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
psycopg2.errors.UndefinedFunction: operator does not exist: vector <-> integer[]
LINE 1: SELECT * FROM items ORDER BY embedding <-> ARRAY[3,1,2] LIMI...
                                               ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
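
The HINT above points at one possible workaround: cast the parameter explicitly so PostgreSQL compares vector with vector instead of vector with integer[]. The following is only a sketch of that idea, not something taken from the pgvector-python README. The helper name to_vector_literal is hypothetical, and it assumes the value is sent as pgvector's text form ('[3.0,1.0,2.0]') and cast with ::vector.

```python
from typing import Iterable


def to_vector_literal(values: Iterable[float]) -> str:
    """Format numbers in pgvector's text form, e.g. '[3.0,1.0,2.0]'.

    Hypothetical helper for illustration; it is not part of pgvector-python.
    """
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# The placeholder is cast explicitly, so the operator resolves as vector <-> vector.
QUERY = "SELECT * FROM items ORDER BY embedding <-> %s::vector LIMIT 5"

embedding = [3, 1, 2]
params = (to_vector_literal(embedding),)
print(params)

# With the cursor from the script above, the call would be:
#     cur.execute(QUERY, params)
```

Because the value travels as a plain string, this variant does not depend on how psycopg2 adapts Python lists.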

pamelafox commented 9 months ago

I checked the unit tests for this repo and noticed the psycopg2 test doesn't use the <-> operator, so maybe there's a reason for that?

pamelafox commented 9 months ago

Ah! My error was due to passing a plain Python list instead of a NumPy array (np.array). I did not realize the np.array aspect was important. I suggest clarifying that in the README, assuming that's indeed the case.
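
For anyone landing here with the same traceback, a minimal sketch of the fix described above: convert the embedding to a NumPy array before passing it as a query parameter. The function name nearest_items is hypothetical, and the sketch assumes register_vector(conn) has already been called on the connection that owns the cursor, as in the original script.

```python
import numpy as np


def nearest_items(cur, embedding, limit=5):
    """Run the L2 nearest-neighbor query with the embedding as a NumPy array.

    Hypothetical helper; `cur` is assumed to be a psycopg2 cursor whose
    connection was passed to pgvector.psycopg2.register_vector.
    """
    # A plain list is adapted by psycopg2 as ARRAY[...], which caused the
    # "vector <-> integer[]" error, so convert it to an ndarray first.
    vec = np.asarray(embedding, dtype=np.float32)
    cur.execute(
        "SELECT * FROM items ORDER BY embedding <-> %s LIMIT %s",
        (vec, limit),
    )
    return cur.fetchall()
```

In the script from the first comment, this would replace the failing call with rows = nearest_items(cur, [3, 1, 2]).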

MobisParkHeekang commented 9 months ago

I faced the same problem and realized that pgvector-python requires a NumPy array. It seems it cannot process a plain Python list as input.

ankane commented 9 months ago

Hi @pamelafox and @MobisParkHeekang, check out #4.