pgvector / pgvector-python

pgvector support for Python
MIT License

Error when querying nearest vector #54

Closed (ashwincr closed this issue 8 months ago)

ashwincr commented 8 months ago

I am getting the same error when querying for similar vectors. I have looked at #50 as well as the other issues linked there but still can't resolve it. My encoding and insertions work correctly. I am encoding with a sentence transformer and converting each embedding into a numpy.array before inserting it, using the following code:

from sentence_transformers import SentenceTransformer
import numpy as np
from pgvector.psycopg import register_vector
import psycopg

model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
#read dataframe from csv and get the list to be encoded
string_list = list(df["NAME"][:10])

#Encode the string list
string_embeddings = model.encode(string_list)
conn = psycopg.connect(connection parameters)
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
register_vector(conn)

conn.execute('DROP TABLE IF EXISTS master_vec')
conn.execute('CREATE TABLE master_vec (master_id bigserial PRIMARY KEY, name text, embedding vector(384))')

for master_name, embedding in zip(string_list, string_embeddings):
    conn.execute('INSERT INTO master_vec (name, embedding) VALUES (%s, %s)', (master_name, np.array(embedding)))

#verify insertion
master_id = 1
neighbors = conn.execute('SELECT name FROM master_vec WHERE master_id != %(id)s ORDER BY embedding <=> (SELECT embedding FROM master_vec WHERE master_id = %(id)s) LIMIT 5', {'id': master_id}).fetchall()
for neighbor in neighbors:
    print(neighbor[0])

This works as expected and gives me the 10 records that were inserted. Now doing the semantic search:

from sentence_transformers import SentenceTransformer
import numpy as np
from pgvector.psycopg import register_vector
import psycopg

model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
conn = psycopg.connect(connection parameters)
register_vector(conn)

target = 'target string'
target_embedding = model.encode(target)
target_embedding = np.array(target_embedding)
neighbors = conn.execute('SELECT name FROM master_vec ORDER BY embedding <-> %s LIMIT 3', target_embedding).fetchall()
for neighbor in neighbors:
    print(neighbor[0])

This fails with TypeError: query parameters should be a sequence or a mapping, got ndarray. I get the same TypeError irrespective of whether I convert the embeddings to a numpy.array or not. If I use a list, it gives me the error ProgrammingError: the query has 1 placeholders but 384 parameters were passed. I get these same errors even if I cast the embeddings as a vector in the query. What am I doing wrong here?

ankane commented 8 months ago

Hi @ashwincr, positional parameters need to be passed as a tuple or list (docs).

conn.execute(sql, (target_embedding,))
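
Both errors in the question come from the same cause: the second argument to execute() is read as the collection of parameters, so a bare ndarray is rejected and a 384-element list is counted as 384 parameters. A minimal sketch of the corrected search follows, assuming the master_vec table from the question and a connection on which register_vector has already been called. The helper name nearest_names is hypothetical and is not part of pgvector or psycopg.

```python
def nearest_names(conn, target_embedding):
    """Return the names of the 3 rows nearest to target_embedding.

    Assumes `conn` is a psycopg connection on which
    pgvector.psycopg.register_vector(conn) has already been called, and
    that the master_vec table from the question exists.
    """
    sql = 'SELECT name FROM master_vec ORDER BY embedding <-> %s LIMIT 3'
    # The second argument to execute() holds one item per placeholder.
    # Wrapping the embedding in a 1-tuple makes the whole vector a single
    # parameter for the single %s in the query.
    params = (target_embedding,)
    rows = conn.execute(sql, params).fetchall()
    return [row[0] for row in rows]
```

With the model from the question, this would be called as nearest_names(conn, model.encode('target string')). A named placeholder with a dict, as in the verification query in the question, should work as well.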