pgvector / pgvector-python

pgvector support for Python
MIT License
975 stars 63 forks source link

Can't adapt type 'numpy.ndarray' when used via polars #78

Closed jmtatsch closed 4 months ago

jmtatsch commented 4 months ago

Hello,

I would like to use pgvector python and polars.DataFrame.write_database with pgvectors SQLAlchemy support to write data to a vectordb.

When I pass a SA engine that I initialised for use with pgvector and write

engine = create_engine(conn_string, pool_pre_ping=True, pool_recycle=300)
conn = engine.connect()
conn.execute(text("CREATE EXTENSION IF NOT EXISTS vector"))
conn.commit()
polars_df.write_database(table_name=vdb_table_name, connection=conn)

it fails with:

sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'numpy.ndarray'

polars_df.write_database is using pandas.to_sql which is using sqlalchemy which is using psycopg2.

I guess that hints at problems with writing the embeddings column (array) to vectordb because pgvector is not registered correctly? Any hints what I might be doing wrong?

ankane commented 4 months ago

Hi @jmtatsch, write_database won't know how to create a vector column, but you can create the table manually and use if_table_exists='append'.

import numpy as np
from pgvector.psycopg2 import register_vector
import polars as pl
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://localhost/pgvector_example')
conn = engine.connect()
conn.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))
conn.execute(text('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))'))
conn.commit()
register_vector(conn.connection)

df = pl.DataFrame({'embedding': [np.array([1, 2, 3])]})
df.write_database(table_name='items', connection=conn, if_table_exists='append')
ankane commented 4 months ago

It looks like you can use engine_options to create a vector column.

import numpy as np
from pgvector.sqlalchemy import Vector
import polars as pl
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://localhost/pgvector_example')
conn = engine.connect()
conn.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))
conn.commit()

df = pl.DataFrame({'embedding': [np.array([1, 2, 3])]})
df.write_database(table_name='items', connection=conn, engine_options={'dtype': {'embedding': Vector(3)}})
jmtatsch commented 4 months ago

Awesome. Thank you very much.