pgvector / pgvector-python

pgvector support for Python
MIT License

register_vector failing on Supabase #90

Closed uogbuji closed 1 month ago

uogbuji commented 1 month ago

Hi, using pgvector for a client who is on the Supabase service (not hosted). I've enabled pgvector for a schema my_schema in their console. I want to use asyncpg with it, but it's not working:

import os
import asyncpg
from pgvector.asyncpg import register_vector

conn = await asyncpg.connect(os.environ['DB_CONN'])
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
# Gets: 'CREATE EXTENSION'
await register_vector(conn)  # this is the call that raises the ValueError

I get:

File …/.venv/lib/python3.12/site-packages/asyncpg/connection.py:560, in Connection._introspect_type(self, typename, schema)
    551     rows = await self._execute(
    552         introspection.TYPE_BY_NAME,
    553         [typename, schema],
   (...)
    556         ignore_custom_codec=True,
    557     )
    559 if not rows:
--> 560     raise ValueError(
    561         'unknown type: {}.{}'.format(schema, typename))
    563 return rows[0]

ValueError: unknown type: public.vector

I've tried the likes of

await conn.execute('SET search_path TO my_schema, public')
# I get 'SET'

and

await conn.execute('CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA my_schema')
# Gets: 'CREATE EXTENSION'
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA public')
# Gets: 'CREATE EXTENSION'

I've used pgvector a lot in self-hosted PG with no problem, so I do suspect it's something odd about Supabase, and I know that's another support venue, but I do know that Supa is popular, and surely I'm not the only one to be banging their heads against this wall.

pip show asyncpg pgvector
Name: asyncpg
Version: 0.29.0
Summary: An asyncio PostgreSQL driver
Home-page: 
Author: 
Author-email: MagicStack Inc <hello@magic.io>
License: Apache License, Version 2.0
Location: …/.venv/lib/python3.12/site-packages
Requires: 
Required-by: 
---
Name: pgvector
Version: 0.3.3
Summary: pgvector support for Python
Home-page: https://github.com/pgvector/pgvector-python
Author: Andrew Kane
Author-email: andrew@ankane.org
License: MIT
Location: …/.venv/lib/python3.12/site-packages
Requires: numpy
Required-by: 

For now, I'm having to switch to Supabase's Vecs. I did have to disable the pgvector extension and re-enable it in the extensions schema to get Vecs to work, but that didn't help with my asyncpg problems.

The following Vecs code works, in case it helps others. Vecs adds a SQLAlchemy layer, which I don't like, but I guess I'll just have to deal, unless I can figure things out with asyncpg.

import vecs
import os
from sentence_transformers import SentenceTransformer

E_MODEL = SentenceTransformer('all-MiniLM-L6-v2')

DB_CONN = os.environ.get('DB_CONN')
# Make sure we're not using the obsolete 'postgres://' scheme Supabase gives us, as it blows up Vecs (via SQLAlchemy)
DB_CONN = DB_CONN.replace('postgres://', 'postgresql://')
vx = vecs.create_client(DB_CONN)

text = 'Hello world!'
emb = E_MODEL.encode(text)
meta = {'page': 1}
# create a vector collection
docs = vx.get_or_create_collection(name='test_docs', dimension=len(emb))

# add a record
docs.upsert(
    records=[
        (
            'v0',  # vector's identifier
            emb,   # vector: list or np.array
            meta,  # associated metadata
        )
    ]
)

# index the collection for fast search performance
docs.create_index()

text = "What's up planet!"
emb = E_MODEL.encode(text)

# query the collection filtering metadata for page 1
result = docs.query(
    data=emb,                      # required
    limit=1,                       # number of records to return
    filters={'page': {'$eq': 1}},  # metadata filters
)

print(result)

ankane commented 1 month ago

Hi @uogbuji, added a schema option in the commit above, so you can do:

from pgvector.asyncpg import register_vector

await register_vector(conn, schema='my_schema')
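
If it is unclear which schema to pass, one possible approach is to ask the PostgreSQL catalog where the extension was installed and hand that name to register_vector. The following is a minimal sketch, not part of pgvector-python: the helper names find_vector_schema and connect_with_vector are hypothetical, the DB_CONN environment variable is taken from the original report, and the schema keyword is the option mentioned above.

```python
import os

# Catalog lookup: which schema was the "vector" extension installed into?
EXTENSION_SCHEMA_SQL = """
    SELECT n.nspname
    FROM pg_extension e
    JOIN pg_namespace n ON n.oid = e.extnamespace
    WHERE e.extname = 'vector'
"""


async def find_vector_schema(conn):
    """Return the schema holding the vector extension, or None if it is absent.

    `conn` is assumed to be an asyncpg connection (anything with an async
    fetchval method works).
    """
    return await conn.fetchval(EXTENSION_SCHEMA_SQL)


async def connect_with_vector(dsn=None):
    """Hypothetical helper: connect, locate the extension, register the codec."""
    # Imported here so find_vector_schema stays usable without the drivers.
    import asyncpg
    from pgvector.asyncpg import register_vector

    conn = await asyncpg.connect(dsn or os.environ['DB_CONN'])
    schema = await find_vector_schema(conn)
    if schema is None:
        await conn.close()
        raise RuntimeError('the vector extension is not installed in this database')
    # Pass the discovered schema instead of relying on the default.
    await register_vector(conn, schema=schema)
    return conn
```

With this in place, `conn = await connect_with_vector()` would replace the separate connect and register calls. The lookup costs one extra round trip per connection, which may or may not be acceptable depending on how connections are pooled.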

You could also move the extension to the public schema with:

ALTER EXTENSION vector SET SCHEMA public;
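
For those who prefer to script the move, here is a rough sketch that issues the ALTER EXTENSION statement only when the extension is not already in the target schema. The names quote_ident and move_vector_extension are hypothetical, `conn` is assumed to be an asyncpg connection, and whether the connecting role is allowed to alter the extension on a managed service is something this sketch does not check.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


async def move_vector_extension(conn, target='public'):
    """Hypothetical helper: move the vector extension into `target` if needed.

    Returns True when an ALTER EXTENSION statement was issued, False when the
    extension was already in the target schema.
    """
    current = await conn.fetchval(
        "SELECT n.nspname FROM pg_extension e "
        "JOIN pg_namespace n ON n.oid = e.extnamespace "
        "WHERE e.extname = 'vector'"
    )
    if current is None:
        raise RuntimeError('the vector extension is not installed')
    if current == target:
        return False
    # Identifiers cannot be sent as query parameters, so quote by hand.
    await conn.execute(f'ALTER EXTENSION vector SET SCHEMA {quote_ident(target)}')
    return True
```

After a successful move to public, a plain `await register_vector(conn)` should find the type without the schema argument.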

uogbuji commented 1 month ago

Oh wow! That's awesome (and quick!) I was kinda wondering whether I was missing a trick with there not being a schema option on register_vector, so this also helps me with relief that I wasn't out of my mind 😅. Thanks!