pgvector / pgvector-python

pgvector support for Python
MIT License

halfvec error - ValueError: could not convert string to float: '' #101

Closed by frieda-huang 2 weeks ago

frieda-huang commented 2 weeks ago

Hi! I keep getting ValueError: could not convert string to float: '' due to the values in my halfvec column being malformed.

This is what a value in my halfvec column looks like:

{"[0.011108398,-0.11376953,-0.044677734,-0.23632812, ...]"}

The column is defined as:

vector_embedding: Mapped[list[np.array]] = mapped_column(ARRAY(HALFVEC(VECT_DIM)))

I also made sure to convert each vector with np.array before the upsert:

vector_embedding = [np.array(e) for e in embeddings]

Interestingly, the other field vector_embedding: Mapped[np.array] = mapped_column(HALFVEC(VECT_DIM)) in a separate table works just fine. Am I missing something when using array halfvec?

ankane commented 2 weeks ago

Hi @frieda-huang, for arrays with SQLAlchemy, you'll need to call register_vector on the underlying adapter. For Psycopg 2, use:

from pgvector.psycopg2 import register_vector

with engine.connect() as connection:
    register_vector(connection.connection.dbapi_connection, globally=True, arrays=True)

Added a test case in the commit above.

frieda-huang commented 2 weeks ago

Thank you for the quick reply! I'm still getting the same error despite calling register_vector_async. I'm using psycopg3:

async def add(self, vector_embedding: List[npt.NDArray], page: Page) -> Embedding:
    conn = await psycopg.AsyncConnection.connect(dbname=DBNAME, autocommit=True)
    async with conn:
        await register_vector_async(conn)
        embedding = Embedding(
            vector_embedding=vector_embedding,
            page=page,
            last_modified=get_now(),
            created_at=get_now(),
        )
        self.session.add(embedding)
        return embedding

Error:

          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/sqlalchemy/sql/sqltypes.py", line 3144, in <genexpr>
    return collection_callable(itemproc(x) for x in arr)
                               ^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/pgvector/sqlalchemy/halfvec.py", line 33, in process
    return HalfVector._from_db(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/pgvector/utils/halfvec.py", line 71, in _from_db
    return cls.from_text(value)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/pgvector/utils/halfvec.py", line 36, in from_text
    return cls([float(v) for v in value[1:-1].split(',')])
                ^^^^^^^^
ValueError: could not convert string to float: ''

ankane commented 2 weeks ago

For Psycopg 3, you'll need to call it on the connection used for the session (self.session.connection()) rather than on a new connection.
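
The traceback above is consistent with the array arriving as one raw string instead of a list of vectors. Here is a minimal pure-Python sketch of that failure mode. It assumes that, without registration on the session's connection, the driver hands back the whole array column as text and the array processor then iterates over that string; `from_text` and `raw_array` are stand-in names that only mirror the parsing line shown in the traceback, not the library's actual code.

```python
def from_text(value: str) -> list[float]:
    # Mirrors the parsing line in the traceback: drop the first and last
    # character, split on commas, convert each piece to float.
    return [float(v) for v in value[1:-1].split(',')]


# A single halfvec rendered as text parses without trouble.
parsed = from_text("[0.5,-0.25,1.0]")
print(parsed)

# Assumption: the unregistered connection returns the array as one string.
raw_array = '{"[0.5,-0.25,1.0]"}'

error_message = None
try:
    # Iterating over a string yields single characters, so the first item
    # is "{" and value[1:-1] is the empty string.
    [from_text(item) for item in raw_array]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If that reading is right, the stored values are not malformed; the string was simply never split into individual vectors before parsing.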

ankane commented 2 weeks ago

The best way to do this would be to use the connect event right after you define the engine. For Psycopg 3:

from pgvector.psycopg import register_vector
from sqlalchemy import event

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    register_vector(dbapi_connection)

frieda-huang commented 2 weeks ago

Is there a way to do this with async? I'm using it along with FastAPI. I've tried multiple approaches, including adding the register_vector logic in FastAPI's lifespan, but I still get the same error :(

ankane commented 2 weeks ago

If you're using create_async_engine, you'll want to use:

from pgvector.psycopg import register_vector_async
from sqlalchemy import event

@event.listens_for(engine.sync_engine, "connect")
def connect(dbapi_connection, connection_record):
    dbapi_connection.run_async(register_vector_async)

frieda-huang commented 1 week ago

Thank you! It works now!