cutecutecat closed this pull request 2 months ago.
For what it's worth, I have tried pulling down these changes and am having serialization issues with tensorchord/pgvecto-rs:pg16-v0.3.0
ValueError: Cannot serialize: <pgvecto_rs.types.index.IndexOption object at 0xffff5fa9a500>
@zbloss Thanks for your report. I found that Django cannot serialize a custom struct, only basic types, so I changed the index to a flattened struct, like pgvector-python does.
The tests I had implemented before failed to catch this. If convenient, could you share your code or workflow so that I can catch this case in future tests?
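A rough, hypothetical illustration of the constraint (these class and variable names are stand-ins, not the SDK's API): Django's migration writer can render basic types such as ints, strings, lists and dicts into a migration file, but an arbitrary custom object has no such representation, which is why flattening the options into plain keyword arguments avoids the serialization error:

```python
import json


# Hypothetical stand-in for the old API: a custom options object.
class IndexOption:
    def __init__(self, m, ef_construction):
        self.m = m
        self.ef_construction = ef_construction


opt = IndexOption(m=16, ef_construction=100)

# The flattened form: every parameter is a basic type, so it can be
# written into (and read back from) a migration file verbatim.
flat_kwargs = {"m": 16, "ef_construction": 100}
roundtrip = json.loads(json.dumps(flat_kwargs))
print(roundtrip == flat_kwargs)  # True: basic types serialize cleanly

# The custom object, by contrast, cannot be serialized as-is.
try:
    json.dumps(opt)
except TypeError:
    print("cannot serialize:", type(opt).__name__)
```

Django's actual migration writer is stricter than `json` (it emits Python source), but the failure mode is the same: only values it knows how to reconstruct can appear in an index definition.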
Thanks for looking at this so quickly. I'm now getting a new error when running manage.py makemigrations in my project:
ImportError: cannot import name 'Index' from 'pgvecto_rs.django' (/usr/local/lib/python3.10/site-packages/pgvecto_rs/django/__init__.py)
I can't share the code, but it is a fairly simple Django project. This error occurs when trying to run makemigrations.
Sorry about that; as this PR is under heavy development, the API may change rapidly before it is fully reviewed and tested. After today's refactor there is no Index at pgvecto_rs.django anymore; instead we have HnswIndex, IvfIndex and FlatIndex:
from django.db import models

from pgvecto_rs.django import HnswIndex, IvfIndex
from pgvecto_rs.types import IndexOption, Hnsw


class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                # don't pass any of `m`, `ef_construction`, `threads`,
                # `quantization_type` or `quantization_ratio` here:
                # if created by `with_option`, they will be overwritten
            ).with_option(
                IndexOption(index=Hnsw(m=16, ef_construction=100), threads=1)
            ),
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]
I tried this branch on my pet project. I was specifically interested in the SparseVector implementation, and used it to play with the hybrid search described here. It works well so far with SQLAlchemy. Thanks @cutecutecat!
@sskorol I am happy that you are interested in our new sparse features. We can simplify the processing of sparse vectors now, with no need to extract indices and values from the model output. I will update that article soon with the new features in SDK v0.2.0!
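For context, here is a minimal sketch of the manual extraction step that the new SparseVector support is meant to remove (the helper name is hypothetical, not part of the SDK): turning a mostly-zero model output into parallel index/value lists before insertion.

```python
def to_indices_values(dense):
    """Keep only the nonzero entries of a dense model output."""
    pairs = [(i, v) for i, v in enumerate(dense) if v != 0.0]
    return [i for i, _ in pairs], [v for _, v in pairs]


# A toy 6-dimensional sparse embedding.
indices, values = to_indices_values([0.0, 0.3, 0.0, 0.0, 1.2, 0.0])
print(indices, values)  # [1, 4] [0.3, 1.2]
```

With a dedicated sparse type, this boilerplate (and the matching reassembly on read) can live inside the SDK instead of in application code.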
@cutecutecat thanks! I'm doing it this way for now. But I'm curious to see how it could be done optimally.
from typing import Callable, ContextManager, List, Type

from sqlalchemy import insert
from sqlalchemy.orm import Session, aliased


class VectorDAO(BaseDAO[VectorRecord, VectorCreate, VectorUpdate]):
    def __init__(
        self,
        model: Type[ModelType],
        session_factory: Callable[..., ContextManager[Session]],
    ):
        super().__init__(model, session_factory)

    def insert_vector(self, vector: VectorCreate):
        with self.session_factory() as db:
            stmt = insert(VectorRecord).values(**vector.model_dump())
            db.execute(stmt)
            db.commit()

    def get_top_matches(
        self, search_vector: VectorSearch, top_k: int = 10
    ) -> List[ModelType]:
        with self.session_factory() as db:
            dense_query = (
                db.query(self.model)
                .order_by(self._l2_distance(search_vector.v_dense))
                .limit(top_k)
                .subquery()
            )
            sparse_query = (
                db.query(self.model)
                .order_by(self._dot_product(search_vector.v_sparse))
                .limit(top_k)
                .subquery()
            )
            dense_alias = aliased(self.model, dense_query)
            sparse_alias = aliased(self.model, sparse_query)
            final_query = db.query(dense_alias).union_all(db.query(sparse_alias))
            return final_query.all()

    def _l2_distance(self, dense_vector: Vector):
        return self.model.v_dense.op("<->")(dense_vector)

    def _dot_product(self, sparse_vector: SparseVector):
        return self.model.v_sparse.op("<#>")(sparse_vector)
And the index/extension is added via an alembic migration:
def upgrade() -> None:
    op.execute("CREATE EXTENSION IF NOT EXISTS vectors")
    op.create_table(
        "vectorrecord",
        # ...
        sa.Column("v_dense", VECTOR(1024), nullable=False),
        sa.Column("v_sparse", SVECTOR(250002), nullable=False),
    )
    op.execute(
        """
        CREATE INDEX IF NOT EXISTS idx_sparse_vector ON vectorrecord
        USING vectors (v_sparse svector_dot_ops)
        WITH (options = '[indexing.hnsw]');
        """
    )
    op.execute(
        """
        CREATE INDEX IF NOT EXISTS idx_dense_vector ON vectorrecord
        USING vectors (v_dense vector_l2_ops)
        WITH (options = '[indexing.hnsw]');
        """
    )
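One common refinement of the union_all step above (a sketch of a general technique, not something this PR or the SDK provides) is reciprocal rank fusion: instead of simply concatenating the dense and sparse top-k lists, each result is scored by its rank in every list it appears in, so documents found by both searches rise to the top.

```python
def rrf(rankings, k=60):
    """Merge ranked id lists via reciprocal rank fusion.

    Each occurrence of an id at zero-based position `rank` contributes
    1 / (k + rank + 1) to its score; ids are returned by descending score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# "b" wins: it sits near the top of both the dense and the sparse list.
merged = rrf([["a", "b", "c"], ["b", "c", "d"]])
print(merged)  # ['b', 'c', 'a', 'd']
```

Applied to the DAO above, the two subqueries would return ids ordered by their respective distances, and `rrf` would produce the final ordering before fetching full rows.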
Close #2 Close #3 Close #4 Close https://github.com/tensorchord/pgvecto.rs/issues/518
Feature
Adds support for:
Usages
Now we can build a sparse vector from these statements: