tensorchord / pgvecto.rs-py

PGVecto.rs Python library
Apache License 2.0
4 stars 1 forks source link

PGVecto.rs support for Python

discord invitation link trackgit-views trackgit-views trackgit-views

PGVecto.rs Python library, supports Django, SQLAlchemy, and Psycopg 3.

Vector Sparse Vector Half-Precision Vector Binary Vector
SQLAlchemy ✅Insert ✅Insert ✅Insert ✅Insert
Psycopg3 ✅Insert ✅Copy ✅Insert ✅Copy ✅Insert ✅Copy ✅Insert ✅Copy
Django ✅Insert ✅Insert ✅Insert ✅Insert

Usage

Install from PyPI:

pip install pgvecto_rs

And use it with your database library:

Or as a standalone SDK:

Requirements

To initialize a pgvecto.rs instance, you can run our official image by Quick start:

You can get the latest tags from the Release page. For example, it might be:

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.3.0

SQLAlchemy

Install dependencies:

pip install "pgvecto_rs[sqlalchemy]"

Initialize a connection

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
engine = create_engine(URL)
with Session(engine) as session:
    pass

Enable the extension

from sqlalchemy import text

session.execute(text('CREATE EXTENSION IF NOT EXISTS vectors'))

Create a model

from pgvecto_rs.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

All supported types are shown in this table

Native types Types for SQLAlchemy Correspond to pgvector-python
vector VECTOR VECTOR
svector SVECTOR SPARSEVEC
vecf16 VECF16 HALFVEC
bvector BVECTOR BIT

Insert a vector

from sqlalchemy import insert

stmt = insert(Item).values(embedding=[1, 2, 3])
session.execute(stmt)
session.commit()

Add an approximate index

from sqlalchemy import Index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf

index = Index(
    "emb_idx_1",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Ivf(), threads=1).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# or
index = Index(
    "emb_idx_2",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Hnsw()).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# Apply changes
index.create(session.bind)

Get the nearest neighbors to a vector

from sqlalchemy import select

session.scalars(select(Item.embedding).order_by(Item.embedding.l2_distance(target.embedding)))

Also supports max_inner_product, cosine_distance and jaccard_distance(for BVECTOR)

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

See examples/sqlalchemy_example.py and tests/test_sqlalchemy.py for more examples

Psycopg3

Install dependencies:

pip install "pgvecto_rs[psycopg3]"

Initialize a connection

import psycopg

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
with psycopg.connect(URL) as conn:
    pass

Enable the extension and register vector types

from pgvecto_rs.psycopg import register_vector

conn.execute('CREATE EXTENSION IF NOT EXISTS vectors')
register_vector(conn)
# or asynchronously
# await register_vector_async(conn)

Create a table

conn.execute('CREATE TABLE items (embedding vector(3))')

Insert or copy vectors into table

conn.execute('INSERT INTO items (embedding) VALUES (%s)', ([1, 2, 3],))
# or faster, copy it
with conn.cursor() as cursor, cursor.copy(
    "COPY items (embedding) FROM STDIN (FORMAT BINARY)"
) as copy:
    copy.write_row([np.array([1, 2, 3])])

Add an approximate index

from pgvecto_rs.types import IndexOption, Hnsw, Ivf

conn.execute(
    "CREATE INDEX emb_idx_1 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Hnsw(), threads=1).dumps()
    ),
)
# or
conn.execute(
    "CREATE INDEX emb_idx_2 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Ivf()).dumps()
    ),
)
# Apply all changes
conn.commit()

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()

Get the distance

conn.execute('SELECT embedding <-> %s FROM items \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

Get items within a certain distance

conn.execute('SELECT * FROM items WHERE embedding <-> %s < 1.0 \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

See examples/psycopg_example.py and tests/test_psycopg.py for more examples

Django

Install dependencies:

pip install "pgvecto_rs[django]"

Create a migration to enable the extension

from pgvecto_rs.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field to your model

from pgvecto_rs.django import VectorField

class Document(models.Model):
    embedding = VectorField(dimensions=3)

All supported types are shown in this table

Native types Types for Django Correspond to pgvector-python
vector VectorField VectorField
svector SparseVectorField SparseVectorField
vecf16 Float16VectorField HalfVectorField
bvector BinaryVectorField BitField

Insert a vector

Item(embedding=[1, 2, 3]).save()

Add an approximate index

from django.db import models
from pgvecto_rs.django import HnswIndex, IvfIndex
from pgvecto_rs.types import IndexOption, Hnsw

class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                m=16,
                ef_construction=100,
                threads=1,
            )
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]

Get the nearest neighbors to a vector

from pgvecto_rs.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct, CosineDistance and JaccardDistance(for BinaryVectorField)

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)

See examples/django_example.py and tests/test_django.py for more examples.

SDK

Our SDK is designed to use the pgvecto.rs out-of-box. You can exploit the power of pgvecto.rs to do similarity search or retrieve with filters, without writing any SQL code.

Install dependencies:

pip install "pgvecto_rs[sdk]"

A minimal example:

from pgvecto_rs.sdk import PGVectoRs, Record

# Create a client
client = PGVectoRs(
    db_url="postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres",
    table_name="example",
    dimension=3,
)

try:
    # Add some records
    client.add_records(
        [
            Record.from_text("hello 1", [1, 2, 3]),
            Record.from_text("hello 2", [1, 2, 4]),
        ]
    )

    # Search with default operator (sqrt_euclid).
    # The results is sorted by distance
    for rec, dis in client.search([1, 2, 5]):
        print(rec.text)
        print(dis)
finally:
    # Clean up (i.e. drop the table)
    client.drop()

Output:

hello 2
1.0
hello 1
4.0

See examples/sdk_example.py and tests/test_sdk.py for more examples.

Development

This package is managed by PDM.

Set up things:

pdm venv create
pdm use # select the venv inside the project path
pdm sync -d -G :all --no-isolation

# lock requirement
# need pdm >=2.17: https://pdm-project.org/latest/usage/lock-targets/#separate-lock-files-or-merge-into-one
pdm lock -d -G :all --python=">=3.9"
pdm lock -d -G :all --python="<3.9" --append
# install package to local
# `--no-isolation` is required for scipy
pdm install -d --no-isolation

Run lint:

pdm run format
pdm run fix
pdm run check

Run test in current environment:

pdm run test

Test

Tox is used to test the package locally.

Run test in all environment:

tox run

Acknowledgement

We would like to express our gratitude to the creators and contributors of the pgvector-python repository for their valuable code and architecture, which greatly influenced the development of this repository.