vitabaks / postgresql_cluster

Automated database platform for PostgreSQL®. A modern, open-source alternative to cloud-managed databases.
https://postgresql-cluster.org
MIT License

Add pgvectorscale extension #762

Closed by vitabaks 2 months ago

vitabaks commented 2 months ago

Add Timescale pgvectorscale extension.

pgvectorscale builds on pgvector with higher-performance embedding search and cost-efficient storage for AI applications.

Variables

Compatible with Debian 12 and Ubuntu 22.04 and 24.04 (only deb packages are available), for Postgres 13-17.

Deploy Timescale HA Cluster with pgvectorscale

To deploy a PostgreSQL High-Availability Cluster with the pgvectorscale extension, add the enable_pgvectorscale variable:

ansible-playbook deploy_pgcluster.yml -e "enable_timescale=true" -e "enable_pgvectorscale=true"

[!note] The enable_timescale variable is optional; in this example we install the pgvectorscale, pgvector, and timescaledb extensions.

vitabaks commented 2 months ago

Test (Deploy Timescale HA Cluster with pgvectorscale)

ansible-playbook deploy_pgcluster.yml -e "enable_timescale=true" -e "enable_pgvectorscale=true"

Ansible log:

PLAY [Deploy PostgreSQL HA Cluster (based on "Patroni")] ***********************
...
TASK [add-repository : Add TimescaleDB repository] *****************************
changed: [10.172.0.20]
changed: [10.172.0.21]
changed: [10.172.0.22]
...
TASK [packages : Install TimescaleDB package] **********************************
changed: [10.172.0.20] => (item=timescaledb-2-postgresql-16)
changed: [10.172.0.21] => (item=timescaledb-2-postgresql-16)
changed: [10.172.0.22] => (item=timescaledb-2-postgresql-16)

TASK [packages : Install pgvector package] *************************************
changed: [10.172.0.21]
changed: [10.172.0.22]
changed: [10.172.0.20]

TASK [packages : Looking up the latest version of pgvectorscale] ***************
ok: [10.172.0.22]
ok: [10.172.0.21]
ok: [10.172.0.20]

TASK [packages : Download pgvectorscale archive] *******************************
changed: [10.172.0.21]
changed: [10.172.0.22]
changed: [10.172.0.20]

TASK [packages : Extract pgvectorscale package] ********************************
changed: [10.172.0.20]
changed: [10.172.0.22]
changed: [10.172.0.21]

TASK [packages : Install pgvectorscale v0.3.0 package] *************************
changed: [10.172.0.22]
changed: [10.172.0.21]
changed: [10.172.0.20]
...

Create extensions

[!note] Extensions can be created automatically if you define them in the postgresql_extensions variable.

postgres=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description          
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

postgres=# show shared_preload_libraries ;
          shared_preload_libraries           
---------------------------------------------
 pg_stat_statements,auto_explain,timescaledb
(1 row)

postgres=# CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
NOTICE:  installing required extension "vector"
CREATE EXTENSION
postgres=# \dx
                               List of installed extensions
    Name     | Version |   Schema   |                     Description                      
-------------+---------+------------+------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector      | 0.7.4   | public     | vector data type and ivfflat and hnsw access methods
 vectorscale | 0.3.0   | public     | pgvectorscale:  Advanced indexing for vector data
(3 rows)

Check vectorscale

postgres=# CREATE TABLE IF NOT EXISTS document_embedding  (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    metadata JSONB,
    contents TEXT,
    embedding VECTOR(1536)
)
postgres-# ;
CREATE TABLE
postgres=# CREATE INDEX document_embedding_idx ON document_embedding
USING diskann (embedding);
NOTICE:  Starting index build. num_neighbors=-1 search_list_size=100, max_alpha=1.2, storage_layout=SbqCompression
WARNING:  Indexed 0 tuples
CREATE INDEX
postgres=# \d+ document_embedding
                                                    Table "public.document_embedding"
  Column   |     Type     | Collation | Nullable |             Default              | Storage  | Compression | Stats target | Description 
-----------+--------------+-----------+----------+----------------------------------+----------+-------------+--------------+-------------
 id        | bigint       |           | not null | generated by default as identity | plain    |             |              | 
 metadata  | jsonb        |           |          |                                  | extended |             |              | 
 contents  | text         |           |          |                                  | extended |             |              | 
 embedding | vector(1536) |           |          |                                  | external |             |              | 
Indexes:
    "document_embedding_pkey" PRIMARY KEY, btree (id)
    "document_embedding_idx" diskann (embedding)
Access method: heap

passed
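
The check above builds the index on an empty table, which is why the log shows "WARNING:  Indexed 0 tuples". A possible follow-up check, sketched here under assumptions, is to insert a row and run a nearest-neighbor query of the shape the diskann index is meant to serve. It reuses the document_embedding table and index from the session above; the constant 0.1 vector and the 'placeholder' text are illustrative values and were not part of the original test.

```sql
-- Insert one placeholder row. array_fill builds a 1536-element real[]
-- array, which pgvector can cast to the vector type.
INSERT INTO document_embedding (contents, embedding)
VALUES ('placeholder', array_fill(0.1::real, ARRAY[1536])::vector);

-- Nearest-neighbor search by cosine distance (the <=> operator).
-- The query vector is the same illustrative constant as above.
SELECT id, contents
FROM document_embedding
ORDER BY embedding <=> array_fill(0.1::real, ARRAY[1536])::vector
LIMIT 10;
```

On a table this small the planner may well choose a sequential scan over the index; prefixing the SELECT with EXPLAIN shows which plan was picked.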