timescale / pgvectorscale

A complement to pgvector for high performance, cost efficient vector search on large workloads.
PostgreSQL License

OOM while creating diskann index. postgresql docker exited with 137 #105

Open msk-apk opened 1 week ago

msk-apk commented 1 week ago

I wanted to replicate the scale test described in the pgvectorscale documentation.

postgresql configuration:

shared_buffers = 32128MB
effective_cache_size = 96386MB
maintenance_work_mem = 2047MB
work_mem = 8224kB
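Since the server runs in Docker, it may be worth confirming that these values are the ones the running instance actually uses. A minimal sketch using the standard pg_settings view (nothing here is specific to pgvectorscale):

```sql
-- Show the effective values of the four settings listed above.
-- "setting" is reported in the units given by the "unit" column
-- (for example 8kB pages for shared_buffers).
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers',
               'effective_cache_size',
               'maintenance_work_mem',
               'work_mem')
ORDER BY name;
```

The source column indicates whether each value came from the configuration file, a default, or a session-level override.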

machine configuration:

40 CPU
MemTotal: 131599120 kB
model name : Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
cpu MHz : 999.996

Created a table as below.

postgres=# \d search_item;
                                   Table "public.search_item"
  Column   |         Type          | Collation | Nullable |                 Default
-----------+-----------------------+-----------+----------+-----------------------------------------
 id        | bigint                |           | not null | nextval('search_item_id_seq'::regclass)
 name      | character varying(50) |           |          |
 embedding | vector(768)           |           |          |
Indexes:
    "search_item_pkey" PRIMARY KEY, btree (id)

I added 20 M entries to this table. Creating the index with the query below failed with an OOM after 20 hours.

CREATE INDEX IF NOT EXISTS document_embedding_idx ON search_item USING diskann (embedding);
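For a sense of scale, here is a rough estimate of the raw embedding data next to the machine's total memory. It assumes pgvector's documented storage of 4 bytes per dimension plus 8 bytes of overhead per vector, and it ignores row headers, TOAST overhead, and whatever working memory the index build itself needs, so treat it as a lower bound and not a measurement:

```sql
-- Estimated raw size of 20 M vector(768) values versus total RAM.
-- Assumption: 4 bytes per dimension + 8 bytes per vector (pgvector storage).
-- MemTotal is reported in kB (1024-byte units), hence the * 1024.
SELECT pg_size_pretty(20000000::bigint * (4 * 768 + 8)) AS raw_embedding_size,
       pg_size_pretty(131599120::bigint * 1024)         AS total_ram;

-- Actual on-disk size of the table, including TOAST and existing indexes.
SELECT pg_size_pretty(pg_total_relation_size('search_item')) AS table_total_size;
```

The second query reports what the table really occupies, which is the more reliable number to compare against shared_buffers and the container's memory limit.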

Could you please let me know what I am missing here? Which configuration settings need to be tuned to avoid the OOM?

Regards,
Msk

cevian commented 1 week ago

Can you rerun the index creation after setting client_min_messages to debug1?

SET client_min_messages = 'debug1';
CREATE INDEX IF NOT EXISTS document_embedding_idx ON search_item USING diskann (embedding);

msk-apk commented 2 days ago

After setting client_min_messages to debug1, I reran the test. The OOM did not occur. It was able to index 20 M docs without any issues.
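To double-check a result like this, one option is to confirm that the index exists, is marked valid, and has a plausible size. A minimal sketch using only the standard PostgreSQL catalogs, with the table name taken from the report above:

```sql
-- List every index on search_item with its validity flag and on-disk size.
-- An index left behind by an interrupted build can show indisvalid = false.
SELECT i.indexrelid::regclass                          AS index_name,
       i.indisvalid                                    AS is_valid,
       pg_size_pretty(pg_relation_size(i.indexrelid))  AS index_size
FROM pg_index AS i
WHERE i.indrelid = 'search_item'::regclass;
```

If document_embedding_idx appears with is_valid set to true, the build completed.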