tensorchord / pgvecto.rs

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.
https://docs.pgvecto.rs/getting-started/overview.html
Apache License 2.0

reason=pgvecto.rs: IPC connection is closed unexpectedly for 100M dataset #592

Closed agandra30 closed 1 month ago

agandra30 commented 1 month ago

Hi folks,

I am trying to validate whether we could leverage pgvecto.rs for our use cases at scale. I tried to create an HNSW index (MetricType.L2: 'L2') for the 100M dataset; the parameters look like this:

create_index_after_load=True
max_parallel_workers=16
quantization_type='trivial'
index=<IndexType.HNSW: 'HNSW'>
m=16
ef_search=100
ef_construction=300
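
For context, roughly equivalent hand-written DDL on the pgvecto.rs side would look like the sketch below. The `items` table and `embedding` column are placeholder names; the TOML options syntax and the `vectors.hnsw_ef_search` GUC follow the pgvecto.rs docs, so double-check them against your installed version:

-- Placeholder table; LAION embeddings are 768-dimensional.
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(768) NOT NULL);

-- Build-time HNSW options are passed as TOML to the "vectors" access method.
CREATE INDEX items_embedding_hnsw ON items
USING vectors (embedding vector_l2_ops)
WITH (options = $$
[indexing.hnsw]
m = 16
ef_construction = 300
$$);

-- ef_search is a query-time knob, set per session rather than at index build.
SET vectors.hnsw_ef_search = 100;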

I ran it 3 times and it failed every time:

dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'PgVectoRS-100mHNSWpgvectorrsr1v1-100mHNSWpgvectorrsr1v1'} failed to run, reason=pgvecto.rs: IPC connection is closed unexpectedly.

NOTE: The DB is up and the table is present. Is it a problem with the plugin? Or shall we switch back to the older pgvector extension instead of the Rust one? Any recommendations?

You are now connected to database "mydatabase" as user "postgres"
mydatabase=# \dx;
                                     List of installed extensions
  Name   | Version |   Schema   |                                  Description
---------+---------+------------+-----------------------------------------------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vectors | 0.0.0   | vectors    | vectors: Vector database plugin for Postgres, written in Rust, specifically designed for LLM

VoVAllen commented 1 month ago

It's possible with 100M vectors, but you will need 1.5x-2x the memory compared to the total vector size.
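
A quick back-of-the-envelope check of that rule against this dataset (100M rows x 768-dim float32 vectors), runnable straight from psql:

-- Raw vector payload: 100M rows x 768 dims x 4 bytes (float32).
SELECT pg_size_pretty(100000000::bigint * 768 * 4)       AS raw_vectors,  -- ~286 GB
       pg_size_pretty(100000000::bigint * 768 * 4 * 3/2) AS est_1_5x,     -- ~429 GB
       pg_size_pretty(100000000::bigint * 768 * 4 * 2)   AS est_2x;       -- ~572 GB

By that estimate the 1.5x-2x band is roughly 430-570 GB.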

agandra30 commented 1 month ago

@VoVAllen thanks for the reply. When you say memory: my configuration is Postgres 16.4 running on a bare-metal Ubuntu machine that has close to 1 TiB of memory.

The client machine, also Ubuntu, where I am running the scripts that connect to the DB and run the validations, has approx. 500 GiB of memory.

Do you still observe or recommend higher memory requirements?

Postgres server (16.4):

# free  -mh

               total        used        free      shared  buff/cache   available
Mem:           1.0Ti        12Gi       598Gi       152Mi       396Gi       989Gi
Swap:             0B          0B          0B

Client:

 free  -mh
               total        used        free      shared  buff/cache   available
Mem:           503Gi       9.1Gi       295Gi        40Mi       198Gi       490Gi
Swap:             0B          0B          0B

Also, are there any Postgres settings that you recommend?

  1. Since pgvecto.rs is failing, do you recommend the old pgvector extension instead of this one?
  2. It's a single-node configuration; are there any tunables you recommend?

agandra30 commented 1 month ago

@VoVAllen I saw somewhere that a fix for this was shipped in 0.2.0, so I have even tried the runs with the latest release, 0.3.0. Still the same issue; any pointers will be highly appreciated.

Also, I have installed the plugin from source rather than using the Docker image, assuming the installation approach shouldn't matter; correct me if I am wrong.

Also, our Postgres server is running on bare metal (AFAIK Postgres has a monolithic architecture and does not necessarily run better on k8s). Does pgvecto.rs have any such limitations or requirements?

We are really looking at large datasets and at scale.


cutecutecat commented 1 month ago

Hello, I am the author of the PGVecto.rs backend of https://github.com/zilliztech/VectorDBBench.

Large datasets like cohere-10M, openai-5M, and laion-100M have not been validated on PGVecto.rs before.

We are investigating their results on AWS now.

Based on current observations, laion-100M would need no less than 329834 MB of memory, so about 512 GB should be enough.

We will keep watching the bench results, and will be happy to let you know once we have some conclusions.

cutecutecat commented 1 month ago

Sorry about the delay. We finally interrupted the laion-100M dataset test because the AWS instance is too expensive. Nevertheless, it seemed fine through roughly the first 50%-67% of the index build procedure, without any error. The image we picked for the test is tensorchord/pgvecto-rs:pg16-v0.4.0-alpha.2.
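
(If you rerun it, indexing progress can be watched from the pg_vector_index_stat view that pgvecto.rs ships; the exact column set varies by version, so treat the names below as an approximation from the docs.)

-- Watch how far the background HNSW build has progressed.
SELECT tablename, indexname, idx_status
FROM pg_vector_index_stat;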

As there were many bugfixes before 0.4.0, it's hard to identify the key issue. We recommend rerunning the test with the above image and checking whether the error still happens.
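
When rerunning, it is also worth confirming which extension version is actually loaded; the \dx output above reports 0.0.0, which is typically what a from-source development build shows rather than a tagged release:

-- Confirm the installed extension version before benchmarking.
SELECT extname, extversion FROM pg_extension WHERE extname = 'vectors';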

For PGVecto.rs 0.4.0 there are a few changes in VectorDBBench; you could manually upgrade the SDK dependency to 0.2.2, or wait for https://github.com/zilliztech/VectorDBBench/pull/373 to be merged and released.

I will close this issue once the VectorDBBench PR is finished. If you have any updates, feel free to let me know.