tensorchord / pgvecto.rs

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.
https://docs.pgvecto.rs/getting-started/overview.html
Apache License 2.0
1.77k stars 73 forks source link

[v0.2.0] IPC connection is closed unexpected #376

Closed perfectra1n closed 8 months ago

perfectra1n commented 9 months ago

Hi there!

Thanks for the great extension, y'all have a much higher IQ than I do lol. Unfortunately, I'm running into the following error:

immichdb=# SELECT pgvectors_upgrade();
ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.

Context

I'm running my PostgreSQL Cluster via CNPG. I'm using the tensorchord/cloudnative-pgvecto.rs container:

ghcr.io/tensorchord/cloudnative-pgvecto.rs:16.2-v0.2.0

The pgvecto.rs extension is being used by my Immich. I had accidentally ran the ghcr.io/tensorchord/cloudnative-pgvecto.rs:16.2-v0.2.0 image before Immich supported it (previously had been running ghcr.io/tensorchord/cloudnative-pgvecto.rs:16-v0.1.13), so I left my Immich alone until it supported v0.2.0. Yesterday, after I updated Immich to v1.95.0, I first had the following error from the Cloudnative-pg when trying to follow the upgrade section Upgrade from 0.1.x:

ERROR: type "vectorindexinfo" does not exist

And then when trying to run the command

So, I navigated to this repository and found the update script here located here, and noticed that it appeared as though the entire thing had already been run (the index names had been changed, etc.), and since it appeared to be erroring on this line, I went ahead and forcibly updated the extension's version on my Immich DB:

UPDATE pg_catalog.pg_extension SET extversion = '0.2.0' WHERE extname = 'vectors';

But then, as you can imagine, this didn't resolve the error. My Immich woulnd't run and reported errors such as:

ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.

So just to get it running, and at the cost of losing my Machine Learning information, I went ahead and dropped the extension:

immichdb=# DROP EXTENSION IF EXISTS vectors CASCADE;
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to column embedding of table asset_faces
drop cascades to column embedding of table smart_search
DROP EXTENSION
immichdb=# CREATE EXTENSION vectors;
CREATE EXTENSION
immichdb=#

Which at least allowed my Immich to get up and running, however, per @mertalev, I then needed to recreate the indexes (that I had previously dropped) via the following query:

ALTER TABLE smart_search
ADD COLUMN embedding vector(512);

ALTER TABLE asset_faces 
ADD COLUMN embedding vector(512);

SET vectors.pgvector_compatibility=on;

CREATE INDEX IF NOT EXISTS clip_index ON smart_search
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

CREATE INDEX IF NOT EXISTS face_index ON asset_faces
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

However, after running it, I had the following output:

immichdb=# ALTER TABLE smart_search
ADD COLUMN embedding vector(512);

ALTER TABLE asset_faces
ADD COLUMN embedding vector(512);

SET vectors.pgvector_compatibility=on;

CREATE INDEX IF NOT EXISTS clip_index ON smart_search
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

CREATE INDEX IF NOT EXISTS face_index ON asset_faces
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);
ALTER TABLE
ALTER TABLE
SET
ERROR:  pgvecto.rs: The extension is upgraded so all index files are outdated.
ADVICE: Delete all index files. Please read `https://docs.pgvecto.rs/admin/upgrading.html`.
ERROR:  pgvecto.rs: The extension is upgraded so all index files are outdated.
ADVICE: Delete all index files. Please read `https://docs.pgvecto.rs/admin/upgrading.html`.

So I went ahead and restarted my cluster. However, I get the following error and it crashes my CNPG primary node when executing:

immichdb=# SET vectors.pgvector_compatibility=on;

CREATE INDEX IF NOT EXISTS clip_index ON smart_search
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

CREATE INDEX IF NOT EXISTS face_index ON asset_faces
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);
SET
ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.
WARNING:  AbortTransaction while in ABORT state
ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.
WARNING:  AbortTransaction while in ABORT state
ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.
WARNING:  AbortTransaction while in ABORT state
ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.
ERROR:  pgvecto.rs: IPC connection is closed unexpected.
ADVICE: The error is raisen by background worker errors. Please check the full PostgreSQL log to get more information. Please read `https://docs.pgvecto.rs/admin/configuration.html`.
PANIC:  ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> command terminated with exit code 137

I've restarted my CNPG cluster and failed over the primary node inbetween each of the above commands to make sure something wasn't funky. I've already set my CNPG's cluster .spec.postgresql.enableAlterSystem to true, and executed the following command:

ALTER SYSTEM SET logging_collector = on;

I'm sure I'm doing something wrong, but I don't see any additional logs in my CNPG cluster from pgvecto.rs. Would you be able to point me to what I'm doing wrong, or what logs/information I can pull so that I can help debug this issue? @mertalev told me to create an issue here after seeing the error logs lol.

VoVAllen commented 9 months ago

It seems that there are some issues with the upgrade process of pgvecto.rs on cloudnative-pg. I suspect it may be related to file permissions. We will need some time to investigate this issue, because it seems that cloudnative-pg did not collect the complete pgvecto.rs logs, we need to reproduce this issue on our side first.

Thank you for your detailed report! It's helpful to us

perfectra1n commented 9 months ago

Of course, thanks again for the great extension! If I can provide any more information or output, please do let me know. Yeah, it appears that #374 would help quite a bit.

shrinedogg commented 9 months ago

I am having this same issue with the ghcr.io/tensorchord/cloudnative-pgvecto.rs:14.11-v0.2.0 container release with Immich. Can also can provide logs, feedback, etc., but OP summed it up quite well.

usamoi commented 9 months ago

Could you try to delete the directory pg_vectors under PostgreSQL data directory? This should be done by select pgvectors_upgrade(); but it seems not to be executed.

mertalev commented 9 months ago

I think the error might be caused by the tables not being empty when the embedding columns are added. Since they'll have null values, the index creation will fail and throw enough errors to crash Postgres.

perfectra1n commented 9 months ago

Yeah, @mertalev had an interesting idea, to run the following query:

DELETE FROM smart_search WHERE true;
DELETE FROM asset_faces WHERE true;

In order to clear out any possibly null values, and then to run the following:

SET vectors.pgvector_compatibility=on;

CREATE INDEX IF NOT EXISTS clip_index ON smart_search
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

CREATE INDEX IF NOT EXISTS face_index ON asset_faces
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

And it worked!:

immichdb=# DELETE FROM smart_search WHERE true; DELETE FROM asset_faces WHERE true;
DELETE 0
DELETE 52227
immichdb=# ALTER TABLE smart_search
ADD COLUMN embedding vector(512);

ALTER TABLE asset_faces
ADD COLUMN embedding vector(512);

SET vectors.pgvector_compatibility=on;

CREATE INDEX IF NOT EXISTS clip_index ON smart_search
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);

CREATE INDEX IF NOT EXISTS face_index ON asset_faces
    USING hnsw (embedding vector_cosine_ops)
    WITH (ef_construction = 300, m = 16);
ERROR:  column "embedding" of relation "smart_search" already exists
ERROR:  column "embedding" of relation "asset_faces" already exists
SET
CREATE INDEX
CREATE INDEX

So it appears that there was null data in the columns that the index was being created for.

ishioni commented 9 months ago

Unfortunately the above does not work for me. After trying to add the index it crashes. The funny thing is, both my smart_search and asset_faces tables columns are completely empty. Not null values, just no data. Yet I can trigger the crash reliably every time i try to run the index

VoVAllen commented 9 months ago

@ishioni Could you try to delete the directory pg_vectors under PostgreSQL data directory? This should be done by select pgvectors_upgrade(); but it seems not working. We're still investigating the root cause now

mertalev commented 9 months ago

Unfortunately the above does not work for me. After trying to add the index it crashes. The funny thing is, both my smart_search and asset_faces tables columns are completely empty. Not null values, just no data. Yet I can trigger the crash reliably every time i try to run the index

@ishioni Could you try this instead of DELETE FROM?

TRUNCATE TABLE smart_search;
TRUNCATE TABLE asset_faces CASCADE;
TRUNCATE TABLE person;
VoVAllen commented 8 months ago

NULL insert is fixed in 0.2.1.

Release discussion: https://github.com/tensorchord/pgvecto.rs/discussions/405