Open Sachin-Bhat opened 2 months ago
I changed the schema due to issues with the community implementation.
Easiest solution is to re-create the schema if you don't mind re-indexing the data.
If you don't want to re-index the data, connect to the Postgres instance and write a bit of SQL to perform the migration.
If I'm not mistaken, the main changes in the schema are:
1) custom_id was renamed to id in langchain_pg_embedding
2) the JSON metadata column was changed to JSONB
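A rough sketch of what that migration could look like, assuming the default table and column names used by the library (double-check against your actual schema, and take a backup first):

```sql
-- Sketch only: verify column names against your database before running.
ALTER TABLE langchain_pg_embedding RENAME COLUMN custom_id TO id;

-- Convert the metadata column from JSON to JSONB in place.
ALTER TABLE langchain_pg_embedding
    ALTER COLUMN cmetadata TYPE JSONB USING cmetadata::jsonb;
```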
Hey @eyurtsev,
I tried writing the necessary SQL to update the schema as you specified. However, when I try to index documents, I get the following error:
Error indexing document with id 000218462 and chunk_id e49fa716-422a-430a-b633-7d0901ed45d3: (psycopg.errors.InvalidColumnReference) there is no unique or exclusion constraint matching the ON CONFLICT specification
Cheers, Sachin
You need something like this to add a uniqueness constraint. This fixes a bug that is present in the community implementation and ensures you won't have duplicated content when indexing by id:
ALTER TABLE [your_table_name]
ADD CONSTRAINT id_unique UNIQUE (id);
Double-check that the commands match what you have in the database, and make a backup before running them.
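One way to do that double-checking is to inspect the current columns and constraints before migrating; a sketch, assuming the default table name:

```sql
-- Inspect the columns of the embedding table (adjust the name if yours differs).
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'langchain_pg_embedding';

-- List existing unique / primary-key constraints on the same table.
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'langchain_pg_embedding'::regclass;
```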
Hey @eyurtsev,
I have tried to replicate the langchain-community indexed-documents table schema to match that of langchain-postgres as closely as possible, but I still get some errors:
Error indexing document with id 000134955 and chunk_id 67cf6d54-7885-47ab-a42f-91367c070fd7: (psycopg2.errors.NotNullViolation) null value in column "uuid" of relation "langchain_pg_embedding" violates not-null constraint
DETAIL: Failing row contains (058cabb5-a81a-4fee-b77c-3a3acdc04454, [0.019579852,-0.03575338,-0.01300123,0.0077682347,0.020762963,-0..., lorem ipsum ..., {"id": "67cf6d54-7885-47ab-a42f-91367c070fd7", "Ack": "", "CveId..., 67cf6d54-7885-47ab-a42f-91367c070fd7, null).
[SQL: INSERT INTO langchain_pg_embedding (id, collection_id, embedding, document, cmetadata) VALUES (%(id_m0)s, %(collection_id_m0)s::UUID, %(embedding_m0)s, %(document_m0)s, %(cmetadata_m0)s) ON CONFLICT (id) DO UPDATE SET embedding = excluded.embedding, document = excluded.document, cmetadata = excluded.cmetadata]
[parameters: {'id_m0': '67cf6d54-7885-47ab-a42f-91367c070fd7', 'collection_id_m0': UUID('058cabb5-a81a-4fee-b77c-3a3acdc04454'), 'embedding_m0': '[0.019579852,-0.03575338,-0.01300123,0.0077682347,0.020762963,-0.018955793,0.016758585,0.015887503,0.014431365,0.043164082,-0.05083481,0.036741473,0. ... (12432 characters truncated) ... .040121794,-0.0029317772,0.023233198,-0.013586285,-0.043216087,-0.03837963,0.033075128,0.036013406,0.034115225,-0.023662237,-0.03458327,0.0066761314]', 'document_m0': 'lorem ipsum sit amet.\n', 'cmetadata_m0': 'metadata items'}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
Any help here would be much appreciated.
Cheers, Sachin
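For anyone hitting the same NotNullViolation: note that the new implementation's INSERT statement never supplies a uuid value, so a leftover NOT NULL uuid column from the community-era schema will make every insert fail. A hedged sketch of removing it, assuming you have verified nothing else depends on that column (back up first):

```sql
-- Sketch only: assumes the legacy uuid column is no longer needed.
ALTER TABLE langchain_pg_embedding DROP COLUMN uuid;

-- Alternatively, keep the column but allow the new INSERTs to leave it null:
-- ALTER TABLE langchain_pg_embedding ALTER COLUMN uuid DROP NOT NULL;
```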
Hello,
I was getting the following error while indexing my embeddings into PGVector. It did not throw such errors when I was using the old langchain-community implementation, so I wonder whether the database schema created by the new implementation has some changes. Any clarification on how to go about this would be very helpful.
Cheers, Sachin