neuml / txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
https://neuml.github.io/txtai
Apache License 2.0
9.13k stars 592 forks

postgresql issue related to scoring, under load #710

Closed allanwakes closed 5 months ago

allanwakes commented 5 months ago

I'm using txtai (7.0.0) with Postgres + pgvector and FastAPI. The embeddings index has content and hybrid enabled.

When I call my search API one request at a time, there is no issue at all. The errors below appear when I fire 5-10 requests at the same time.

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "scores_pkey"
DETAIL:  Key (indexid)=(17) already exists.

[SQL: INSERT INTO scores (indexid, score) VALUES (%(indexid__0)s, %(score__0)s), (%(indexid__1)s, %(score__1)s), (%(indexid__2)s, %(score__2)s)]
[parameters: {'indexid__0': 17, 'score__0': 0.9622645378112793, 'indexid__1': 18, 'score__1': 0.8587514162063599, 'indexid__2': 9, 'score__2': 0.8211467266082764}]
sqlalchemy.exc.InternalError: (psycopg2.errors.InFailedSqlTransaction) current transaction is aborted, commands ignored until end of transaction block

[SQL: SELECT pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s]
[parameters: {'table_name': 'batch', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}]
sqlalchemy.exc.InternalError: (psycopg2.errors.InFailedSqlTransaction) current transaction is aborted, commands ignored until end of transaction block

[SQL: DELETE FROM scores]
sqlalchemy.exc.InternalError: (psycopg2.errors.InFailedSqlTransaction) current transaction is aborted, commands ignored until end of transaction block

[SQL: DELETE FROM batch]

I was confused, so I cleaned everything and disabled hybrid this time, but the errors remain.

Thanks for any ideas.

davidmezzetti commented 5 months ago

Thanks for reporting this.

Are you using the txtai API interface or FastAPI directly?

allanwakes commented 5 months ago

I used FastAPI directly.

davidmezzetti commented 5 months ago

txtai has a built-in FastAPI interface - https://github.com/neuml/txtai/blob/master/src/python/txtai/app/base.py#L312

If you review that file, you can implement a similar locking strategy to prevent this issue.

allanwakes commented 5 months ago

Thanks for the direction. Now I see that a lock is needed when updating the index. Issue closed.
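
For anyone landing here with the same error, below is a minimal sketch of the kind of locking discussed above. It is not the code from txtai's `app/base.py`. `LockedEmbeddings`, `FakeEmbeddings` and `run_concurrent` are hypothetical names made up for illustration, and the sketch assumes the wrapped object exposes a `search(query, limit)` method. The idea is that every call into the shared index goes through one `threading.RLock`, so two requests never touch the `scores` and `batch` tables at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LockedEmbeddings:
    """Hypothetical wrapper that serializes access to a shared index object."""

    def __init__(self, embeddings):
        # Assumption: `embeddings` has a search(query, limit) method
        self.embeddings = embeddings
        self.lock = threading.RLock()

    def search(self, query, limit=3):
        # Only one thread at a time may run a search against the shared index
        with self.lock:
            return self.embeddings.search(query, limit)


class FakeEmbeddings:
    """Stand-in for a real index. Records whether two searches ever overlap."""

    def __init__(self):
        self.active = 0
        self.overlapped = False
        self._guard = threading.Lock()

    def search(self, query, limit):
        with self._guard:
            self.active += 1
            if self.active > 1:
                self.overlapped = True
        # Simulate the time spent writing and reading scoring rows
        time.sleep(0.01)
        with self._guard:
            self.active -= 1
        return [(i, 1.0) for i in range(limit)]


def run_concurrent(searcher, n=8):
    """Fire n searches at once, similar to n simultaneous API requests."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda i: searcher.search(f"query {i}"), range(n)))


if __name__ == "__main__":
    fake = FakeEmbeddings()
    results = run_concurrent(LockedEmbeddings(fake))
    print(len(results), "searches completed, overlapped:", fake.overlapped)
```

In a FastAPI app, endpoints declared with plain `def` run in a thread pool, so a handler could call a shared `LockedEmbeddings` instance in place of the raw index. Note that a `threading` lock only protects a single process. If the server runs several worker processes against the same database, this sketch alone would not be enough.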