rh-aiservices-bu / llm-on-openshift

Resources, demos, recipes,... to work with LLMs on OpenShift with OpenShift AI or Open Data Hub.
Apache License 2.0

Vector Extension not enabled in PGVector #60

Closed strangiato closed 5 months ago

strangiato commented 5 months ago

In the following notebook:

https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/examples/notebooks/langchain/Langchain-PgVector-Ingest.ipynb

The "Create the index and ingest the documents" step fails with the following error message:

Exception: Failed to create vector extension: (psycopg.errors.InsufficientPrivilege) permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.
[SQL: BEGIN;SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;COMMIT;]
(Background on this error at: https://sqlalche.me/e/20/f405)

The full stack trace:

---------------------------------------------------------------------------
InsufficientPrivilege                     Traceback (most recent call last)
File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1970     if not evt_handled:
-> 1971         self.dialect.do_execute(
   1972             cursor, str_statement, effective_parameters, context
   1973         )
   1975 if self._has_events or self.engine._has_events:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    918 def do_execute(self, cursor, statement, parameters, context=None):
--> 919     cursor.execute(statement, parameters)

File /opt/app-root/lib64/python3.9/site-packages/psycopg/cursor.py:732, in Cursor.execute(self, query, params, prepare, binary)
    731 except e._NO_TRACEBACK as ex:
--> 732     raise ex.with_traceback(None)
    733 return self

InsufficientPrivilege: permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:383, in PGVector.create_vector_extension(self)
    377 statement = sqlalchemy.text(
    378     "BEGIN;"
    379     "SELECT pg_advisory_xact_lock(1573678846307946496);"
    380     "CREATE EXTENSION IF NOT EXISTS vector;"
    381     "COMMIT;"
    382 )
--> 383 session.execute(statement)
    384 session.commit()

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/orm/session.py:2306, in Session.execute(self, statement, params, execution_options, bind_arguments, _parent_execute_state, _add_event)
   2255 r"""Execute a SQL expression construct.
   2256 
   2257 Returns a :class:`_engine.Result` object representing
   (...)
   2304 
   2305 """
-> 2306 return self._execute_internal(
   2307     statement,
   2308     params,
   2309     execution_options=execution_options,
   2310     bind_arguments=bind_arguments,
   2311     _parent_execute_state=_parent_execute_state,
   2312     _add_event=_add_event,
   2313 )

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/orm/session.py:2200, in Session._execute_internal(self, statement, params, execution_options, bind_arguments, _parent_execute_state, _add_event, _scalar_result)
   2199 else:
-> 2200     result = conn.execute(
   2201         statement, params or {}, execution_options=execution_options
   2202     )
   2204 if _scalar_result:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1422, in Connection.execute(self, statement, parameters, execution_options)
   1421 else:
-> 1422     return meth(
   1423         self,
   1424         distilled_parameters,
   1425         execution_options or NO_OPTIONS,
   1426     )

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/sql/elements.py:514, in ClauseElement._execute_on_connection(self, connection, distilled_params, execution_options)
    513         assert isinstance(self, Executable)
--> 514     return connection._execute_clauseelement(
    515         self, distilled_params, execution_options
    516     )
    517 else:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1644, in Connection._execute_clauseelement(self, elem, distilled_parameters, execution_options)
   1636 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
   1637     dialect=dialect,
   1638     compiled_cache=compiled_cache,
   (...)
   1642     linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
   1643 )
-> 1644 ret = self._execute_context(
   1645     dialect,
   1646     dialect.execution_ctx_cls._init_compiled,
   1647     compiled_sql,
   1648     distilled_parameters,
   1649     execution_options,
   1650     compiled_sql,
   1651     distilled_parameters,
   1652     elem,
   1653     extracted_params,
   1654     cache_hit=cache_hit,
   1655 )
   1656 if has_events:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1850, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1849 else:
-> 1850     return self._exec_single_context(
   1851         dialect, context, statement, parameters
   1852     )

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1990, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1989 except BaseException as e:
-> 1990     self._handle_dbapi_exception(
   1991         e, str_statement, effective_parameters, cursor, context
   1992     )
   1994 return result

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:2357, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec)
   2356     assert sqlalchemy_exception is not None
-> 2357     raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
   2358 else:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1970     if not evt_handled:
-> 1971         self.dialect.do_execute(
   1972             cursor, str_statement, effective_parameters, context
   1973         )
   1975 if self._has_events or self.engine._has_events:

File /opt/app-root/lib64/python3.9/site-packages/sqlalchemy/engine/default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    918 def do_execute(self, cursor, statement, parameters, context=None):
--> 919     cursor.execute(statement, parameters)

File /opt/app-root/lib64/python3.9/site-packages/psycopg/cursor.py:732, in Cursor.execute(self, query, params, prepare, binary)
    731 except e._NO_TRACEBACK as ex:
--> 732     raise ex.with_traceback(None)
    733 return self

ProgrammingError: (psycopg.errors.InsufficientPrivilege) permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.
[SQL: BEGIN;SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;COMMIT;]
(Background on this error at: https://sqlalche.me/e/20/f405)

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
Cell In[14], line 3
      1 embeddings = HuggingFaceEmbeddings()
----> 3 db = PGVector.from_documents(
      4     documents=all_splits,
      5     embedding=embeddings,
      6     collection_name=COLLECTION_NAME,
      7     connection_string=CONNECTION_STRING,
      8     #pre_delete_collection=True # This deletes existing collection and its data, use carefully!
      9 )

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:1139, in PGVector.from_documents(cls, documents, embedding, collection_name, distance_strategy, ids, pre_delete_collection, use_jsonb, **kwargs)
   1135 connection_string = cls.get_connection_string(kwargs)
   1137 kwargs["connection_string"] = connection_string
-> 1139 return cls.from_texts(
   1140     texts=texts,
   1141     pre_delete_collection=pre_delete_collection,
   1142     embedding=embedding,
   1143     distance_strategy=distance_strategy,
   1144     metadatas=metadatas,
   1145     ids=ids,
   1146     collection_name=collection_name,
   1147     use_jsonb=use_jsonb,
   1148     **kwargs,
   1149 )

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:1011, in PGVector.from_texts(cls, texts, embedding, metadatas, collection_name, distance_strategy, ids, pre_delete_collection, use_jsonb, **kwargs)
   1003 """
   1004 Return VectorStore initialized from texts and embeddings.
   1005 Postgres connection string is required
   1006 "Either pass it as a parameter
   1007 or set the PGVECTOR_CONNECTION_STRING environment variable.
   1008 """
   1009 embeddings = embedding.embed_documents(list(texts))
-> 1011 return cls.__from(
   1012     texts,
   1013     embeddings,
   1014     embedding,
   1015     metadatas=metadatas,
   1016     ids=ids,
   1017     collection_name=collection_name,
   1018     distance_strategy=distance_strategy,
   1019     pre_delete_collection=pre_delete_collection,
   1020     use_jsonb=use_jsonb,
   1021     **kwargs,
   1022 )

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:481, in PGVector.__from(cls, texts, embeddings, embedding, metadatas, ids, collection_name, distance_strategy, connection_string, pre_delete_collection, use_jsonb, **kwargs)
    478 if connection_string is None:
    479     connection_string = cls.get_connection_string(kwargs)
--> 481 store = cls(
    482     connection_string=connection_string,
    483     collection_name=collection_name,
    484     embedding_function=embedding,
    485     distance_strategy=distance_strategy,
    486     pre_delete_collection=pre_delete_collection,
    487     use_jsonb=use_jsonb,
    488     **kwargs,
    489 )
    491 store.add_embeddings(
    492     texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
    493 )
    495 return store

File /opt/app-root/lib64/python3.9/site-packages/langchain_core/_api/deprecation.py:183, in deprecated.<locals>.deprecate.<locals>.finalize.<locals>.warn_if_direct_instance(self, *args, **kwargs)
    181     warned = True
    182     emit_warning()
--> 183 return wrapped(self, *args, **kwargs)

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:341, in PGVector.__init__(self, connection_string, embedding_function, embedding_length, collection_name, collection_metadata, distance_strategy, pre_delete_collection, logger, relevance_score_fn, connection, engine_args, use_jsonb, create_extension)
    320 if not use_jsonb:
    321     # Replace with a deprecation warning.
    322     warn_deprecated(
    323         "0.0.29",
    324         pending=True,
   (...)
    339         ),
    340     )
--> 341 self.__post_init__()

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:348, in PGVector.__post_init__(self)
    346 """Initialize the store."""
    347 if self.create_extension:
--> 348     self.create_vector_extension()
    350 EmbeddingStore, CollectionStore = _get_embedding_collection_store(
    351     self._embedding_length, use_jsonb=self.use_jsonb
    352 )
    353 self.CollectionStore = CollectionStore

File /opt/app-root/lib64/python3.9/site-packages/langchain_community/vectorstores/pgvector.py:386, in PGVector.create_vector_extension(self)
    384         session.commit()
    385 except Exception as e:
--> 386     raise Exception(f"Failed to create vector extension: {e}") from e

Exception: Failed to create vector extension: (psycopg.errors.InsufficientPrivilege) permission denied to create extension "vector"
HINT:  Must be superuser to create this extension.
[SQL: BEGIN;SELECT pg_advisory_xact_lock(1573678846307946496);CREATE EXTENSION IF NOT EXISTS vector;COMMIT;]
(Background on this error at: https://sqlalche.me/e/20/f405)

Workaround:

  1. Access the terminal for the PGVector pod.
  2. Open the PostgreSQL CLI: psql
  3. Connect to the vector database: \c vectordb
  4. Create the vector extension: CREATE EXTENSION vector;
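
For anyone who would rather script steps 2 to 4 than type them into psql, here is a minimal sketch in Python. It assumes a psycopg 3 connection opened against vectordb by a superuser role. The helper name `ensure_vector_extension` and the connection string in the commented usage are hypothetical and not part of the notebook. The function only touches `execute()` and `commit()` on the connection, so any object that exposes those will do.

```python
# Sketch only: mirrors the manual psql workaround above.
# Assumption: `conn` is a psycopg 3 connection to the vectordb database,
# opened as a role that is allowed to create extensions (e.g. a superuser).

CHECK_SQL = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"
CREATE_SQL = "CREATE EXTENSION IF NOT EXISTS vector"


def ensure_vector_extension(conn):
    """Create the vector extension if it is missing.

    Returns True if the extension was created by this call,
    False if it was already installed.
    """
    # pg_extension lists the extensions installed in the current database.
    already_installed = conn.execute(CHECK_SQL).fetchone() is not None
    if already_installed:
        return False

    # This statement is the one that fails with InsufficientPrivilege
    # when the connected role is not a superuser.
    conn.execute(CREATE_SQL)
    conn.commit()
    return True


# Hypothetical usage (requires psycopg 3 and a reachable database):
#
#   import psycopg
#   with psycopg.connect("dbname=vectordb user=postgres") as conn:
#       ensure_vector_extension(conn)
```

The traceback also shows a `create_extension` parameter on `PGVector.__init__`, checked in `__post_init__` before `create_vector_extension()` is called. Once the extension exists, passing `create_extension=False` through `PGVector.from_documents(...)` may skip the attempt entirely, but I have not verified that here.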

strangiato commented 5 months ago

And I just realized this is mentioned at the top of the notebook...