Closed shenhai-ran closed 5 months ago
Hey @shenhai-ran! I'm here to assist you with any bugs, questions, or contributions while you wait for a human maintainer. Let's tackle this issue together!
To configure a self-signed certificate for the `IngestionPipeline` with the OpenAI API in LlamaIndex, you need to ensure that the HTTP client settings are applied to all components, including the `IngestionPipeline`. Here's how you can do it:
1. **Configure the HTTP client with the self-signed certificate:** Ensure that the `http_client` and `async_http_client` are configured with the path to your self-signed certificate.
2. **Pass the configured HTTP client to the `OpenAIEmbedding` used in the `IngestionPipeline`:** You need to explicitly pass the `http_client` and `async_http_client` to the `OpenAIEmbedding` instance used in the `IngestionPipeline`.
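If you prefer to build the TLS configuration once and reuse it, here is a minimal sketch using only the Python standard library (the helper name `make_ssl_context` is illustrative, not part of LlamaIndex; the resulting context can also be passed to httpx via `verify=`):

```python
import ssl

def make_ssl_context(cert_path: str) -> ssl.SSLContext:
    """Build an SSLContext that trusts a corporate self-signed CA.

    The returned context keeps normal hostname/certificate verification
    enabled and simply adds the given CA to the trust store.
    """
    ctx = ssl.create_default_context()          # default verification settings
    ctx.load_verify_locations(cafile=cert_path) # trust the self-signed CA
    return ctx
```

This avoids re-reading the certificate file every time a client is constructed.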
Here is an updated version of your code:
```python
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
import httpx
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
from llama_index.readers.file import PyMuPDFReader

# Configure HTTP clients with the self-signed certificate
http_client = httpx.Client(verify="<path/to/certificate>")
async_http_client = httpx.AsyncClient(verify="<path/to/certificate>")

# Set the default LLM and embedding model with the configured HTTP clients
Settings.llm = OpenAI(
    http_client=http_client,
    async_http_client=async_http_client,
)
Settings.embed_model = OpenAIEmbedding(
    http_client=http_client,
    async_http_client=async_http_client,
)

# Load documents and create an index
documents = SimpleDirectoryReader("<path/to/folder>").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("<my query>")
print(response)

# Configure the Chroma vector store
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("ingestion_demo")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create an IngestionPipeline with the configured OpenAIEmbedding
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=300, chunk_overlap=100),
        TitleExtractor(),
        # Pass the configured HTTP clients
        OpenAIEmbedding(http_client=http_client, async_http_client=async_http_client),
    ],
    vector_store=vector_store,
)

# Load the PDF documents and run the pipeline
documents = SimpleDirectoryReader(
    "<path/to/folder>",
    required_exts=[".pdf"],
    file_extractor={".pdf": PyMuPDFReader()},
).load_data()
pipeline.run(documents=documents, show_progress=True)

# Build an index on top of the populated vector store
index = VectorStoreIndex.from_vector_store(vector_store)
vector_retriever = index.as_retriever(similarity_top_k=2)
```
By explicitly passing the configured `http_client` and `async_http_client` to the `OpenAIEmbedding` instance used in the `IngestionPipeline`, you ensure that the self-signed certificate is correctly applied, which should resolve the `APIConnectionError` you are encountering.
To continue talking to Dosu, mention @dosu.
Question Validation
Question
I am working within a corporate environment and we have a self-signed certificate. I understand that for the `OpenAI` client, I can configure `http_client` to specify my certificate, and it works normally until I start to try the `IngestionPipeline`. Here is a minimal example of what I mean:

- Configure `llm` and `embed_model` with my certificate.
- Create a `VectorStoreIndex` and `query_engine` to get a response. It works so far.

Then next, I use `IngestionPipeline` with `OpenAIEmbedding()` and I get errors. The errors are as follows:

Is there another place that I need to configure?
Thanks for your help!