Open RWayne93 opened 3 months ago
The TruLens documentation mainly shows usage with OpenAI models, so there are sufficient tutorials there. I am looking at alternatives and making them work with TruLens.
I'm interested in alternatives as well and not relying on OpenAI models.
@kesamet Really enjoying this so far. I have tried adding pgvector to your framework but can't seem to get it to work. In your config I have added:
VECTORDB_TYPE: pgvector
VECTORDB_PATH: postgresql+psycopg2://postgres:mysecretpassword@localhost:5432/postgres
COLLECTION_NAME: pdf_document_chunks
Then I updated load_vectordb in app_conv.py:
def load_vectordb():
    if CFG.VECTORDB_TYPE == "faiss":
        return load_faiss(BASE_EMBEDDINGS)
    if CFG.VECTORDB_TYPE == "chroma":
        return load_chroma(BASE_EMBEDDINGS)
    if CFG.VECTORDB_TYPE == "pgvector":
        print("Loading pgvector")
        print(load_pgvector(BASE_EMBEDDINGS, CFG.VECTORDB_PATH, CFG.COLLECTION_NAME))
        return load_pgvector(BASE_EMBEDDINGS, CFG.VECTORDB_PATH, CFG.COLLECTION_NAME)
    raise NotImplementedError
In vectordb.py I have added load_pgvector:
def load_pgvector(
    embedding_function: Embeddings, collection_name: str, connection_string: str
) -> VectorStore:
    """Loads a PGVector index from disk."""
    print(f"collection_name = {collection_name}")
    print(f"connection_string = {connection_string}")
    print(f"embedding_function = {embedding_function}")
    return PGVector(
        connection_string=connection_string,
        embedding_function=embedding_function,
        collection_name=collection_name
    )
Not sure what the issue is. The connection string is correct, since I can see the database and a couple of inserted documents.
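One thing worth checking in the snippets above: load_pgvector declares its parameters in the order (embedding_function, collection_name, connection_string), while load_vectordb passes CFG.VECTORDB_PATH second and CFG.COLLECTION_NAME third, so the two values may end up swapped. Here is a minimal sketch of that effect. FakePGVector is a hypothetical stand-in that only records its arguments (it is not the real langchain class), and the connection values are placeholders, so the sketch runs without a database:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakePGVector:
    """Hypothetical stand-in for PGVector; it only records its arguments."""

    connection_string: str
    embedding_function: Any
    collection_name: str


def load_pgvector(
    embedding_function: Any, collection_name: str, connection_string: str
) -> FakePGVector:
    # Same parameter order as the function posted above.
    return FakePGVector(
        connection_string=connection_string,
        embedding_function=embedding_function,
        collection_name=collection_name,
    )


# Placeholder values for illustration only.
VECTORDB_PATH = "postgresql+psycopg2://user:password@localhost:5432/postgres"
COLLECTION_NAME = "pdf_document_chunks"

# Positional call in the same order as load_vectordb: path second, name third.
swapped = load_pgvector(None, VECTORDB_PATH, COLLECTION_NAME)

# Keyword arguments make the call independent of the parameter order.
fixed = load_pgvector(
    None,
    collection_name=COLLECTION_NAME,
    connection_string=VECTORDB_PATH,
)

print("positional:", swapped.collection_name)
print("keyword:   ", fixed.collection_name)
```

With the positional call the connection URL is used as the collection name. Passing the arguments by keyword, or reordering them to match the signature, avoids the mix-up.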
EDIT:
OK, I got pgvector working. The only issue now is that the returned sources don't render properly, or the page info isn't shown. Below is a dirty hack I implemented. I think the issue might be because I implemented some custom insertion logic before trying to get pgvector to work with your existing framework.
def display_source_document_info(row):
    # Check if 'page' key exists before accessing it
    if 'page' in row.metadata:
        st.write("**Page {}**".format(row.metadata["page"] + 1))
    else:
        # Handle the case where 'page' key is missing
        st.write("**Page information not available**")
    st.info(row.page_content)
Looks right to me. If you can also include how you save docs to Postgres, that would be great. What is saved in the metadata depends on the user; one can choose not to save the page number.
I will. It was a little challenging at first. I had to include my own UUID in the metadata for each chunk as well (I need this for a dataset I'm putting together). Postgres assigns one internally; however, when I use the retriever it isn't included, and I couldn't figure out how to get the Postgres-generated UUIDs returned to me.
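For readers following along, the metadata approach described above could look roughly like this. It is a sketch, not the code from the fork: Chunk is a hypothetical stand-in for a langchain Document, and the chunk_uuid key name is an assumption chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a langchain Document (content plus metadata)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def tag_chunks_with_uuid(chunks: List[Chunk], key: str = "chunk_uuid") -> List[Chunk]:
    """Add a client-side UUID to each chunk's metadata, keeping any existing one."""
    for chunk in chunks:
        chunk.metadata.setdefault(key, str(uuid.uuid4()))
    return chunks


chunks = tag_chunks_with_uuid(
    [
        Chunk("first chunk", {"page": 0}),
        Chunk("second chunk"),  # no page number saved for this one
    ]
)

for chunk in chunks:
    print(chunk.metadata)
```

Because the identifier lives in the metadata, it comes back with whatever the retriever returns, regardless of the ID the database assigns internally.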
@kesamet Hey man, I'm getting ready to submit a PR soon. I created a fork and made a poetry branch for users who want to use Poetry instead of conda. Also, this is my first time using langchain this much, and I was wondering how exactly you return the relevance_score with the base retriever, similar to the reranker retriever. With that one the scores are returned in the metadata, but with base_retriever they are not. I have tried a few things, like a ScoredRetrievalQA class, with no luck. I've read through the docs and don't see how I am supposed to get the scores from the base retriever.
def build_base_retriever(vectordb: VectorStore) -> VectorStoreRetriever:
    retriever = vectordb.as_retriever(
        search_kwargs={"k": CFG.BASE_RETRIEVER_CONFIG.SEARCH_K},
        # search_type="similarity",
    )
    print(f"here is the retriever: {retriever}")
    return retriever
We could probably write a wrapper around VectorStoreRetriever that also outputs scores in the metadata field. I can help do that. See PR.
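The PR itself is not reproduced in this thread. As an independent illustration of the idea, the sketch below relies on similarity_search_with_score, a method langchain vector stores expose that returns (document, score) pairs, and copies each score into the document's metadata. Doc and FakeVectorStore are hypothetical stand-ins so the example runs without langchain or a database, and the relevance_score key simply mirrors the name used earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a langchain Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def retrieve_with_scores(vectordb: Any, query: str, k: int = 4) -> List[Doc]:
    """Return the top-k documents with each score copied into the metadata."""
    docs = []
    for doc, score in vectordb.similarity_search_with_score(query, k=k):
        doc.metadata["relevance_score"] = score
        docs.append(doc)
    return docs


class FakeVectorStore:
    """Tiny in-memory store used only to exercise the function above."""

    def __init__(self, scored: List[Tuple[Doc, float]]) -> None:
        self._scored = scored

    def similarity_search_with_score(
        self, query: str, k: int = 4
    ) -> List[Tuple[Doc, float]]:
        # A real store would embed the query; this one returns canned results.
        return self._scored[:k]


store = FakeVectorStore([(Doc("a"), 0.12), (Doc("b"), 0.34), (Doc("c"), 0.56)])
results = retrieve_with_scores(store, "any question", k=2)

for doc in results:
    print(doc.page_content, doc.metadata["relevance_score"])
```

What the score means depends on the vector store: it may be a distance (lower is closer) or a similarity (higher is closer), so it is worth checking before comparing it with reranker scores.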
That looks good. I think I was just using the wrong modules from langchain.
Edit: I ran this and it works. Curious, is the similarity metric just cosine?
Cosine distance to be exact, not cosine similarity.
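The distinction matters when reading the scores: under the usual definition, cosine distance is 1 minus cosine similarity, so a lower value means a closer match. A small pure-Python sketch of both quantities (the function names are illustrative, not taken from the repo):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, in [0, 2]; smaller means more similar."""
    return 1.0 - cosine_similarity(a, b)


same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])

print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way give a distance near 0, orthogonal vectors give 1, and opposite vectors give 2.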
Basically the title. I have just started getting into RAG evaluation and came across your repo with TruLens. Curious as to why Gemini Pro? I don't have access to Pro yet.