truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

Help with context-based feedback functions on LangChain #638

Closed: epinzur closed this issue 11 months ago

epinzur commented 1 year ago

I just watched your excellent course, made in partnership with LlamaIndex: https://learn.deeplearning.ai/building-evaluating-advanced-rag

I was wondering how to add the RAG Triad metrics to my LangChain pipeline experiments.

I figured out "Answer Relevance" easily because it doesn't require any context. However, "Context Relevance" and "Groundedness" both need the retrieved context, and the TruChain class doesn't have a select_source_nodes() method.
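To make the dependency on context concrete, here is a toy sketch of what each RAG Triad score consumes: answer relevance compares the question with the answer, context relevance compares the question with each retrieved chunk, and groundedness checks the answer against the retrieved chunks. The word-overlap scorer is a hypothetical stand-in for the LLM-based feedback providers TruLens uses, and averaging the per-chunk scores is just one plausible aggregation; all function names here are illustrative only.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for semantic comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(reference: str, candidate: str) -> float:
    """Fraction of candidate tokens that also appear in reference (0.0 to 1.0).

    Hypothetical scorer: real RAG Triad feedback functions use an LLM judge,
    not word overlap.
    """
    cand = tokens(candidate)
    if not cand:
        return 0.0
    return len(cand & tokens(reference)) / len(cand)


def rag_triad(question: str, contexts: list[str], answer: str) -> dict[str, float]:
    # Answer relevance: only the question and the answer are needed.
    answer_relevance = overlap_score(answer, question)
    # Context relevance: score every retrieved chunk against the question, then average.
    if contexts:
        context_relevance = sum(overlap_score(c, question) for c in contexts) / len(contexts)
    else:
        context_relevance = 0.0
    # Groundedness: how much of the answer is supported by the retrieved chunks.
    groundedness = overlap_score(" ".join(contexts), answer)
    return {
        "answer_relevance": answer_relevance,
        "context_relevance": context_relevance,
        "groundedness": groundedness,
    }


scores = rag_triad(
    "What is Llama 2?",
    ["Llama 2 is a family of large language models.", "Chroma is a vector database."],
    "Llama 2 is a family of large language models.",
)
print(scores)  # {'answer_relevance': 0.75, 'context_relevance': 0.5, 'groundedness': 1.0}
```

Only the first metric can be computed from the chain's input and output alone; the other two need access to whatever the retriever returned.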

I looked through the LangChain examples and couldn't find any feedback functions that use the context either.

Can you provide some help on how to do this?

joshreini1 commented 1 year ago

Hey @epinzur!

Thanks for the question. Can you share a short code snippet showing how you're setting up your LangChain pipeline?

joshreini1 commented 1 year ago

Btw, for guidance on using feedback function selectors, you can also check out:

https://www.trulens.org/trulens_eval/feedback_function_guide/
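A feedback selector points a feedback function at one piece of a recorded app call, such as the main input, the main output, or the chunks a retriever returned. The toy sketch below shows the idea with a dotted path and a `[*]` fan-out over lists; the record layout, the path syntax, and `select` are all hypothetical and much simpler than the TruLens selectors described in the guide.

```python
from typing import Any


def select(record: Any, path: str) -> list[Any]:
    """Return every value found at a dotted path; a '[*]' suffix fans out over a list.

    Hypothetical helper: it mimics the idea of a feedback selector,
    not the TruLens selector API.
    """
    values = [record]
    for part in path.split("."):
        fan_out = part.endswith("[*]")
        key = part[:-3] if fan_out else part
        next_values = []
        for value in values:
            child = value[key]
            if fan_out:
                next_values.extend(child)  # one entry per list element
            else:
                next_values.append(child)
        values = next_values
    return values


# Hypothetical record of one app call, including the retriever's output.
record = {
    "main_input": "Can you tell me a bit about llama 2?",
    "main_output": "example answer",
    "calls": {
        "retriever": {
            "output": [{"page_content": "chunk A"}, {"page_content": "chunk B"}],
        },
    },
}

print(select(record, "main_input"))  # ['Can you tell me a bit about llama 2?']
print(select(record, "calls.retriever.output[*].page_content"))  # ['chunk A', 'chunk B']
```

Context-based feedback needs a path that reaches into the retriever's output, whereas answer relevance only needs the main input and output.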

epinzur commented 1 year ago

I'm basically trying to reproduce the example from this Pinecone/Canopy video in LangChain: https://www.youtube.com/watch?v=dVGPglKh80Y

Then I'd like to use TruLens and the RAG Triad to evaluate the different pipelines.

Here is my LangChain code for the basic RAG pipeline:

  1. Install libraries

    %pip install langchain langchainhub openai tiktoken datasets==2.14.6 jq chromadb
  2. Create a JSONL file from the Hugging Face dataset jamescalam/ai-arxiv

    from datasets import load_dataset
    
    data = load_dataset("jamescalam/ai-arxiv", split="train")
    data.to_json("ai_arxiv.jsonl", orient="records", lines=True)
  3. Load the JSONL file into LangChain documents

    from langchain.document_loaders import JSONLoader
    
    # Define the metadata extraction function.
    def metadata_func(record: dict, metadata: dict) -> dict:
        metadata["id"] = record.get("id")
        metadata["title"] = record.get("title")
        metadata["source"] = record.get("source")
        metadata["category"] = record.get("primary_category")
        metadata["published"] = record.get("published")
        metadata["updated"] = record.get("updated")
    
        return metadata
    
    loader = JSONLoader(
        file_path='./ai_arxiv.jsonl',
        jq_schema='.',
        content_key="content",
        metadata_func=metadata_func,
        json_lines=True)
    
    docs = loader.load()
  4. Split the documents into chunks

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
        is_separator_regex=False,
    )
    
    chunked_docs = text_splitter.transform_documents(docs)
  5. Use Chroma and OpenAI embeddings to embed the chunks and insert them into a test database

    import getpass
    import os
    
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma
    
    # Prompt for the OpenAI API key without echoing it.
    os.environ["OPENAI_API_KEY"] = getpass.getpass()
    
    embeddings = OpenAIEmbeddings(disallowed_special=())
    
    db = Chroma.from_documents(documents=chunked_docs, embedding=embeddings, persist_directory="db_open_ai")
  6. Build a chain for retrieval and inference

    from langchain import hub
    from langchain.chat_models import ChatOpenAI
    from langchain.schema import StrOutputParser
    from langchain.schema.runnable import RunnablePassthrough
    
    llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0.1)
    
    retriever = db.as_retriever()
    prompt = hub.pull("rlm/rag-prompt")  # requires the langchainhub package
    
    # Join the retrieved chunks into one string for the prompt's {context} slot.
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    open_ai_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    open_ai_chain.invoke("Can you tell me a bit about llama 2?")

    'Llama 2 is a collection of pretrained and fine-tuned large language models...'
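The `chunk_size=1000` and `chunk_overlap=200` settings in step 4 mean neighboring chunks share up to 200 characters, so text near a chunk boundary can be retrieved from either side. The sketch below is a simplified fixed-window splitter under that assumption: unlike `RecursiveCharacterTextSplitter`, it ignores paragraph, line, and word separators and cuts at exact character offsets, so each window simply starts `chunk_size - chunk_overlap` characters after the previous one. `chunk_text` is a hypothetical name for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified sketch: unlike LangChain's RecursiveCharacterTextSplitter,
    it ignores separators and cuts at exact character offsets.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the step 4 settings, a 2,500-character document would yield windows starting at offsets 0, 800, and 1,600.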

epinzur commented 1 year ago

Also, it seems that all of your examples use the legacy LLMChain implementation rather than the new LCEL (LangChain Expression Language) notation.

See: https://python.langchain.com/docs/modules/chains/

Do you have any examples using LCEL?
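For readers new to LCEL: a chain is built by piping steps together with `|`, and a dict at the head of a pipe (as at the start of the step 6 chain) runs every value on the same input and collects the results under their keys. The sketch below is a hypothetical toy model of that composition; `Runnable` and `coerce` here are not LangChain's classes, which add streaming, batching, and async support on top of `invoke`.

```python
from typing import Any, Callable


class Runnable:
    """Toy model of an LCEL step: wraps a function and composes with `|`.

    Hypothetical class, not LangChain's Runnable.
    """

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Runnable":
        # `self | other`: feed this step's output into the next step.
        nxt = coerce(other)
        return Runnable(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Runnable":
        # Handles `{...} | runnable`, where the left operand is a plain dict.
        return coerce(other) | self


def coerce(obj: Any) -> Runnable:
    """Turn a Runnable, dict, or plain function into a Runnable."""
    if isinstance(obj, Runnable):
        return obj
    if isinstance(obj, dict):
        # A dict runs every step on the same input and collects results by key.
        steps = {key: coerce(step) for key, step in obj.items()}
        return Runnable(lambda value: {key: step.invoke(value) for key, step in steps.items()})
    if callable(obj):
        return Runnable(obj)
    raise TypeError(f"cannot coerce {type(obj).__name__} to Runnable")


# Hypothetical stand-ins for the retriever, RunnablePassthrough, and the prompt.
retriever = Runnable(lambda question: ["Llama 2 is a family of LLMs."])
passthrough = Runnable(lambda question: question)
prompt = Runnable(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")


def format_docs(docs: list[str]) -> str:
    return "\n\n".join(docs)


chain = {"context": retriever | format_docs, "question": passthrough} | prompt
print(chain.invoke("What is Llama 2?"))
```

Because the retriever's output only exists inside the dict step, an evaluation tool has to look inside the chain (not just at its input and output) to get the context.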

joshreini1 commented 1 year ago

Taking a look, @epinzur!

epinzur commented 11 months ago

@joshreini1, any update on this? I'd really like to start using TruLens to evaluate LangChain LCEL chains ASAP.

joshreini1 commented 11 months ago

Hey @epinzur! This was released in trulens-eval==0.19.2!

See a usage example: https://www.trulens.org/trulens_eval/langchain_quickstart/
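Context-based feedback relies on recording the inputs and outputs of an app's internal calls, so that a feedback function can later read the retriever's output as its context. A toy sketch of that recording idea follows; `record_calls`, the log format, and the two example functions are hypothetical and are not TruChain's API or internals (TruChain instruments the LangChain components for you).

```python
import functools
from typing import Any, Callable


def record_calls(log: list[dict[str, Any]], name: str) -> Callable:
    """Decorator that appends each call's arguments and output to `log`.

    Hypothetical helper illustrating instrumentation, not TruLens code.
    """
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            log.append({"name": name, "args": args, "output": result})
            return result
        return wrapper
    return decorator


calls: list[dict[str, Any]] = []


@record_calls(calls, "retriever")
def retrieve(question: str) -> list[str]:
    return ["chunk about Llama 2"]  # stand-in for a vector-store lookup


@record_calls(calls, "app")
def app(question: str) -> str:
    chunks = retrieve(question)
    return f"Answer based on {len(chunks)} chunk(s)"


app("Can you tell me a bit about llama 2?")

# The retriever's recorded output is what context relevance and groundedness score.
context = next(c["output"] for c in calls if c["name"] == "retriever")
print(context)  # ['chunk about Llama 2']
```

The inner call finishes first, so the retriever's entry is logged before the app's.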

epinzur commented 11 months ago

Amazing! Thanks @joshreini1!