truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

Help with context-based feedback functions on LangChain #638

Status: Closed (epinzur closed this issue 10 months ago)

epinzur commented 11 months ago

I just watched your excellent training done in partnership with LlamaIndex: https://learn.deeplearning.ai/building-evaluating-advanced-rag

I was wondering how to add the RAG Triad metrics to my LangChain pipeline experiments.

I figured out "Answer Relevance" easily, because no context is required. However, for "Context Relevance" and "Groundedness", context is needed, and the TruChain class doesn't have a select_source_nodes() method.

I looked through the LangChain examples and couldn't find any feedback functions that use the context either.

Can you provide some help on how to do this?
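
For readers following along, the three RAG Triad scores differ in which parts of a pipeline run they consume, which is why the two context-based ones need access to the retrieved chunks. The sketch below is purely illustrative: `overlap_score` is a hypothetical word-overlap scorer standing in for an LLM-based judge, it is not a TruLens feedback function, and averaging the per-chunk scores is an assumption.

```python
from statistics import mean


def overlap_score(reference: str, candidate: str) -> float:
    """Hypothetical scorer: fraction of candidate words that also occur in reference."""
    ref_words = set(reference.lower().split())
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    return sum(word in ref_words for word in cand_words) / len(cand_words)


def rag_triad(question: str, contexts: list[str], answer: str) -> dict[str, float]:
    """Compute toy versions of the three scores for one pipeline run."""
    return {
        # Answer relevance needs only the question and the answer.
        "answer_relevance": overlap_score(answer, question),
        # Context relevance scores every retrieved chunk against the question,
        # then aggregates (here: the mean over chunks).
        "context_relevance": mean(overlap_score(chunk, question) for chunk in contexts),
        # Groundedness checks the answer against all retrieved chunks together.
        "groundedness": overlap_score(" ".join(contexts), answer),
    }


scores = rag_triad(
    question="what is llama 2",
    contexts=["llama 2 is a collection of language models", "bananas are yellow"],
    answer="llama 2 is a collection of models",
)
print(scores)
```

Only `answer_relevance` can be computed from the chain's input and output alone; the other two require the retriever's output, which is the piece the question above is asking how to select.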

joshreini1 commented 11 months ago

Hey @epinzur !

Thanks for the question. Can you share a short code snippet on how you’re setting up your langchain pipeline?

joshreini1 commented 11 months ago

Btw, for guidance on using feedback function selectors, you can also check out:

https://www.trulens.org/trulens_eval/feedback_function_guide/
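
As rough intuition for what a selector does, it can be thought of as a path into the recorded inputs and outputs of a single app call. The sketch below is a conceptual model only: the `record` layout, the `ALL` marker, and the `select` helper are hypothetical and do not reproduce the TruLens record schema or its selector API.

```python
from typing import Any

# Marker meaning "apply the rest of the path to every element of a list".
ALL = object()

# Hypothetical record of one pipeline call (not the real TruLens layout).
record = {
    "input": "Can you tell me a bit about llama 2?",
    "calls": {
        "retriever": {
            "rets": [
                {"page_content": "First retrieved chunk."},
                {"page_content": "Second retrieved chunk."},
            ]
        }
    },
    "output": "An answer produced by the LLM.",
}


def select(obj: Any, path: list) -> Any:
    """Follow the keys in path in order, fanning out over lists at ALL."""
    if not path:
        return obj
    head, *rest = path
    if head is ALL:
        return [select(item, rest) for item in obj]
    return select(obj[head], rest)


# Pick the text of every chunk the retriever returned.
contexts = select(record, ["calls", "retriever", "rets", ALL, "page_content"])
print(contexts)
```

A context-based feedback function would then be evaluated on the selected chunks, either one chunk at a time or on the collected list.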

epinzur commented 11 months ago

I'm basically trying to reproduce the example from this Pinecone/Canopy video in LangChain: https://www.youtube.com/watch?v=dVGPglKh80Y

Then I'd like to use TruLens and the RAG Triad to evaluate the different pipelines.

Here is my LangChain code for the basic RAG pipeline.

  1. Install libraries

    %pip install langchain datasets==2.14.6 jq chromadb
  2. Create a jsonl file from the Hugging Face dataset jamescalam/ai-arxiv

    from datasets import load_dataset
    
    data = load_dataset("jamescalam/ai-arxiv", split="train")
    data.to_json("ai_arxiv.jsonl", orient="records", lines=True)
  3. Load the jsonl into documents

    from langchain.document_loaders import JSONLoader
    
    # Define the metadata extraction function.
    def metadata_func(record: dict, metadata: dict) -> dict:
        metadata["id"] = record.get("id")
        metadata["title"] = record.get("title")
        metadata["source"] = record.get("source")
        metadata["category"] = record.get("primary_category")
        metadata["published"] = record.get("published")
        metadata["updated"] = record.get("updated")
    
        return metadata
    
    loader = JSONLoader(
        file_path='./ai_arxiv.jsonl',
        jq_schema='.',
        content_key="content",
        metadata_func=metadata_func,
        json_lines=True)
    
    docs = loader.load()
  4. Split the documents into chunks

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
        is_separator_regex=False,
    )
    
    chunked_docs = text_splitter.transform_documents(docs)
  5. Use chromadb and OpenAI to embed and insert the documents into a test db

    import getpass
    import os
    
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma
    
    # Prompt for the API key instead of hard-coding it.
    os.environ["OPENAI_API_KEY"] = getpass.getpass()
    
    embeddings = OpenAIEmbeddings(disallowed_special=())
    
    db = Chroma.from_documents(documents=chunked_docs, embedding=embeddings, persist_directory="db_open_ai")
  6. Build a chain for retrieval and inference

    from langchain import hub
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
    from langchain.schema import StrOutputParser
    from langchain.schema.runnable import RunnablePassthrough
    
    llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0.1)
    
    retriever = db.as_retriever()
    prompt = hub.pull("rlm/rag-prompt")
    
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    open_ai_chain = (
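        # Pipe the retrieved documents through format_docs so the prompt receives plain text.
        {"context": retriever | format_docs, "question": RunnablePassthrough()}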
        | prompt
        | llm
        | StrOutputParser()
    )
    open_ai_chain.invoke("Can you tell me a bit about llama 2?")

    'Llama 2 is a collection of pretrained and fine-tuned large language models...'
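
To make the data flow of step 6 concrete: the leading dict in an LCEL chain passes the same input to each of its values and collects the results under the given keys, so the prompt receives both the retrieved context and the original question. The sketch below imitates that behaviour in plain Python. `Doc`, `fake_retriever`, and `fan_out` are hypothetical stand-ins rather than LangChain classes, and it assumes the retriever output is meant to be passed through `format_docs`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Doc:
    """Stand-in for a retrieved document with a text payload."""
    page_content: str


def fake_retriever(query: str) -> list[Doc]:
    # A real retriever would run a similarity search; this returns fixed chunks.
    return [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")]


def format_docs(docs: list[Doc]) -> str:
    # Same helper as in step 6: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fan_out(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Call every branch with the same input and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


prompt_inputs = fan_out(
    {
        "context": lambda q: format_docs(fake_retriever(q)),  # retriever, then formatter
        "question": lambda q: q,  # passthrough of the original input
    },
    "Can you tell me a bit about llama 2?",
)
print(prompt_inputs)
```

The value under `"context"` is exactly what context-based evaluation needs to capture, and it never appears in the chain's final output string.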

epinzur commented 11 months ago

Also, it seems that all of your examples use the legacy LLMChain implementation rather than the new LCEL chain notation.

See: https://python.langchain.com/docs/modules/chains/

Do you have any examples using LCEL?

joshreini1 commented 10 months ago

Taking a look @epinzur !

epinzur commented 10 months ago

@joshreini1 any update on this? I'd really like to start using TruLens for LangChain LCEL evaluation ASAP.

joshreini1 commented 10 months ago

Hey @epinzur ! This is released in trulens-eval==0.19.2!

See a usage example: https://www.trulens.org/trulens_eval/langchain_quickstart/

epinzur commented 10 months ago

Amazing! Thanks @joshreini1!