langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://smith.langchain.com/
MIT License

Issue: 'dict' object has no attribute 'replace' #620

Closed helioLJ closed 4 months ago

helioLJ commented 4 months ago

Issue you'd like to raise.

I followed LangSmith's documentation to create this RAG evaluation, but I keep receiving this error: Error Type: AttributeError, Message: 'dict' object has no attribute 'replace'.

custom_rag_prompt = PromptTemplate.from_template(bot_template)

def format_docs(docs):
    if isinstance(docs, dict):
        return docs['page_content']
    else:
        return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)
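For context, the mechanism behind the error (shown in full in the traceback further down): the embedding code eventually calls a string method on whatever "query" it receives, so if the whole inputs dict reaches the retriever instead of the question string, the `.replace()` call fails. A minimal illustration, with `embed_query_text` as a hypothetical stand-in for the real embedding call:

```python
# Hypothetical stand-in for the embedding call in the traceback below,
# which begins with text.replace(os.linesep, " ").
import os

def embed_query_text(text):
    # Works when given the question string...
    return text.replace(os.linesep, " ")

print(embed_query_text("What's the revenue?"))

# ...but fails when given the whole inputs dict.
try:
    embed_query_text({"question": "What's the revenue?"})
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'replace'
```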

# A simple example dataset
examples = [
    {
        "inputs": {
            "question": "What's the company's total revenue for q2 of 2022?",
            "documents": [
                {
                    "metadata": {},
                    "page_content": "In q1 the lemonade company made $4.95. In q2 revenue increased by a sizeable amount to just over $2T dollars.",
                }
            ],
        },
        "outputs": {
            "label": "2 trillion dollars",
        },
    },
    {
        "inputs": {
            "question": "Who is Lebron?",
            "documents": [
                {
                    "metadata": {},
                    "page_content": "On Thursday, February 16, Lebron James was nominated as President of the United States.",
                }
            ],
        },
        "outputs": {
            "label": "Lebron James is the President of the USA.",
        },
    },
]

client = Client()
uid = uuid.uuid4()

dataset_name = f"Faithfulness Example - {uid}"
dataset = client.create_dataset(dataset_name=dataset_name)
client.create_examples(
    inputs=[e["inputs"] for e in examples],
    outputs=[e["outputs"] for e in examples],
    dataset_id=dataset.id,
)

eval_config = RunEvalConfig(
    evaluators=["qa"],
    custom_evaluators=[FaithfulnessEvaluator()],
    input_key="question",
)

print("Running on dataset")
results = client.run_on_dataset(
    llm_or_chain_factory=rag_chain,
    dataset_name=dataset_name,
    evaluation=eval_config,
    verbose=True,
)

That's how I'm loading and splitting my docs:

# 2. Load the documents
path = "docs/md"
loader = DirectoryLoader(path, glob="**/*.md", loader_cls=UnstructuredMarkdownLoader)
docs = loader.load()

# 3. Split the documents into chunks
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
    ("####", "Header 4"),
]
markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on, strip_headers=False)
markdown_docs = []

for doc in docs:
    markdown_docs.extend(markdown_splitter.split_text(doc.page_content))

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
splits = text_splitter.split_documents(markdown_docs)
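As an aside, the two-stage split above (break on markdown headers, then cap each piece at a maximum size) can be sketched in plain Python; this is an illustration of the idea, not the LangChain implementation, and `split_by_headers` / `cap_chunks` are hypothetical helpers:

```python
# Illustrative sketch of a two-stage markdown split: first break the text
# before each level-1..4 header, then slice each section down to a maximum
# chunk size (no overlap handling, unlike RecursiveCharacterTextSplitter).
import re

def split_by_headers(text):
    # Split before any line that starts with 1-4 '#' characters.
    parts = re.split(r"(?m)^(?=#{1,4} )", text)
    return [p for p in parts if p.strip()]

def cap_chunks(sections, chunk_size=2000):
    chunks = []
    for section in sections:
        for i in range(0, len(section), chunk_size):
            chunks.append(section[i:i + chunk_size])
    return chunks

doc = "# Title\nintro text\n## Section\nbody text"
print(cap_chunks(split_by_headers(doc), chunk_size=50))
```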

The RAG chain is executing okay, but the Evaluation part is not, so I don't receive any feedback from the AI:

[screenshot]

Suggestion:

No response

hinthornw commented 4 months ago

Do you have an example trace you can share?

helioLJ commented 4 months ago

Do you have an example trace you can share?

Sure:

[Screenshot from 2024-04-23 10-11-02]

The error:

Traceback (most recent call last):
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langsmith/run_helpers.py", line 541, in wrapper
    function_result = run_container["context"].run(
  File "/home/helio/code/-bot/src/ai/app.py", line 79, in <lambda>
    lambda x: rag_chain.invoke(x, config={"callbacks": [FormattedPromptHandler()]}),
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
    input = step.invoke(
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3144, in invoke
    output = {key: future.result() for key, future in zip(steps, futures)}
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3144, in <dictcomp>
    output = {key: future.result() for key, future in zip(steps, futures)}
  File "/home/helio/.asdf/installs/python/3.11.4/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
  File "/home/helio/.asdf/installs/python/3.11.4/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/helio/.asdf/installs/python/3.11.4/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
    input = step.invoke(
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/retrievers.py", line 193, in invoke
    return self.get_relevant_documents(
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/retrievers.py", line 321, in get_relevant_documents
    raise e
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/retrievers.py", line 314, in get_relevant_documents
    result = self._get_relevant_documents(
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 696, in _get_relevant_documents
    docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_chroma/vectorstores.py", line 379, in similarity_search
    docs_and_scores = self.similarity_search_with_score(
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_chroma/vectorstores.py", line 468, in similarity_search_with_score
    query_embedding = self._embedding_function.embed_query(query)
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain/embeddings/cache.py", line 191, in embed_query
    return self.underlying_embeddings.embed_query(text)
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_community/embeddings/bedrock.py", line 187, in embed_query
    embedding = self._embedding_func(text)
  File "/home/helio/.cache/pypoetry/virtualenvs/-bot-M2hQc3u3-py3.11/lib/python3.11/site-packages/langchain_community/embeddings/bedrock.py", line 118, in _embedding_func
    text = text.replace(os.linesep, " ")
AttributeError: 'dict' object has no attribute 'replace'

hinthornw commented 4 months ago

Ah, I see. The rag_chain expects a string, but the input to the evaluated chain is a dictionary.

To fix, check the input key for your dataset, and then update the call:

def predict(inputs: dict):
    return rag_chain.invoke(inputs["MY-KEY"])

client.run_on_dataset(
    llm_or_chain_factory=predict,
...
)
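The wrapper pattern above can be checked in isolation, assuming only that the wrapped object exposes an `.invoke()` method; the `EchoChain` stand-in and `make_predict` helper below are hypothetical, for illustration:

```python
# Wrap a chain that expects a string so it accepts the dict that
# run_on_dataset passes; "question" stands in for the dataset input key.
def make_predict(chain, input_key="question"):
    def predict(inputs: dict) -> str:
        return chain.invoke(inputs[input_key])
    return predict

# Stand-in chain whose invoke just echoes its input, for testing the wrapper.
class EchoChain:
    def invoke(self, text):
        return f"answer to: {text}"

predict = make_predict(EchoChain())
print(predict({"question": "Who is Lebron?"}))  # answer to: Who is Lebron?
```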

Though this is unrelated to this ticket, we also recommend using the newer evaluate() method. It's a cleaner API that supports similar functionality (doc).

I made a prompt to attempt to upgrade your code https://smith.langchain.com/hub/wfh/run2evaluate

Can't guarantee 100% accuracy on the conversion though!

helioLJ commented 4 months ago

Oh my gosh, it worked! It was just that:

def predict(inputs: dict):
    return rag_chain.invoke(inputs["question"])

Also thank you for recommending the new API, I will try to migrate to this new way!

Closing this here; I hope it can help other people.