langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Document Comparison toolkit is not working #9981

Closed abgonzalez closed 3 months ago

abgonzalez commented 1 year ago

System Info

First, thank you so much for your work on LangChain; it's very good.

I am trying to compare two documents following the LangChain guide: https://python.langchain.com/docs/integrations/toolkits/document_comparison_toolkit

I used exactly the same code as the guide. I have one class to use as the args_schema when creating the tools:

class DocumentInput(BaseModel):
    question: str = Field()

I have created the tools:

tools.append(
    Tool(
        args_schema=DocumentInput,
        name=file_name,
        description=f"useful when you want to answer questions about {file_name}",
        func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
    )
)

agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=True,
)

And here I get the error:

"1 validation error for Tool\nargs_schema\n subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)",

I then changed the args_schema class to:

from abc import ABC
from langchain.tools import BaseTool
from pydantic import Field

class DocumentInput(BaseTool, ABC):
    question: str = Field()

And now the error I get is:

("Value not declarable with JSON Schema, field: name='_callbacks_List[langchain.callbacks.base.BaseCallbackHandler]' type=BaseCallbackHandler required=True",)

I only want to compare the content of two documents. Do you have a working example that compares two files? Maybe I am creating the tools incorrectly.

Who can help?

@yajunDai

Information

Related Components

Reproduction

I followed the guide at https://python.langchain.com/docs/integrations/toolkits/document_comparison_toolkit exactly to compare two documents, and I get the error:

"1 validation error for Tool\nargs_schema\n subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)",

The args_schema class is:

class DocumentInput(BaseModel):
    question: str = Field()

I am trying to create the tools:

tools.append(
    Tool(
        args_schema=DocumentInput,
        name=file_name,
        description=f"useful when you want to answer questions about {file_name}",
        func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
    )
)

and here I get the error: "1 validation error for Tool\nargs_schema\n subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)",

After that I do the following, but execution never reaches initialize_agent:

agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=True,
)

This is exactly the guide for document comparison in LangChain.

Expected behavior

I expect to be able to compare the content of two documents without errors.

dosubot[bot] commented 1 year ago

🤖

Hello,

Thank you for your detailed explanation of the issue you're facing. It seems like you're having trouble with the args_schema parameter in the Tool class.

The args_schema parameter is expected to be a Pydantic model class that is used to validate and parse the tool's input arguments. In your case, you've defined DocumentInput as a subclass of BaseModel which is correct.

class DocumentInput(BaseModel):
    question: str = Field()

However, the error message you're seeing suggests that the Tool class is not recognizing DocumentInput as a subclass of BaseModel. This could be due to a variety of reasons, such as an import error or a naming conflict.

As for the second error you're seeing, it seems like there's an issue with the BaseCallbackHandler class. Without more context, it's hard to say what the exact issue is. However, it's possible that there's a problem with the way the BaseCallbackHandler class is being serialized to JSON.

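Unfortunately, without more information, it's hard to provide a more specific solution. Could you please provide more details about your environment? Specifically, it would be helpful to know:

- The version of LangChain you are using
- The version of Python you are using
- Any other relevant packages or libraries you are using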

This information will help us better understand the issue and provide a more accurate solution.
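
The version details requested here can be gathered in one step. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `environment_report` and the default package list are illustrative assumptions, not part of LangChain.

```python
from importlib import metadata
import platform


def installed_version(package: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("langchain", "pydantic", "pydantic-core", "openai")) -> dict:
    """Collect the Python version plus the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into an issue.
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

Pasting this output into a report makes it easy to see at a glance which pydantic major version is installed alongside LangChain.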

Thank you for your patience and cooperation.


abgonzalez commented 1 year ago

Thank you so much for your answer,

The version of LangChain I am using -> 0.0.275
The version of Python -> 3.8
Any other relevant packages or libraries you're using: the versions of the rest of the packages are:

Package             Version
aiohttp             3.8.5
aiosignal           1.3.1
annotated-types     0.5.0
anyio               3.7.1
async-timeout       4.0.3
attrs               23.1.0
basf-auth           2.9.0
cachetools          4.2.4
certifi             2023.7.22
cffi                1.15.1
charset-normalizer  3.2.0
click               8.1.7
colorama            0.4.6
cryptography        41.0.3
dataclasses-json    0.5.14
exceptiongroup      1.1.3
faiss-cpu           1.7.4
fastapi             0.103.0
frozenlist          1.4.0
greenlet            2.0.2
gunicorn            21.2.0
h11                 0.14.0
httpcore            0.17.3
httptools           0.6.0
httpx               0.24.1
idna                3.4
itsdangerous        2.1.2
langchain           0.0.275
langsmith           0.0.27
loguru              0.7.0
marshmallow         3.20.1
multidict           6.0.4
mypy-extensions     1.0.0
nose2               0.13.0
numexpr             2.8.5
numpy               1.24.4
openai              0.27.9
websockets          11.0.3
wheel               0.38.4
win32-setctime      1.1.0
yarl                1.9.2

I am using AzureOpenAI model=gpt-35-turbo-16k version=2023-07-01-preview

I can upload the code to GitHub if that would help.

Thanks

simjak commented 1 year ago

Same issue for me:

pydantic.v1.error_wrappers.ValidationError: 1 validation error for Tool
args_schema
  subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)

Hexa75 commented 1 year ago

Same issue for me with:

python 3.10
pydantic 2.3.0
pydantic-core 2.6.3
langchain 0.0.281

abgonzalez commented 1 year ago

Hello,

I am still having the same problem. Does anyone have any news about it? Has anyone managed to compare two documents?

Thanks

mauricio-fernandez-l commented 1 year ago

I also had the problem until today. For me the problem was the new pydantic==2.4.2. I downgraded to pydantic==1.10.12 and it works for me with langchain==0.0.304.

mdfahad999 commented 1 year ago

I was also facing the same issue. It was resolved by downgrading pydantic to pydantic==1.10.0 and using langchain==0.0.287.
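
Both downgrade reports above settle on a pydantic 1.x release. As a rough sketch, a script could check which major version is installed before building any tools; `pydantic_major` and `uses_pydantic_v1` are hypothetical helper names, and only the standard library is used.

```python
from importlib import metadata
from typing import Optional


def pydantic_major(version: str) -> int:
    """Extract the major version number from a string such as '1.10.13'."""
    return int(version.split(".", 1)[0])


def uses_pydantic_v1(version: Optional[str] = None) -> bool:
    """Return True if the given (or installed) pydantic version is a 1.x release."""
    if version is None:
        # Raises importlib.metadata.PackageNotFoundError if pydantic is absent.
        version = metadata.version("pydantic")
    return pydantic_major(version) < 2


if __name__ == "__main__":
    # Versions mentioned in this thread, passed explicitly so nothing needs installing.
    for candidate in ("1.10.12", "1.10.13", "2.3.0", "2.4.2"):
        print(candidate, "is 1.x:", uses_pydantic_v1(candidate))
```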

stanyq4 commented 11 months ago

Still having the same issue with the latest langchain

JieShenAI commented 11 months ago

!pip install pydantic==1.10.13

I ran into the same problem and solved it by running the command above.

By the way, my previous pydantic version was 2.3.0, which did not work.

TingHui21 commented 11 months ago

Facing the same issue, and I would like to stick with pydantic version 2.4.2.

DougHaber commented 9 months ago

Encountering the same issue here. The examples in the documentation don't work. If you start from a clean virtualenv, install langchain, and then run code from the documentation, it fails:

https://python.langchain.com/docs/modules/agents/tools/custom_tools

class SearchInput(BaseModel):
    query: str = Field(description="should be a search query")

@tool("search", return_direct=True, args_schema=SearchInput)
def search_api(query: str) -> str:
    """Searches the API for the query."""
    return "Results"

throws the error:

pydantic.v1.error_wrappers.ValidationError: 1 validation error for StructuredTool                                       
args_schema                                                                                                             
  subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)

Reverting Pydantic back to a 1.x version as mentioned above fixes it:

pip install pydantic==1.10.13

I'm not sure if there are any downsides, but another solution for now might be to let it install the newer Pydantic, but use the v1 BaseModel.

from pydantic.v1 import BaseModel, Field

usmanyaqoob49 commented 9 months ago

Pydantic is the main issue: Pydantic released v2 on June 30, 2023, and the LangChain integration is not compatible with it.
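
The tracebacks earlier in this thread come from `pydantic.v1.error_wrappers`, which suggests that these LangChain releases validate `args_schema` against the v1 `BaseModel`, while `from pydantic import BaseModel` gives the v2 class when pydantic 2.x is installed. Below is a minimal sketch of the workaround suggested in this thread, assuming pydantic is installed. The `try`/`except` fallback for pydantic 1.x releases that have no `pydantic.v1` namespace is an addition for illustration, and newer LangChain releases may behave differently.

```python
try:
    # pydantic 2.x bundles the legacy v1 API under pydantic.v1
    from pydantic.v1 import BaseModel, Field
except ImportError:
    # Older pydantic 1.x releases have no pydantic.v1 namespace
    from pydantic import BaseModel, Field


class DocumentInput(BaseModel):
    """Input schema for a document question-answering tool."""

    question: str = Field()


if __name__ == "__main__":
    # The class itself (not an instance) is what gets passed as args_schema.
    print(DocumentInput(question="What was the revenue?").question)
```

With this import in place, `DocumentInput` derives from the same `BaseModel` family that the error message refers to, so the rest of the guide's code can stay unchanged.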

GilbertoAbrao commented 8 months ago

Same issue: '1 validation error for Tool\nargs_schema\n subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)'

python: 3.10.5
langchain: 0.0.340
pydantic: 1.10.13

####### schema_code #######

from langchain.tools import BaseTool
from langchain.tools.base import ToolException
from pydantic import BaseModel, Field 
from typing import Type, Optional
from requests import post
import logging

class AddLeadHubSpotSchema(BaseModel):
    lead_name: Optional[str] = Field(description="should be a string with full name of the lead")
    lead_email: Optional[str] = Field(description="should be a string with email of the lead")
    lead_mobile_phone: Optional[str] = Field(description="should be a string should be a string with mobile phone of the lead")
    hubspot_api_url: Optional[str] = Field(description="should be a string")
    hubspot_api_access_key: Optional[str] = Field(description="should be a string")
    hubspot_api_secret_key: Optional[str] = Field(description="should be a string")

####### instantiation_code #######

try:

    from langchain.agents import Tool

    args_schema = AddLeadHubSpotSchema(
        hubspot_api_url='https://url-passada-via-args-schema.com',
        hubspot_api_access_key='access-key-passado-via-args-schema',
        hubspot_api_secret_key='secret-key-passada-via-args-schema',
    )

    add_lead_hubspot_tool = AddLeadHubSpotTool()

    tool = Tool(
                name=add_lead_hubspot_tool.name,
                func=add_lead_hubspot_tool.run,
                description=add_lead_hubspot_tool.description,
                return_direct=add_lead_hubspot_tool.return_direct,
                handle_tool_error=True,
                args_schema=args_schema,
            )

    return tool

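
This report uses pydantic 1.10.13, so the v1/v2 mismatch discussed earlier in the thread likely does not apply. A more probable cause is that the instantiation code passes an instance of `AddLeadHubSpotSchema` as `args_schema`, whereas the error message asks for a subclass of `BaseModel`, that is, the class itself. The sketch below illustrates the distinction with stand-in classes so that it runs without dependencies; `is_valid_args_schema` is a hypothetical helper that mimics a subclass check and is not LangChain's or pydantic's actual code.

```python
class BaseModel:
    """Stand-in for pydantic's BaseModel (assumption: keeps the sketch dependency-free)."""


class AddLeadHubSpotSchema(BaseModel):
    """Stand-in for the schema class from the comment above."""


def is_valid_args_schema(value) -> bool:
    """Hypothetical helper: True only for a class derived from BaseModel."""
    return isinstance(value, type) and issubclass(value, BaseModel)


if __name__ == "__main__":
    print(is_valid_args_schema(AddLeadHubSpotSchema))    # the class itself
    print(is_valid_args_schema(AddLeadHubSpotSchema()))  # an instance of the class
```

Under that reading, passing `args_schema=AddLeadHubSpotSchema` without calling it would satisfy the check; fixed values such as the API URL and keys would then need to reach the tool some other way.
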
iwahith commented 7 months ago

Hi everyone! I need some help with the LangChain Document Comparison toolkit (https://python.langchain.com/docs/integrations/toolkits/document_comparison_toolkit). The agent responds to general questions, such as how to make a pizza. How can I restrict the agent to answering only from my vector store? I tried adding the RAG prompt template from LangChain Hub, but it did not work. Is there any other way to restrict it? Thanks in advance.

# Defined before the loop because RetrievalQA below needs it
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

tools = []
files = [
    # https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf
    {
        "name": "alphabet-earnings",
        "path": "/Users/harrisonchase/Downloads/2023Q1_alphabet_earnings_release.pdf",
    },
    # https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update
    {
        "name": "tesla-earnings",
        "path": "/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf",
    },
]

for file in files:
    # Load each PDF, split it into chunks, and index the chunks in FAISS
    loader = PyPDFLoader(file["path"])
    pages = loader.load_and_split()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(pages)
    embeddings = OpenAIEmbeddings()
    retriever = FAISS.from_documents(docs, embeddings).as_retriever()

    # Wrap retrievers in a Tool
    tools.append(
        Tool(
            args_schema=DocumentInput,
            name=file["name"],
            description=f"useful when you want to answer questions about {file['name']}",
            func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
        )
    )

agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=True,
)

agent({"input": "did alphabet or tesla have more revenue?"})

jithinkk commented 6 months ago

The new Pydantic version (v2, released June 30, 2023) has compatibility issues with LangChain. The simplest fix is to change the pydantic import as below:

from pydantic.v1 import BaseModel

Reference: Stack Overflow