langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications

https://python.langchain.com

MIT License

how to save plots/graphs when using PythonREPLTool in langchain experimental #16067

Closed jyoti194 closed 3 weeks ago

jyoti194 commented 7 months ago

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.

Example Code

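# Imports the snippet relies on; `llm`, `db`, and `question` are assumed to be defined elsewhere
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain_experimental.sql import SQLDatabaseChain
from langchain_experimental.tools import PythonREPLTool

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, return_direct=True)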

tools = [PythonREPLTool()]

description = (
"Useful for when you need to answer questions about employees. "
"You must not input SQL. Use this more than the Python tool if the question "
"is about employee data, like 'how many employees are there?' or 'count the number of department'"
)

repl_tool = Tool(
    name="Data",  # We'll just call it 'Data'
    func=db_chain.run,
    description=description,
)

tools.append(repl_tool)

prefix = "Fulfill the following request as best you can. You have access to the following tools:"

# Remind the agent of the Data tool, and what types of input it expects

suffix = (
    "Begin! When looking for data, do not write a SQL query. "
    "Pass the relevant portion of the request directly to the Data tool in its entirety."
    "\n\n"
    "Request: {input}\n"
    "{agent_scratchpad}"
)

# The agent's prompt is built with the list of tools, prefix, suffix, and input variables
prompt = ZeroShotAgent.create_prompt(
    tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)

# Set up the llm_chain
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Specify the tools the agent may use
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)

# Create the AgentExecutor
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

answer = agent_executor.run(question)

print(answer)

Description

Using langchain experimental, I'm trying to interact with a SQL database and also plot graphs from natural-language requests. I'm using the method above with the Python REPL tool. It displays the graph, but how do I save it? Is there something like the save_charts=True option available with create_pandas_dataframe_agent in LangChain agents?

Also, which is the better way to generate graphs/plots from natural language: the Python REPL tool or PandasAI?

System Info

python: 3.11 langchain: latest version

Related Components

[X] LLMs/Chat Models
[ ] Embedding Models
[X] Prompts / Prompt Templates / Prompt Selectors
[ ] Output Parsers
[ ] Document Loaders
[ ] Vector Stores / Retrievers
[ ] Memory
[ ] Agents / Agent Executors
[X] Tools / Toolkits
[X] Chains
[ ] Callbacks/Tracing
[ ] Async

dosubot[bot] commented 7 months ago

🤖

Hey @jyoti194, glad to see you back! Let's dive into this new challenge you've presented, shall we? 🚀

Based on the information available in the LangChain repository, the PythonREPLTool does not have a built-in functionality to save plots or graphs generated during its execution. However, there is a method in the NetworkxEntityGraph class called draw_graphviz that can be used to save the graph to a file. This method uses the pygraphviz package to draw the graph and save it to a file.

Here is the relevant code:

def draw_graphviz(self, **kwargs: Any) -> None:
    """
    Provides better drawing

    Usage in a jupyter notebook:

        >>> from IPython.display import SVG
        >>> self.draw_graphviz_svg(layout="dot", filename="web.svg")
        >>> SVG('web.svg')
    """
    from networkx.drawing.nx_agraph import to_agraph

    try:
        import pygraphviz  # noqa: F401

    except ImportError as e:
        if e.name == "_graphviz":
            """
            >>> e.msg  # pygraphviz throws this error
            ImportError: libcgraph.so.6: cannot open shared object file
            """
            raise ImportError(
                "Could not import graphviz debian package. "
                "Please install it with:"
                "`sudo apt-get update`"
                "`sudo apt-get install graphviz graphviz-dev`"
            )
        else:
            raise ImportError(
                "Could not import pygraphviz python package. "
                "Please install it with:"
                "`pip install pygraphviz`."
            )

    graph = to_agraph(self._graph)  # --> pygraphviz.agraph.AGraph
    # pygraphviz.github.io/documentation/stable/tutorial.html#layout-and-drawing
    graph.layout(prog=kwargs.get("prog", "dot"))
    graph.draw(kwargs.get("path", "graph.svg"))

In this method, the kwargs.get("path", "graph.svg") line is used to specify the file path where the graph will be saved. If no path is provided, it will default to "graph.svg". You can find this method in the libs/langchain/langchain/graphs/networkx_graph.py file.

You can integrate this method into the PythonREPLTool to save the generated plots or graphs. Here is an example of how you can use this method:

from langchain.graphs.networkx_graph import NetworkxEntityGraph, KnowledgeTriple

# Initialize the NetworkxEntityGraph
graph = NetworkxEntityGraph()

# Add some triples to the graph
graph.add_triple(KnowledgeTriple('subject1', 'predicate1', 'object1'))
graph.add_triple(KnowledgeTriple('subject2', 'predicate2', 'object2'))

# Draw the graph and save it as an SVG file
graph.draw_graphviz(prog='dot', path='my_graph.svg')

In this example, prog='dot' specifies the layout algorithm used by Graphviz and path='my_graph.svg' specifies the filename of the output SVG file. You can replace these with your desired layout and filename.

As for your question about whether PythonREPLTool in Python or pandas ai is the best method to generate graphs/plots in natural language, it depends on your specific use case. Both tools have their strengths and can be used effectively for different tasks. If you could provide more details about your specific requirements, I might be able to give a more precise recommendation.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Sources

libs/langchain/langchain/graphs/networkx_graph.py

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

jyoti194 commented 7 months ago

Hi @dosubot, can you explain it in more detail and in a simpler way, and show how to modify the code?

dosubot[bot] commented 7 months ago

🤖

Based on the current implementation of the PythonREPLTool in LangChain, it doesn't directly support saving plots or graphs generated by commands. The tool is designed to execute Python commands and return the output as a string, which works well for text-based outputs but not for graphical outputs like plots or graphs.
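To illustrate why a text-only tool cannot hand a figure back, here is a minimal stand-in for such a REPL. It is not LangChain's implementation; `run_python` is a hypothetical helper that simply executes code and returns whatever was printed.

```python
import contextlib
import io


def run_python(code: str) -> str:
    """Hypothetical text-only REPL: execute code and return captured stdout."""
    buffer = io.StringIO()
    namespace: dict = {}
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, namespace)
        except Exception as exc:
            # Errors are also reported back as plain text
            return repr(exc)
    return buffer.getvalue()


printed = run_python("print(1334 + 862)")   # text output is captured
silent = run_python("x = [1, 2, 3, 4]")     # nothing printed, nothing returned
```

Anything that is not printed, including a figure object created by the executed code, never reaches the caller, so saving has to happen inside the executed code or in the process that runs it.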

However, you can modify the PythonREPLTool to save plots or graphs by using matplotlib's savefig function. Here's an example of how you can modify your code to save a plot:

import matplotlib.pyplot as plt

# Your code to generate a plot
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')

# Save the plot to a file
plt.savefig('my_plot.png')

In this example, the plot is saved to a file named 'my_plot.png'. You can specify a different file name or path as needed.

As for your question about whether PythonREPLTool or pandas AI is the best method to generate graphs/plots in natural language, it depends on your specific use case. PythonREPLTool is a more general tool that can execute any Python command, while pandas AI is specifically designed for data analysis and manipulation. If you're working with data in a DataFrame and want to generate plots from that data, pandas AI might be a better choice. However, if you need to execute a wider range of Python commands, PythonREPLTool might be more suitable.

I hope this helps! If you have any further questions, feel free to ask.

Sources

libs/langchain/langchain/utilities/python.py

jyoti194 commented 7 months ago

Hi @dosubot, I'm using matplotlib to save the graphs. It creates the .png file with the given name, but the file is empty.

Here is my code:

def chat4(question):
    db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, return_direct=True)

    tools = [PythonREPLTool()]

    description = (
        "Useful for when you need to answer questions about employees. "
        "You must not input SQL. Use this more than the Python tool if the question "
        "is about employee data, like 'how many employees are there?' or 'count the number of department'"
    )

    repl_tool = Tool(
        name="Data",  # We'll just call it 'Data'
        func=db_chain.run,
        description=description,
    )

    tools.append(repl_tool)

    prefix = "Fulfill the following request as best you can. You have access to the following tools:"

    # Remind the agent of the Data tool, and what types of input it expects
    suffix = (
        "Begin! When looking for data, do not write a SQL query. "
        "Pass the relevant portion of the request directly to the Data tool in its entirety."
        "\n\n"
        "Request: {input}\n"
        "{agent_scratchpad}"
    )

    # The agent's prompt is built with the list of tools, prefix, suffix, and input variables
    prompt = ZeroShotAgent.create_prompt(
        tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
    )

    # Set up the llm_chain
    llm_chain = LLMChain(llm=llm, prompt=prompt)

    # Specify the tools the agent may use
    tool_names = [tool.name for tool in tools]
    agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)

    # Create the AgentExecutor
    agent_executor = AgentExecutor.from_agent_and_tools(
        agent=agent, save_charts=True, tools=tools, verbose=True, handle_parsing_errors=True,
    )

    # answer = agent_executor.run(PROMPT.format(question=question))
    answer = agent_executor.run(question)

    # plot_filename = f"chat_data_{pd.Timestamp.now().strftime('%Y-%m-%d_%H-%M-%S')}.png"
    # plt.savefig(plot_filename, format='png')
    plt.savefig('my_plot.png')
    answer = f"Graph saved as {'my_plot.png'}"

    return answer

dosubot[bot] commented 7 months ago

🤖

Based on your code, it seems like you're trying to save the plot after the agent_executor.run(question) line. However, this might not work as expected because the run method doesn't necessarily return a matplotlib plot that can be saved using plt.savefig().

Instead, you can modify the PythonREPLTool class to save the plot directly within the tool. Here's how you can do it:

import matplotlib.pyplot as plt

class PythonREPLTool(BaseTool):
    # ...

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> Any:
        """Use the tool."""
        if self.sanitize_input:
            query = sanitize_input(query)
        result = self.python_repl.run(query)

        # Check if the result is a matplotlib figure
        if isinstance(result, plt.Figure):
            # Save the figure to a file
            result.savefig('output.png')

        return result

In this modification, after running the query, we check if the result is a matplotlib figure. If it is, we save the figure to a file named 'output.png'.

Please note that this modification assumes that the matplotlib plots are returned as a result of the run method of the PythonREPL class. If the plots are not returned but are displayed using plt.show(), you would need to modify the PythonREPL class to return the plots instead of displaying them.

Also, this modification will overwrite the 'output.png' file every time a new plot is generated. If you want to save each plot to a different file, you could generate a unique filename for each plot.
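One way to avoid overwriting is to build a fresh filename for every plot. The sketch below is only an illustration: `unique_plot_path`, the `plots` directory, and the `plot` prefix are hypothetical names, and the timestamp format follows the one in the commented-out line of the code above.

```python
from datetime import datetime
from pathlib import Path
from uuid import uuid4


def unique_plot_path(directory: str = "plots", prefix: str = "plot") -> Path:
    """Return a new .png path that combines a timestamp and a short random suffix."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    # The random suffix keeps names distinct even within the same second
    return folder / f"{prefix}_{stamp}_{uuid4().hex[:8]}.png"
```

The returned path could then be passed to savefig in place of the fixed 'output.png'.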

Regarding your question about the best method to generate graphs/plots in natural language, both PythonREPLTool and PandasAI can be used to generate graphs/plots. The choice between the two depends on your specific use case. If you're working with dataframes and need advanced data manipulation capabilities, PandasAI might be a better choice. On the other hand, if you're executing arbitrary Python code and need a more general-purpose tool, PythonREPLTool might be more suitable.

I hope this helps! If you have any more questions, feel free to ask.

jyoti194 commented 7 months ago

@dosubot, this is the code:

class PythonREPLTool(BaseTool):
    """A tool for running python code in a REPL."""

    name: str = "Python_REPL"
    description: str = (
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. "
        "If you want to see the output of a value, you should print it out "
        "with `print(...)`."
    )
    python_repl: PythonREPL = Field(default_factory=_get_default_python_repl)
    sanitize_input: bool = True

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> Any:
        """Use the tool."""
        if self.sanitize_input:
            query = sanitize_input(query)
        return self.python_repl.run(query)

    async def _arun(
        self,
        query: str,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> Any:
        """Use the tool asynchronously."""
        if self.sanitize_input:
            query = sanitize_input(query)

        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, self.run, query)

        return result

It is in langchain_experimental.tools.python.tool.

Do I have to modify it there directly?

dosubot[bot] commented 7 months ago

🤖

Based on the code you've provided, it seems like you're on the right track. However, there are a few things to consider:

1. The savefig function should be called right after the plot is generated. In your case, it seems like you're calling savefig after the execution of the agent, which might be why the saved file is empty. The plot might have been cleared or closed by the time savefig is called.
2. The PythonREPLTool class you've shown is used to execute Python code in a REPL. It doesn't inherently know when a plot is generated. Therefore, you would need to modify the _run method to check if a plot was generated during the execution of the query and save it if necessary.
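The empty file can be reproduced in isolation. This is a small sketch, assuming the figure has already been closed by the time savefig runs (which is what happens in many setups once a figure has been shown); the Agg backend is chosen here only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4])
plt.close("all")  # stands in for the figure being closed after it was displayed

open_after_close = plt.get_fignums()  # no figures are left open
current = plt.gcf()                   # pyplot creates a brand-new, empty figure here
axes_on_current = len(current.axes)   # the new figure has no axes, so there is nothing to draw
plt.close("all")
```

Calling plt.savefig at that point writes the new empty figure, which matches the symptom of a .png file with nothing in it.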

Here's an example of how you might modify the _run method:

import matplotlib.pyplot as plt

class PythonREPLTool(BaseTool):
    # ...

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> Any:
        """Use the tool."""
        if self.sanitize_input:
            query = sanitize_input(query)
        result = self.python_repl.run(query)

        # Check if a plot was generated and save it
        if plt.get_fignums():
            plt.savefig('my_plot.png')

        return result

In this example, plt.get_fignums() is used to check if a plot was generated. If a plot was generated, it is saved to a file named 'my_plot.png'.

Please note that this is a simple example and may not cover all cases. For example, it assumes that you want to save all plots to the same file, which may not be what you want. You may need to modify this code to suit your specific needs.
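If separate files are wanted, one option is to loop over the open figure numbers and save each figure on its own. This is a minimal sketch rather than part of LangChain: `save_open_figures` and the `plots` directory are hypothetical names, and the Agg backend is set only so the snippet runs without a display.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt


def save_open_figures(directory: str = "plots") -> list:
    """Save every figure that is still open to its own PNG and return the paths."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for num in plt.get_fignums():
        path = folder / f"figure_{num}.png"
        plt.figure(num).savefig(path)  # plt.figure(num) returns the existing figure
        saved.append(path)
    plt.close("all")  # avoid saving the same figures again on the next call
    return saved
```

A helper like this could be called from an overridden _run in a subclass of PythonREPLTool instead of editing the installed package, assuming the REPL executes the code in the same Python process and the figures are still open when the call returns.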

As for your question about whether the Python REPL tool in Python or pandas AI is the best method to generate graphs/plots in natural language, it really depends on your specific use case. Both tools have their own strengths and weaknesses, and the best tool for you would depend on your specific requirements and constraints.

Sources

libs/experimental/langchain_experimental/tools/python/tool.py

jyoti194 commented 7 months ago

@dosubot, even after modifying the _run method of class PythonREPLTool(BaseTool), it is still not saving the graphs.

Here is the output for the answer:

Entering new AgentExecutor chain...
Question: How many male employees are there?
Thought: I need to count the number of male employees.
Action: Data
Action Input: SELECT COUNT(*) FROM employees WHERE gender = 'M'

Entering new SQLDatabaseChain chain...
SQLQuery SELECT COUNT() FROM employees WHERE gender = 'M'
SQLQuery:SELECT COUNT() FROM [EGV_emp_departments_ChatGPT] WHERE [Gender] = 'M'SELECT COUNT(*) FROM [EGV_emp_departments_ChatGPT] WHERE [Gender] = 'M'
[(1334,)]

SQLResult: [(1334,)]----{'result': '[(1334,)]'}-----

Finished chain.

Observation: [(1334,)]
Thought:I now know the number of male employees.
Question: How many female employees are there?
Thought: I need to count the number of female employees.
Action: Data
Action Input: SELECT COUNT(*) FROM employees WHERE gender = 'F'

Entering new SQLDatabaseChain chain...
SQLQuery SELECT COUNT() FROM employees WHERE gender = 'F'
SQLQuery:SELECT COUNT() FROM [EGV_emp_departments_ChatGPT] WHERE [Gender] = 'F'SELECT COUNT(*) FROM [EGV_emp_departments_ChatGPT] WHERE [Gender] = 'F'
[(862,)]

SQLResult: [(862,)]----{'result': '[(862,)]'}-----

Finished chain.

Observation: [(862,)]
Thought:I now know the number of female employees.
Question: What is the total number of employees?
Thought: I need to add the number of male and female employees together.
Action: Python_REPL
Action Input: print(1334 + 862)
Python REPL can execute arbitrary code. Use with caution.

Observation: 2196

Thought:I now know the total number of employees.
Question: What is the percentage of male employees?
Thought: I need to calculate the percentage of male employees.
Action: Python_REPL
Action Input: print((1334 / 2196) * 100)
Observation: 60.74681238615665

Thought:I now know the percentage of male employees.
Question: What is the percentage of female employees?
Thought: I need to calculate the percentage of female employees.
Action: Python_REPL
Action Input: print((862 / 2196) * 100)
Observation: 39.25318761384335

Thought:I now know the percentage of female employees.
Final Answer: The percentage of male employees is approximately 60.75% and the percentage of female employees is approximately 39.25%.

Finished chain.
The percentage of male employees is approximately 60.75% and the percentage of female employees is approximately 39.25%.

For the same code, running the cell gives:

Entering new AgentExecutor chain...
Question: What is the percentage of male and female ratio?
Thought: To visualize the percentage of male and female ratio, we can use a pie graph.
Action: Python_REPL
Action Input: None
Observation:
Thought:To create a pie graph, we need to have the data on the male and female ratio.
Action: Data
Action Input: SELECT COUNT(*) FROM employees WHERE gender = 'Male'

Entering new SQLDatabaseChain chain...
SQLQuery SELECT COUNT() FROM employees WHERE gender = 'Male'
SQLQuery:SELECT COUNT() FROM [EGV_emp_departments_ChatGPT] WHERE [Gender] = 'M'SELECT COUNT(*) FROM [EGV_emp_departments_ChatGPT] WHERE [Gender] = 'M'
[(1334,)]

SQLResult: [(1334,)]----{'result': '[(1334,)]'}-----

Finished chain.

Observation: [(1334,)]
Thought:We have the count of male employees. Now we need the count of female employees.
Action: Data
...
fig1, ax1 = plt.subplots()
ax1.pie(sizes, colors = colors, labels=labels, autopct='%1.1f%%', startangle=90)
ax1.set_title('Percentage of Male and Female Employees')
ax1.axis('equal')
plt.show()
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

Observation:
Thought:We have created the pie graph visualizing the percentage of male and female employees.
Final Answer: The pie graph shows that approximately 60.7% of employees are male and 39.3% are female.

Finished chain.
'The pie graph shows that approximately 60.7% of employees are male and 39.3% are female.'

dosubot[bot] commented 7 months ago

🤖

Based on the code you've provided, it seems like you're trying to save the plot right after running the agent_executor. However, the PythonREPLTool doesn't inherently support plotting, and it doesn't seem like your code is generating any plots within the PythonREPLTool that could be saved.

The PythonREPLTool is designed to execute Python code, but it doesn't handle graphical output like plots. If you're generating plots within the PythonREPLTool, you would need to modify the tool to handle this graphical output.

Here's a modification of the _run method in the PythonREPLTool class that saves a plot if one was generated during the execution of the query:

import matplotlib.pyplot as plt

class PythonREPLTool(BaseTool):
    # ... existing code ...

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> Any:
        """Use the tool."""
        if self.sanitize_input:
            query = sanitize_input(query)
        result = self.python_repl.run(query)

        # Save the plot if one was generated
        if plt.get_fignums():
            plt.savefig('plot.png')

        return result

This modification checks if any figures were created during the execution of the query. If so, it saves the current figure to a file named 'plot.png'. You can change the filename to suit your needs.

Please note that this modification assumes that matplotlib is being used to generate the plots or graphs. If a different library is being used, you would need to use the appropriate method to save the plot or graph.
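The generated code in the log above ends with plt.show(), so the figure may already be closed by the time _run checks for it. One possible workaround is to wrap plt.show so that every open figure is saved just before it is displayed. This is a sketch under assumptions, not a LangChain feature: `install_saving_show` and the `plots` directory are hypothetical names, it assumes the REPL executes code in the same Python process, and the Agg backend is set only so the snippet runs without a display.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt


def install_saving_show(directory: str = "plots") -> None:
    """Replace plt.show with a wrapper that saves all open figures first."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    original_show = plt.show

    def saving_show(*args, **kwargs):
        # Save each open figure before the original show can close it
        for num in plt.get_fignums():
            plt.figure(num).savefig(folder / f"shown_figure_{num}.png")
        return original_show(*args, **kwargs)

    plt.show = saving_show  # later calls to plt.show() go through the wrapper
```

With this approach, install_saving_show() would be called once before running the agent, and any code that later calls plt.show() in that process leaves a PNG behind. Files are named by figure number, so a later figure with the same number overwrites the earlier file.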

As for your question about whether the Python REPL tool in Python or pandas AI is the best method to generate graphs/plots in natural language, it really depends on your specific use case. Both tools have their strengths and weaknesses, and the best choice would depend on the specific requirements of your project.

For example, if you're working with data that's already in a pandas DataFrame and you want to leverage pandas' powerful data manipulation capabilities, then using pandas AI might be a good choice. On the other hand, if you're working with data that's not in a DataFrame and you want more flexibility in how you manipulate and visualize your data, then using the Python REPL tool might be a better choice.

I hope this helps! If you have any other questions, feel free to ask.

Sources

libs/experimental/langchain_experimental/tools/python/tool.py

mlenzovet commented 4 months ago

@dosubot I have df1 and df2. I would like to get df1_df2_merged pandas dataframe from AgentExecutor