langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
93.02k stars 14.94k forks

`PythonREPL` removal from langchain library #14345

Closed. Vegoo89 closed this 1 month ago.

Vegoo89 commented 10 months ago

System Info

Langchain version: 0.0.346
Python version: 3.9.16

Who can help?

No response

Information

Related Components

Reproduction

`PythonREPL`, which has been moved to experimental, still exists in the base library under the path:

`libs/langchain/langchain/utilities/python.py`

This triggers vulnerability findings in security scans (because of the `exec()` call) and prevents us from using the package in our production environment.
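Scanners of this kind usually look at the source text rather than at how the code is used. A minimal sketch of such a check, assuming a plain AST walk (the helper `find_exec_calls` is hypothetical, and real tools such as the one named later in this thread may use different rules):

```python
import ast


def find_exec_calls(source: str, filename: str = "<string>") -> list[tuple[str, int]]:
    """Return (filename, line) for every direct call to exec() in `source`."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        # Only plain-name calls such as exec(...) match; attribute calls
        # like builtins.exec(...) slip past this naive check.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "exec"
        ):
            findings.append((filename, node.lineno))
    return findings


sample = '''
def worker(command, globals, locals):
    exec(command, globals, locals)
'''
print(find_exec_calls(sample, "python.py"))  # → [('python.py', 3)]
```

A check like this flags the file whether or not `PythonREPL` is ever imported, which is why the finding persists as long as the module ships in the package.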

Since CVE-2023-39631 (https://nvd.nist.gov/vuln/detail/CVE-2023-39631) will most likely be closed soon, this is the only remaining vulnerability that would have to be addressed before we can freely use langchain.

Expected behavior

`PythonREPL` should only exist in the experimental version of langchain.

dosubot[bot] commented 10 months ago

🤖

Hello @Vegoo89,

I'm Dosu, an automated assistant here to help you with your issues, answer questions, and guide you in becoming a contributor while we wait for a human maintainer. I'm here to make your experience with LangChain as smooth as possible. Let's get started with your issue!

Thank you for bringing this to our attention. The PythonREPL component, which currently exists in the base library, can indeed be moved exclusively to the experimental version of langchain to address security concerns. This can be done by moving the entire PythonREPL class definition to the appropriate file in the experimental version of the library.

Here's how you can do it:

  1. Cut the entire PythonREPL class from the libs/langchain/langchain/utilities/python.py file.
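```python
# Module-level dependencies that have to move together with the class.
import functools
import logging
import multiprocessing
import sys
from io import StringIO
from typing import Dict, Optional

# Assumption: plain pydantic; the library may import these via its own compatibility module.
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)


@functools.lru_cache(maxsize=None)
def warn_once() -> None:
    """Warn once about the dangers of PythonREPL."""
    logger.warning("Python REPL can execute arbitrary code. Use with caution.")


class PythonREPL(BaseModel):
    """Simulates a standalone Python REPL."""

    globals: Optional[Dict] = Field(default_factory=dict, alias="_globals")
    locals: Optional[Dict] = Field(default_factory=dict, alias="_locals")

    @classmethod
    def worker(
        cls,
        command: str,
        globals: Optional[Dict],
        locals: Optional[Dict],
        queue: multiprocessing.Queue,
    ) -> None:
        # Capture anything the command prints by swapping out sys.stdout.
        old_stdout = sys.stdout
        sys.stdout = mystdout = StringIO()
        try:
            exec(command, globals, locals)
            sys.stdout = old_stdout
            queue.put(mystdout.getvalue())
        except Exception as e:
            # Restore stdout and report the error text instead of raising.
            sys.stdout = old_stdout
            queue.put(repr(e))

    def run(self, command: str, timeout: Optional[int] = None) -> str:
        """Run command with own globals/locals and returns anything printed.
        Timeout after the specified number of seconds."""

        # Warn against dangers of PythonREPL
        warn_once()

        queue: multiprocessing.Queue = multiprocessing.Queue()

        # Only use multiprocessing if we are enforcing a timeout
        if timeout is not None:
            # create a Process
            p = multiprocessing.Process(
                target=self.worker, args=(command, self.globals, self.locals, queue)
            )

            # start it
            p.start()

            # wait for the process to finish or kill it after timeout seconds
            p.join(timeout)

            if p.is_alive():
                p.terminate()
                return "Execution timed out"
        else:
            self.worker(command, self.globals, self.locals, queue)
        # get the result from the worker function
        return queue.get()
```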
  2. Paste the PythonREPL class into the appropriate file in the experimental version of the library.

Remember to also move any dependencies that PythonREPL has, such as the warn_once function and any imported modules, to ensure that the class continues to function correctly.
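To see what the no-timeout path of `PythonREPL.run` does in isolation, here is a simplified, hypothetical stand-in (`MiniREPL` is not part of LangChain). It assumes a single shared namespace and drops the multiprocessing timeout branch entirely:

```python
import contextlib
import io
from typing import Any, Dict


class MiniREPL:
    """Hypothetical, simplified stand-in for PythonREPL's no-timeout path."""

    def __init__(self) -> None:
        # One namespace reused across calls, so assignments persist.
        self.namespace: Dict[str, Any] = {}

    def run(self, command: str) -> str:
        buffer = io.StringIO()
        try:
            # redirect_stdout restores sys.stdout even if exec() raises.
            with contextlib.redirect_stdout(buffer):
                exec(command, self.namespace)
        except Exception as e:
            # Like PythonREPL, return the error text instead of raising.
            return repr(e)
        return buffer.getvalue()


repl = MiniREPL()
repl.run("x = 21")
print(repl.run("print(x * 2)"), end="")  # prints 42
```

Using one dict for globals and locals sidesteps a known `exec` subtlety: with separate dicts, top-level assignments land in `locals`, so functions defined by the command cannot see them through their globals.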

This issue has been addressed before in the LangChain repository. You can refer to the following issues and pull requests for more information:

Please let me know if you need further assistance.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Vegoo89 commented 9 months ago

Hello, is there any update on this issue? @eyurtsev, I am mentioning you since you were auto-assigned to it.

m4tland commented 8 months ago

+1, we cannot use langchain in production because of this. The code has moved here, but it is still an issue: `langchain/libs/community/langchain_community/utilities/python.py`

Issue: sonatype-2023-3640
Weakness: Sonatype CWE: 77
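To check which of the locations discussed in this thread an installed environment actually ships, a small stdlib sketch follows. The helper `module_file` is hypothetical, and `langchain_experimental.utilities.python` is assumed to be where the experimental copy lives. Note that `importlib.util.find_spec` imports the parent packages of a dotted name, so their `__init__` code does run:

```python
import importlib.util
from typing import Optional


def module_file(dotted_name: str) -> Optional[str]:
    """Return the file an installed module would load from, or None if absent."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except (ImportError, ValueError):
        # ImportError covers a missing parent package (ModuleNotFoundError).
        return None
    return spec.origin if spec is not None else None


for name in (
    "langchain.utilities.python",
    "langchain_community.utilities.python",
    "langchain_experimental.utilities.python",
):
    print(name, "->", module_file(name))
```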

eyurtsev commented 4 months ago

Looks like a blanket security policy that's just flagging presence of exec. In the meantime, you can ask your security team to make an exception: the code works as intended, since it's a Python REPL.

I'll try to move it to experimental to unblock folks, but this will likely break code for other users who have taken the effort to run it in a sandboxed environment.
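As a rough illustration of one building block for running REPL commands away from the host process, here is a sketch that executes each command in a fresh interpreter with a timeout. This is an assumed approach, not how LangChain or the users mentioned above do it (`run_isolated` is hypothetical), and a child process is not a security sandbox on its own; OS-level isolation such as containers is still needed for that:

```python
import subprocess
import sys


def run_isolated(command: str, timeout: float = 5.0) -> str:
    """Run `command` in a separate interpreter and return what it printed.

    The child still has the same filesystem and network access as the
    caller; this only isolates memory state and enforces a time limit.
    """
    try:
        completed = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return "Execution timed out"
    # Return the traceback text when the child exits with an error.
    return completed.stdout if completed.returncode == 0 else completed.stderr


print(run_isolated("print(6 * 7)"), end="")  # prints 42
```

Unlike `PythonREPL`, no state survives between calls here, which trades REPL-style persistence for a clean process each time.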

Vegoo89 commented 4 months ago

> Looks like a blanket security policy that's just flagging presence of exec.

This is exactly what is happening with the scanning tools. I work at a financial institution, and I can get an exception for a single artifact version after a long fight with the security team. By the time it's approved, a few newer versions have already been added to the mirror, those are quarantined, and the cycle continues.