Closed: 0xfatty closed this 3 months ago
Hey @0xfatty, I'm here to help you with any bugs, questions, or contributions you have in mind. Let's get started!
To address the security vulnerability in the Custom Component feature of the Langflow framework that allows for arbitrary code execution when hosted online, you should implement input validation and sanitization. Additionally, you can leverage the existing structure of the `Component` class to ensure that the code provided is safe and does not contain malicious elements.
Here is an example of how you can modify the `build` method to include input validation and sanitization, while also utilizing the existing `Component` class structure:
```python
import re
from langflow.custom import CustomComponent
from langflow.schema import Record
from langflow.custom.eval import eval_custom_component_code
from langflow.utils import validate

# Note: ComponentCodeNullError and ComponentFunctionEntrypointNameNullError are
# assumed to be importable from Langflow's custom-component error definitions.


class Component(CustomComponent):
    display_name = "Custom Component"
    description = "Use as a template to create your own component."
    documentation: str = "http://docs.langflow.org/components/custom"
    icon = "custom_components"

    def build_config(self):
        return {"param": {"display_name": "Parameter"}}

    def build(self, param: str) -> Record:
        # Basic input validation to prevent arbitrary code execution
        if not isinstance(param, str):
            raise ValueError("Parameter must be a string")

        # Example of sanitizing input to remove potentially dangerous characters
        sanitized_param = re.sub(r'[^\w\s]', '', param)

        # Validate and execute the custom component code safely
        if not self.code:
            raise ComponentCodeNullError(
                status_code=400,
                detail={"error": self.ERROR_CODE_NULL, "traceback": ""},
            )
        if not self._function_entrypoint_name:
            raise ComponentFunctionEntrypointNameNullError(
                status_code=400,
                detail={
                    "error": self.ERROR_FUNCTION_ENTRYPOINT_NAME_NULL,
                    "traceback": "",
                },
            )

        custom_function = validate.create_function(self.code, self._function_entrypoint_name)
        result = custom_function(sanitized_param)
        return Record(data=result)
```
In this example:

- The `build` method checks that `param` is a string.
- The input is sanitized with a regular expression to strip potentially dangerous characters.
- The custom code is validated and executed through the existing `Component` class methods.

This approach helps mitigate the risk of arbitrary code execution by ensuring that the input is safe before processing it and by using the existing validation mechanisms provided by the Langflow framework [1][2].
To continue the conversation, mention @dosu.
Thank you for your prompt response and suggestion to address the security vulnerability by implementing input validation and sanitization. However, I'd like to clarify that the core issue lies in the context of the entire Langflow framework being hosted publicly, rather than the safety of individual published components.
The vulnerability arises because the framework allows users to create and execute their custom components on a publicly accessible server. In this scenario, the risk is that a malicious user can upload a custom component containing arbitrary and potentially harmful code, which gets executed on the server.
Taking the online playground provided by Langflow at https://huggingface.co/spaces/Langflow/Langflow as an example: with the same attack vector, I was able to execute OS commands against the underlying Docker container.
Sample output:

```
Linux r-langflow-langflow-6hjjdvt7-50f38-0g46r 5.10.205-195.807.amzn2.x86_64 #1 SMP Tue Jan 16 18:28:59 UTC 2024 x86_64 GNU/Linux
total 88
drwxr-xr-x 1 user user 18 May 17 20:40 .
drwxr-xr-x 1 user user 62 May 25 12:46 ..
drwxr-xr-x 8 user user 178 Apr 12 11:49 .git
-rw-r--r-- 1 user user 1477 Apr 12 11:49 .gitattributes
-rw-r--r-- 1 user user 73828 Apr 12 11:51 =0.6.12
-rw-r--r-- 1 user user 556 Apr 12 11:49 Dockerfile
-rw-r--r-- 1 user user 1209 Apr 12 11:49 README.md
drwxr-xr-x 2 user user 26 May 17 20:40 logs
```
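For illustration only, here is a minimal, hypothetical sketch of the kind of custom component that could produce output like the above. It is not the actual payload from the report (which is intentionally not reproduced here), and the `CustomComponent` import path simply follows the example earlier in this thread. Any command placed in `build` runs with the privileges of the Langflow server process:

```python
import subprocess

from langflow.custom import CustomComponent


class Component(CustomComponent):
    display_name = "PoC Component"

    def build(self) -> str:
        # Shelling out from a custom component executes on the hosting server,
        # not in the user's browser.
        uname = subprocess.run(["uname", "-a"], capture_output=True, text=True)
        listing = subprocess.run(["ls", "-la"], capture_output=True, text=True)
        return uname.stdout + listing.stdout
```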
We also anticipated encountering similar issues when implementing Whisper in Langflow, specifically when converting voice file inputs to Base64 within the component.
As you mentioned, if a malicious user gains access to a Langflow instance exposed to clients, they could potentially include custom components with such backdoor code (similar to the Hugging Face scenario you described).
Given the recent pull request history showing an image output component, it seems likely that an input component field will be added soon. It would be ideal if file input and output handling were restricted solely to the input component level, preventing users from directly writing these operations in custom components. (Alternatively, Langflow could include a mechanism to directly validate the Base64 format.)
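On the Base64 suggestion: a minimal sketch of the kind of strict validation such an input component could apply before accepting encoded file data (the function name and its placement are assumptions, not an existing Langflow API):

```python
import base64
import binascii


def is_strict_base64(value: str) -> bool:
    """Accept only non-empty, correctly padded, strict Base64 input."""
    if not value or len(value) % 4 != 0:
        return False
    try:
        # validate=True rejects any character outside the Base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```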
Additionally, I recently heard that the LF core team is refactoring using Casbin for RBAC. Fundamentally, beyond debug/build modes, Casbin's introduction will likely result in account-based API call control. This aspect should be emphasized in the manual to caution users accordingly. (At the product level, the example scenario you mentioned should be used to highlight the importance of disabling the AUTO LOGIN flag.)
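For reference, a minimal sketch of how account-based API call control could look with pycasbin; the model/policy contents and the endpoint-level check are illustrative assumptions, not the LF core team's actual design:

```python
import casbin

# model.conf (assumed contents):
#   [request_definition]
#   r = sub, obj, act
#   [policy_definition]
#   p = sub, obj, act
#   [role_definition]
#   g = _, _
#   [policy_effect]
#   e = some(where (p.eft == allow))
#   [matchers]
#   m = g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act
#
# policy.csv (assumed contents):
#   p, admin, /api/v1/custom_component, POST
#   g, alice, admin

enforcer = casbin.Enforcer("model.conf", "policy.csv")


def can_call(user: str, endpoint: str, method: str) -> bool:
    """Allow the request only if the user's role grants that endpoint and method."""
    return enforcer.enforce(user, endpoint, method)


# can_call("alice", "/api/v1/custom_component", "POST")      -> True
# can_call("anonymous", "/api/v1/custom_component", "POST")  -> False
```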
Similarly, regarding https://github.com/langflow-ai/langflow/pull/1747: I think the approach of calling exec-like code, in addition to embedding such backdoors in the flow, is also risky. What are your thoughts on this?
Thank you for your detailed response and for acknowledging the potential risks associated with the Custom Component feature. I appreciate the proactive measures you are considering, particularly the introduction of file input and output handling restrictions and the refactoring with Casbin for RBAC. Here are my thoughts on the points you raised:
Restricting file input and output operations to specific input components is a great step towards mitigating the risk of arbitrary code execution. By limiting these operations to controlled and validated components, you reduce the attack surface significantly. Incorporating mechanisms to directly validate the Base64 format at the component level is also a good way to ensure data integrity and security. However, Base64 is only one of several possible encodings, so I think it would be better to also have a mechanism that blocks outbound calls made by code invoked through this API.
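To make the outbound-call idea concrete, here is a minimal sketch, assuming Python-level interception is acceptable (process- or network-level sandboxing would be more robust), that denies new network connections while untrusted component code runs:

```python
import socket
from contextlib import contextmanager


class OutboundCallBlockedError(RuntimeError):
    """Raised when guarded code attempts to open a network connection."""


@contextmanager
def block_outbound_connections():
    def deny_connect(self, address):
        raise OutboundCallBlockedError(f"Outbound connection to {address} blocked")

    # Shadow the inherited connect() on the Python-level socket class.
    socket.socket.connect = deny_connect
    try:
        yield
    finally:
        # Remove the shadow so the original implementation is visible again.
        del socket.socket.connect


# Example: wrap the execution of untrusted component code.
# with block_outbound_connections():
#     result = custom_function(sanitized_param)
```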
The refactoring to include Casbin for role-based access control (RBAC) is a positive move. Beyond debug/build modes, implementing account-based API call control will enhance security by ensuring that only authorized users can execute certain actions. Emphasizing this in the documentation, especially with a focus on disabling the AUTO LOGIN flag, will help users understand the importance of these security measures. Using real-world scenarios, such as the one mentioned, can effectively highlight potential risks and encourage best practices.
Allowing exec-like code in custom components poses significant security risks, as it provides a direct method for attackers to execute arbitrary commands. Even with input validation and sandboxing, the inherent risk of arbitrary code execution remains high. I recommend considering additional measures, such as statically screening submitted code and restricting modules that enable command execution, for example `subprocess` or `os` (see the sketch below).
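As one concrete option for the static screening mentioned above, a minimal sketch (the blocked-name lists are illustrative assumptions) that flags imports of command-execution modules and exec-like builtins before a custom component is accepted:

```python
import ast

# Illustrative deny-lists; a real policy would need to be broader (importlib, ctypes, ...).
BLOCKED_MODULES = {"os", "subprocess", "shutil", "sys"}
BLOCKED_BUILTINS = {"exec", "eval", "compile", "__import__"}


def audit_component_code(source: str) -> list[str]:
    """Return a list of findings describing blocked imports or calls in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    findings.append(f"line {node.lineno}: import of blocked module '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: import from blocked module '{node.module}'")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_BUILTINS:
                findings.append(f"line {node.lineno}: call to blocked builtin '{node.func.id}'")
    return findings
```

Static checks like this are easy to bypass, so they should complement, not replace, authentication and sandboxing.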
I appreciate the efforts being made to enhance Langflow's security and am available to provide further insights or assistance as needed.
@yamonkjd I just added a commit publishing a Security Policy page: https://github.com/langflow-ai/langflow/pull/2000. This is to publish a CVE tracking number for this issue. I hope you can review, approve, and publish an advisory for this.
lol, we are contributors to Langflow, not core members. I seem to have caused some confusion.
@ogabrielluiz Hey, could you please review this request?
CVE-2024-37014 has been assigned by MITRE to track this security issue.
Quick note: Langflow is a very interesting and useful framework for those who work with AI projects. Personally, it's a cool project to learn from. It seems the Custom Component feature was launched to support user-defined Python scripts that use the Langflow-provided library. This is a great feature. However, upon reviewing its documentation in [1] and [2], there is no mention of potential security issues, so the following finding was likely not expected. [1] https://docs.langflow.org/components/custom [2] https://docs.langflow.org/guidelines/custom-component
Describe the bug
The Custom Component feature allows users to provide their own Python scripts using the `CustomComponent` class provided by the Langflow library. This is excellent for local testing and experimentation. However, if the framework is hosted online, it creates a potential security issue where a bad actor can leverage the opportunity to provide arbitrary Python code and gain code execution ability against the hosting server.
Impacted API
POST /api/v1/custom_component
Browser and Version
To Reproduce
Steps to reproduce the behavior:

1. Create a new `CustomComponent`.
2. Within the `Component` class, provide the following Python function.
3. On clicking `Check & Save`, the `/api/v1/custom_component` API is invoked to process the provided Python script, which then leads to OS command execution. The output will be Base64 encoded and sent to a malicious server.

Screenshots
Additional context
The vulnerability allows for arbitrary code execution by injecting malicious code through the Custom Component feature. This could lead to significant security risks, including data theft, unauthorized access, and potential disruption of services, especially when the framework is hosted publicly.