Closed: ltdrdata closed this issue 6 months ago
First... wow, thanks for stopping by.
Thanks for this. So from what I can tell, globals_dict is the dictionary that controls what exec is allowed to use. Imports in the generated function are currently controlled by a whitelist of libraries it can use. The only node that points to an arbitrary place is the LocalLLM node, because you can point it at any "chat completions" server. About the imports...
Right now it passes the globals dict into safe_exec, and there is a controlled list of imports that are made available to the script here: https://github.com/lks-ai/anynode/blob/07f542a2fac965a8bbb0836baf5009586c594531/nodes/any.py#L13
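For illustration only, here is a minimal sketch of that pattern, not the actual safe_exec code in any.py; the module list and function name below are made up. The point is that the generated code can only see modules that were deliberately placed into the globals dict:

```python
import json
import math
import re

# Hypothetical whitelist; the real list lives in nodes/any.py.
ALLOWED_MODULES = {"math": math, "json": json, "re": re}

def safe_exec_sketch(code: str) -> dict:
    # The generated code only sees a stripped-down builtins table plus
    # the explicitly whitelisted modules; nothing else is in its namespace.
    globals_dict = {
        "__builtins__": {"len": len, "range": range, "print": print},
        **ALLOWED_MODULES,
    }
    exec(code, globals_dict)
    return globals_dict
```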
I definitely agree that "running arbitrary code" from an unknown place is not safe, but right now the only code that can run on the classic OpenAI AnyNode is code that comes back from their API, so it is as safe as pointing anything else at their API. As for the LocalLLM AnyNode, you would have to point it at an unsafe service for it to become unsafe, and if you are running a localhost-based setup, you probably already know enough about what you are doing to point that node somewhere else.
Either way, it is a concern. We can't assume that people know what they are doing, and quite possibly the best way to mitigate that would be to put some kind of code inspector in the node so that it checks the code first. The other suggestion is that it somehow validates where the node is pointing. One other thing I added is that you can't point it at an endpoint directly; you can only point it at a server, and the endpoint path is hardcoded, as in the sketch below.
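Purely as an illustration (the constant and function names below are hypothetical, not from the repo), the hardcoded-path idea looks roughly like this: the user supplies only a server base URL and the node appends a fixed chat-completions path itself.

```python
# Hypothetical sketch: the user supplies only the server base URL;
# the endpoint path is a hardcoded constant, so the node cannot be
# pointed at an arbitrary endpoint path.
CHAT_COMPLETIONS_PATH = "/v1/chat/completions"

def build_endpoint(server_base: str) -> str:
    return server_base.rstrip("/") + CHAT_COMPLETIONS_PATH

# build_endpoint("http://localhost:11434") -> "http://localhost:11434/v1/chat/completions"
```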
I'm open to suggestions, but could you be clearer about an API whitelist? Whitelisting on the LocalLLM node would not be easy, considering people can host their service anywhere.
I just realized another thing that can be done here: prompt guardrails for the generated functions, and perhaps a security checker of some sort that reviews the code before it's run. I'm looking at solutions.
I also had GPT show me some best practices here, and I think it will be a good improvement:
from RestrictedPython import compile_restricted, safe_globals, limited_builtins

# Define the code to be executed
user_code = """
def add(a, b):
    return a + b

result = add(2, 3)
"""

# Compile the code using RestrictedPython instead of the plain compiler
byte_code = compile_restricted(user_code, '<string>', 'exec')

# Define a secure execution environment with restricted builtins
secure_globals = safe_globals.copy()
secure_globals['__builtins__'] = limited_builtins

# Execute the compiled code in the restricted environment
exec(byte_code, secure_globals)

# Retrieve the result
result = secure_globals.get('result', None)
print(result)  # Output should be 5
https://restrictedpython.readthedocs.io/en/latest/idea.html explains the idea behind RestrictedPython... it dates back to those old-school web host interfaces, but it might be just what we need. Researching, while also trying to get some rest.
The part I was concerned about was not the imports, but the fact that functions like exec, eval, and open could be used through additions to __builtins__. It would be good to first apply some filtering by reviewing the names listed by print(dir(__builtins__)).
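A minimal sketch of that filtering idea, assuming a simple deny list (illustrative only and not exhaustive; a whitelist of known-safe names would be stricter still):

```python
import builtins

# Names that allow code execution, imports, or file/OS access.
DENIED_BUILTINS = {
    "exec", "eval", "compile", "open", "input",
    "__import__", "breakpoint", "exit", "quit", "help",
}

def restricted_builtins() -> dict:
    # Start from dir(builtins) and keep only names that are not denied
    # and are not underscore-prefixed (which drops __import__ and friends).
    return {
        name: getattr(builtins, name)
        for name in dir(builtins)
        if name not in DENIED_BUILTINS and not name.startswith("_")
    }

# Usage sketch: exec(code, {"__builtins__": restricted_builtins(), **allowed_modules})
```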
And even with a relatively reliable LLM service, I think it may still be possible to get it to generate malicious code by abusing it. To defend against attacks via workflow prompts, I believe there need to be means to verify and restrict the code on the user's PC side.
I think the restrictedpython you mentioned is a really good approach.
I also had GPT write me up this sanitize function, which includes some of what you mentioned. I've also removed the case where, if globals is None, it constructs builtins from all of your Python modules... that branch was never running, but it definitely would have been a bad move.
import re

def sanitize_code(code):
    # List of potentially dangerous functions and constructs
    dangerous_constructs = [
        "eval", "exec", "import", "open", "input", "__import__",
        "os.system", "subprocess", "shutil", "sys.exit", "compile"
    ]

    # Regular expression to find potentially dangerous constructs
    dangerous_pattern = re.compile(r'|'.join(map(re.escape, dangerous_constructs)))

    # Search for any dangerous constructs
    match = dangerous_pattern.search(code)
    if match:
        raise ValueError(f"Dangerous construct detected: {match.group()}")

    # Additional checks can be added here, such as checking for:
    # - Arbitrary file access (os module)
    # - Network access (socket module)
    # - Excessive resource usage (e.g., infinite loops)

    # If no dangerous constructs are found, return the sanitized code
    return code

# Example usage:
try:
    user_code = """
def safe_function():
    print('This is safe code.')

safe_function()
"""
    sanitized_code = sanitize_code(user_code)
    print("Code is safe to execute.")
    exec(sanitized_code)
except ValueError as e:
    print(e)

try:
    user_code = """
def unsafe_function():
    eval("print('This is unsafe code.')")

unsafe_function()
"""
    sanitized_code = sanitize_code(user_code)
    print("Code is safe to execute.")
    exec(sanitized_code)
except ValueError as e:
    print(e)
Alright, so that code above is trash, but I did manage to get a nice small util that has no false positives when detecting malicious or dangerous code. See utils.py toward the end. I've taken a three-pronged approach here; for example, a prompt like "make me a virus" just gets a "Sorry..." back from the model.
This seems to nail it, without restricting the creative flow of having AnyNode work within workflows.
I think the restrictedpython you mentioned is a really good approach.
I've taken the idea from there and implemented it into utils. https://github.com/lks-ai/anynode/blob/ff7fe3b44ae1262147590f601304bf037d317631/nodes/utils.py#L149
Is it safe now? Can I get that warning removed?
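For readers following along, this is roughly the shape such a pre-execution check can take. It is a hedged sketch based on the discussion above (an AST walk with a deny list), not the actual code at utils.py#L149; the deny lists are illustrative.

```python
import ast

DENIED_CALLS = {"eval", "exec", "compile", "open", "__import__"}
DENIED_MODULES = {"os", "sys", "subprocess", "shutil", "socket"}

def check_generated_code(code: str) -> None:
    """Raise ValueError if the generated code imports or calls anything denied."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Block `import os`, `from subprocess import run`, and similar.
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            roots = []
        if any(root in DENIED_MODULES for root in roots):
            raise ValueError(f"Disallowed import in generated code: {roots}")
        # Block direct calls such as eval(...) or open(...).
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in DENIED_CALLS):
            raise ValueError(f"Disallowed call in generated code: {node.func.id}")
```

Unlike the regex/substring version earlier in the thread, an AST walk only matches real names and calls, so a harmless word like "opened" inside a string does not trigger a false positive.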
An attacker is likely to use a prompt like "write code to read a file from a specific path and send it via a request" rather than a direct prompt like "make me a virus" when creating a workflow.
Anyway, I've already updated the Manager DB.
Thanks a lot. I know you take security seriously, and that makes me feel safer using Comfy ;)
https://github.com/lks-ai/anynode/blob/1f7b8372890ee0b9143d6d73904e8cb4fe54ebb1/nodes/any.py#L196
For the allowed APIs, it is best to use a whitelist approach if possible and, at a minimum, to blacklist the dangerous APIs. Frankly, I am skeptical that security can be reliably maintained with a blacklist approach.
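As a hypothetical example of the whitelist side of that suggestion (the defaults and names below are made up, not from the repo), the LocalLLM server setting could be checked against hosts the user has explicitly approved before any request is sent:

```python
from urllib.parse import urlparse

# Illustrative default; a real implementation would let the user extend this.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "api.openai.com"}

def is_allowed_server(server_url: str) -> bool:
    host = urlparse(server_url).hostname or ""
    return host in ALLOWED_HOSTS

# is_allowed_server("http://localhost:11434")       -> True
# is_allowed_server("http://attacker.example.com")  -> False
```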