Closed: ltdrdata closed this issue 6 months ago
First... wow, thanks for stopping by.
Thanks for this. So from what I can tell, globals_dict is the dictionary that controls what exec is allowed to use. Imports in the generated function are currently controlled by a whitelist of libraries it can use. The only node that points to an arbitrary place is the LocalLLM node, because you can point it at any "chat completions" server. About the imports...
Right now it passes the globals dict into safe_exec, and there is a controlled list of imports that are made available to the script here: https://github.com/lks-ai/anynode/blob/07f542a2fac965a8bbb0836baf5009586c594531/nodes/any.py#L13
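For illustration only, here is a minimal sketch of that pattern, not the actual safe_exec code in any.py; the module list and function name below are made up. The point is that the generated code can only see modules that were deliberately placed into the globals dict:

```python
import json
import math
import re

# Hypothetical whitelist; the real list lives in nodes/any.py.
ALLOWED_MODULES = {"math": math, "json": json, "re": re}

def safe_exec_sketch(code: str) -> dict:
    # The generated code only sees a stripped-down builtins table plus
    # the explicitly whitelisted modules; nothing else is in its namespace.
    globals_dict = {
        "__builtins__": {"len": len, "range": range, "print": print},
        **ALLOWED_MODULES,
    }
    exec(code, globals_dict)
    return globals_dict
```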
I definitely agree that "running arbitrary code" from an unknown place is not safe, but right now the only code that can run on the classic OpenAI AnyNode is code that comes back from their API, so it is as safe as pointing anything else at their API. As for the LocalLLM AnyNode, you would have to point it at an unsafe service for it to become unsafe, and if you are running a localhost-based setup, you probably already know enough about what you are doing to point that node somewhere else.
Either way, it is a concern. We can't assume that people know what they are doing, and quite possibly the best way to mitigate that would be to put some kind of code inspector in the node so that it checks the code first. The other suggestion is that it somehow validates where the node is pointing. One other thing I added is that you can't point it at an endpoint directly; you can only point it at a server, and the endpoint path is hardcoded, as in the sketch below.
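Purely as an illustration (the constant and function names below are hypothetical, not from the repo), the hardcoded-path idea looks roughly like this: the user supplies only a server base URL and the node appends a fixed chat-completions path itself.

```python
# Hypothetical sketch: the user supplies only the server base URL;
# the endpoint path is a hardcoded constant, so the node cannot be
# pointed at an arbitrary endpoint path.
CHAT_COMPLETIONS_PATH = "/v1/chat/completions"

def build_endpoint(server_base: str) -> str:
    return server_base.rstrip("/") + CHAT_COMPLETIONS_PATH

# build_endpoint("http://localhost:11434") -> "http://localhost:11434/v1/chat/completions"
```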
I'm open to suggestions, but could you be clearer about an API whitelist? Whitelisting on the LocalLLM node would not be easy, considering people can host their service anywhere.
I just realized another thing that can be done here: prompt guardrails for the generated functions, and perhaps a security checker of some sort that reviews the code before it's run. I'm looking at solutions.
I also had GPT show me some best practices here, and I think it will be a good improvement:
from RestrictedPython import compile_restricted, safe_globals, limited_builtins

# Define the code to be executed
user_code = """
def add(a, b):
    return a + b

result = add(2, 3)
"""

# Compile the code using RestrictedPython instead of the plain compiler
byte_code = compile_restricted(user_code, '<string>', 'exec')

# Define a secure execution environment with restricted builtins
secure_globals = safe_globals.copy()
secure_globals['__builtins__'] = limited_builtins

# Execute the compiled code in the restricted environment
exec(byte_code, secure_globals)

# Retrieve the result
result = secure_globals.get('result', None)
print(result)  # Output should be 5
https://restrictedpython.readthedocs.io/en/latest/idea.html explains the idea behind RestrictedPython... it dates back to those old-school web host interfaces, but it might be just what we need. Researching, while also trying to get some rest.
The part I was concerned about was not the imports, but the fact that functions like exec, eval, and open could be used through additions to __builtins__. It would be good to first apply some filtering by reviewing the names listed by print(dir(__builtins__)).
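A minimal sketch of that filtering idea, assuming a simple deny list (illustrative only and not exhaustive; a whitelist of known-safe names would be stricter still):

```python
import builtins

# Names that allow code execution, imports, or file/OS access.
DENIED_BUILTINS = {
    "exec", "eval", "compile", "open", "input",
    "__import__", "breakpoint", "exit", "quit", "help",
}

def restricted_builtins() -> dict:
    # Start from dir(builtins) and keep only names that are not denied
    # and are not underscore-prefixed (which drops __import__ and friends).
    return {
        name: getattr(builtins, name)
        for name in dir(builtins)
        if name not in DENIED_BUILTINS and not name.startswith("_")
    }

# Usage sketch: exec(code, {"__builtins__": restricted_builtins(), **allowed_modules})
```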
And even with a relatively reliable LLM service, I think it may still be possible to get it to generate malicious code by abusing it. To defend against attacks via workflow prompts, I believe there need to be means to verify and restrict the code on the user's PC side.
I think the restrictedpython you mentioned is a really good approach.
I also had GPT write me up this sanitize function, which includes some of what you mentioned. I've also removed the case where, if globals is None, it constructs builtins from all of your Python modules... that branch was never running, but it definitely would have been a bad move.
import re

def sanitize_code(code):
    # List of potentially dangerous functions and constructs
    dangerous_constructs = [
        "eval", "exec", "import", "open", "input", "__import__",
        "os.system", "subprocess", "shutil", "sys.exit", "compile"
    ]

    # Regular expression to find potentially dangerous constructs
    dangerous_pattern = re.compile(r'|'.join(map(re.escape, dangerous_constructs)))

    # Search for any dangerous constructs
    match = dangerous_pattern.search(code)
    if match:
        raise ValueError(f"Dangerous construct detected: {match.group()}")

    # Additional checks can be added here, such as checking for:
    # - Arbitrary file access (os module)
    # - Network access (socket module)
    # - Excessive resource usage (e.g., infinite loops)

    # If no dangerous constructs are found, return the sanitized code
    return code

# Example usage:
try:
    user_code = """
def safe_function():
    print('This is safe code.')

safe_function()
"""
    sanitized_code = sanitize_code(user_code)
    print("Code is safe to execute.")
    exec(sanitized_code)
except ValueError as e:
    print(e)

try:
    user_code = """
def unsafe_function():
    eval("print('This is unsafe code.')")

unsafe_function()
"""
    sanitized_code = sanitize_code(user_code)
    print("Code is safe to execute.")
    exec(sanitized_code)
except ValueError as e:
    print(e)
Alright, so that code above is trash, but I did manage to get a nice small util that has no false positives when detecting malicious or dangerous code. See utils.py toward the end. I've taken a three-pronged approach here; for example, a prompt like "make me a virus" just gets a "Sorry..." back from the model.
This seems to nail it, without restricting the creative flow of having AnyNode work within workflows.
I think the restrictedpython you mentioned is a really good approach.
I've taken the idea from there and implemented it into utils. https://github.com/lks-ai/anynode/blob/ff7fe3b44ae1262147590f601304bf037d317631/nodes/utils.py#L149
Is it safe now? Can I get that warning removed?
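For readers following along, this is roughly the shape such a pre-execution check can take. It is a hedged sketch based on the discussion above (an AST walk with a deny list), not the actual code at utils.py#L149; the deny lists are illustrative.

```python
import ast

DENIED_CALLS = {"eval", "exec", "compile", "open", "__import__"}
DENIED_MODULES = {"os", "sys", "subprocess", "shutil", "socket"}

def check_generated_code(code: str) -> None:
    """Raise ValueError if the generated code imports or calls anything denied."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Block `import os`, `from subprocess import run`, and similar.
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            roots = []
        if any(root in DENIED_MODULES for root in roots):
            raise ValueError(f"Disallowed import in generated code: {roots}")
        # Block direct calls such as eval(...) or open(...).
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in DENIED_CALLS):
            raise ValueError(f"Disallowed call in generated code: {node.func.id}")
```

Unlike the regex/substring version earlier in the thread, an AST walk only matches real names and calls, so a harmless word like "opened" inside a string does not trigger a false positive.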
An attacker is likely to use a prompt like "write code to read a file from a specific path and send it via a request" rather than a direct prompt like "make me a virus" when creating a workflow.
Anyway, I've already updated the Manager DB.
Thanks a lot. I know you take security seriously, and that makes me feel safer using Comfy ;)
https://github.com/lks-ai/anynode/blob/1f7b8372890ee0b9143d6d73904e8cb4fe54ebb1/nodes/any.py#L196
For the allowed APIs, it is best to use a whitelist approach if possible and, at a minimum, to blacklist the dangerous APIs. Frankly, I am skeptical that security can be reliably maintained with a blacklist approach.
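As a hypothetical example of the whitelist side of that suggestion (the defaults and names below are made up, not from the repo), the LocalLLM server setting could be checked against hosts the user has explicitly approved before any request is sent:

```python
from urllib.parse import urlparse

# Illustrative default; a real implementation would let the user extend this.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "api.openai.com"}

def is_allowed_server(server_url: str) -> bool:
    host = urlparse(server_url).hostname or ""
    return host in ALLOWED_HOSTS

# is_allowed_server("http://localhost:11434")       -> True
# is_allowed_server("http://attacker.example.com")  -> False
```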