zopefoundation / RestrictedPython

A restricted execution environment for Python to run untrusted code.
http://restrictedpython.readthedocs.io/

How to know what needs to be added to the globals? #261

Open seperman opened 1 year ago

seperman commented 1 year ago

BUG/PROBLEM REPORT / FEATURE REQUEST

Hello!

If I pass restricted_globals=dict(__builtins__=safe_builtins), my function does not compile. I don't see any errors that it was not compiled. How do I know what needs to be added to safe_builtins ? If I don't pass restricted_globals, then the function is compiled.

What I did:

from RestrictedPython import compile_restricted, safe_builtins
restricted_globals = dict(__builtins__=safe_builtins)

source_code = """
valid_sizes = ['6', '7', '8', '8.5', '9', '10', '10.5']

def check(row, valid_sizes=valid_sizes):
    return any('shoes' in category.lower() and all(size in valid_sizes for size in row['sizes']) for category in row['categories'])

"""

byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)

exec(byte_code, restricted_globals)
check({'categories': ['Shoes', "Women's Shoes", 'Clothing', "All Women's Shoes"], 'sizes': ['8.5', '7', '9.5', '7.5', '8']})

What I expect to happen:

check to be executed.

What actually happened:

NameError: name 'check' is not defined

What version of Python and Zope/Addons I am using:

Python 3.11.4 RestrictedPython==6.2 Ubuntu

d-maurer commented 1 year ago

Sep Dehpour wrote at 2023-9-13 12:15 -0700:

If I pass restricted_globals=dict(__builtins__=safe_builtins), my function does not compile. I don't see any errors that it was not compiled.

The compilation is independent from what you pass to the exec. You do not see errors, because the compilation was successful.

How do I know what needs to be added to safe_builtins ? If I don't pass restricted_globals, then the function is compiled.

You read the documentation (--> "https://restrictedpython.readthedocs.io/en/latest/usage/basic_usage.html#necessary-setup") to learn what you must pass to exec to get the various features. What features are really necessary depends on your code.

seperman commented 1 year ago

Thanks for the prompt response. My question is how can RestrictedPython tell me what "features" I need to enable in order for a specific piece of bytecode to execute? It is not throwing any errors or logs that tell me I need to define _attr_ or something.

Also I followed the link you mentioned for necessary setup. Here is my new code that still doesn't work:

from RestrictedPython import compile_restricted
from RestrictedPython import safe_builtins
from RestrictedPython import limited_builtins
from RestrictedPython import utility_builtins
from RestrictedPython.Guards import full_write_guard, safer_getattr, guarded_iter_unpack_sequence
from RestrictedPython.Eval import default_guarded_getiter

ALLOWED_BUILTINS = {}
ALLOWED_BUILTINS.update(safe_builtins)
ALLOWED_BUILTINS.update(limited_builtins)
ALLOWED_BUILTINS.update(utility_builtins)

_write_ = full_write_guard
_getattr_ = safer_getattr
_getiter_ = default_guarded_getiter
_iter_unpack_sequence_ = guarded_iter_unpack_sequence

restricted_globals = dict(__builtins__=ALLOWED_BUILTINS)

source_code = """
valid_sizes = ['6', '7', '8', '8.5', '9', '10', '10.5']

def check(row, valid_sizes=valid_sizes):
    return any('shoes' in category.lower() and all(size in valid_sizes for size in row['sizes']) for category in row['categories'])

"""

byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)

exec(byte_code, restricted_globals)
result = check({'categories': ['Shoes', "Women's Shoes", 'Clothing', "All Women's Shoes"], 'sizes': ['8.5', '7', '9.5', '7.5', '8']})
print(result)

d-maurer commented 1 year ago

Sep Dehpour wrote at 2023-9-13 16:39 -0700:

Thanks for the prompt response. My question is how can RestrictedPython tell me what "features" I need to enable in order for a specific piece of bytecode to execute?

It cannot (not easily). The normal approach is:

I suggest looking at the package AccessControl. This is the RestrictedPython policy package used by Zope. It supports most features and secures them based on Zope's user-permission-role concept.

Keep in mind: RestrictedPython by itself is only a small part of the solution: it transforms (parsed) Python code to give an external policy package a way to implement features under the premises of its security concept.

If you have concrete code, you can find out which features it uses via a disassembly (--> module dis in the Python standard library). RestrictedPython compiles almost all of its known features into calls to functions whose names start with _. E.g. attribute access is transformed into a _getattr_ call, subscription into a _getitem_ call. Thus, you can look in a disassembly of your code for such calls.
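The disassembly approach can be sketched with the standard library alone. This is a minimal illustration, not part of RestrictedPython: the helper name policy_names is hypothetical, and the source string is a hand-written stand-in for what the transformation produces. In practice you would pass in the code object returned by compile_restricted.

```python
import dis
import types


def policy_names(code):
    """Collect names of the form _name_ that a code object loads as globals.

    Hypothetical helper for illustration only.
    """
    found = set()
    for instr in dis.get_instructions(code):
        if instr.opname in ("LOAD_GLOBAL", "LOAD_NAME"):
            name = instr.argval
            # Keep _getattr_-style names, skip dunders such as __name__.
            if (len(name) > 2 and name.startswith("_")
                    and name.endswith("_") and not name.startswith("__")):
                found.add(name)
    # Function bodies are stored as nested code objects in co_consts.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found |= policy_names(const)
    return found


# Hand-written stand-in for transformed code (assumption: the real output of
# compile_restricted uses the same _getattr_ / _getitem_ naming).
transformed = """
def check(row):
    return _getattr_(_getitem_(row, 'name'), 'lower')()
"""

code = compile(transformed, "<string>", "exec")
needed = policy_names(code)
print(sorted(needed))
```

The resulting set lists the policy functions this particular piece of code would look up at run time.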

However, RestrictedPython is designed for the case where you get code from an untrusted source. Therefore, it is impractical to look for the features this code needs and provide implementations for them on the fly. Instead, you decide beforehand what features you want to support (and how to secure them), set up a corresponding environment and let the code execution fail when it uses unsupported features. Using an unsupported feature will typically let the code execution fail with a NameError naming the missing policy function, e.g. if you do not support attribute access, your execution environment will lack the _getattr_ function and code using it will get a NameError: _getattr_.
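The failure mode described here can be reproduced without RestrictedPython installed. The sketch below uses a hand-written statement in the transformed style (an assumption standing in for real compile_restricted output) and the plain built-in getattr as a placeholder policy function. A real policy would use a guarded implementation instead.

```python
# Hand-written stand-in for transformed code: attribute access has become
# a call to the policy function _getattr_.
transformed = "result = _getattr_('Shoes', 'lower')()"

# Environment that deliberately does not support attribute access.
env = {"__builtins__": {}}

message = None
try:
    exec(transformed, env)
except NameError as exc:
    # The error names the missing policy function.
    message = str(exc)

print(message)

# Supporting the feature means providing the function up front.
# Plain getattr is NOT a safe policy; it is used only to show the mechanism.
env["_getattr_"] = getattr
exec(transformed, env)
print(env["result"])
```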

It is not throwing any errors or logs that tell me I need to define _attr_ or something.

Even with RestrictedPython it is very difficult to provide a secure environment. A very thorough concept is required.

That's why you do not start with example code (and look at its features). Instead you start thinking carefully about the security requirements of your application. E.g. you ask yourself: "can I allow attribute access? under what conditions?" You do this for all features documented for RestrictedPython and provide application specific implementations for those features your application should support -- maybe with special restrictions.

I think it is very instructive to look at AccessControl (and maybe read the Zope documentation to learn about its security concept) to find out how it builds its secured runtime environment for RestrictedPython.

... Also I followed the link you mentioned for necessary setup. Here is my new code that still doesn't work:


...
source_code = """
valid_sizes = ['6', '7', '8', '8.5', '9', '10', '10.5']

def check(row, valid_sizes=valid_sizes):
    return any('shoes' in category.lower() and all(size in valid_sizes for size in row['sizes']) for category in row['categories'])

"""

You have defined the check function as part of source.

byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)

exec(byte_code, restricted_globals)
result = check({'categories': ['Shoes', "Women's Shoes", 'Clothing', "All Women's Shoes"], 'sizes': ['8.5', '7', '9.5', '7.5', '8']})
print(result)

You likely see a NameError: check -- because the check definition is executed in the runtime context described by restricted_globals, not in the current module context. To access it, you would need restricted_globals["check"].
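The namespace behaviour is a property of exec itself, so it can be shown with the built-in compile in place of compile_restricted (an assumption made here so that the sketch runs without the package and without any guard functions). The simplified check function is illustrative, not the one from the report.

```python
source_code = """
def check(row):
    return 'shoes' in row['category'].lower()
"""

# Built-in compile is a stand-in for compile_restricted.
byte_code = compile(source_code, "<string>", "exec")

# The dict passed to exec becomes the global namespace of the executed code.
restricted_globals = {"__builtins__": {}}
exec(byte_code, restricted_globals)

# check was defined inside restricted_globals, not in this module.
check = restricted_globals["check"]
result = check({"category": "Women's Shoes"})
print(result)
```

When exec is called without a globals dict, it uses the caller's namespace, which is why check seemed to appear directly in the module in that case.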

Note: when you report further problems, do not only include the source but also the error information you have observed. In the case above, this would have been: I used the source ... and have got a NameError: check.

seperman commented 1 year ago

Thanks for the detailed response, @d-maurer. I think my confusion arose because, when restricted_globals is not passed, exec modifies globals directly and the check function appears out of nowhere. Now that I understand it, everything is clearer. I was not even getting the NameError: check before because check was not in globals. It was in restricted_globals.

Going back to the security aspect, the goal is to use RestrictedPython to run untrusted code of course. Where is a good place to see the list of all Python's built-in functions and the risk associated with them? Ideally I want to allow all the functions with no risk. For example math functions, string modification functions, etc.

If I wanted something far more limited, I could have gone with Starlark.

Thank you!

d-maurer commented 1 year ago

Sep Dehpour wrote at 2023-9-14 12:14 -0700:

... Where is a good place to see the list of all Python's built-in functions and the risk associated with them?

Most builtin functions are documented in the Python documentation.

Regarding the risks, this is a very difficult question and may highly depend on your application.

RestrictedPython contains a set (--> safe_builtins) of functions considered mostly safe. BUT only a very thorough analysis specific to your application can show that those functions are really safe for you. For example, I recently had to remove a class from the string module variant exposed via safe_builtins. With this class it was possible to "read" variables whose names start with _, which should not be possible with RestrictedPython. This is a big problem for applications which put sensitive information into _XXX variables trusting that they cannot be accessed.

Another example: Some applications may want to allow untrusted code to use regular expressions. But it is not difficult to construct extremely expensive regular expressions -- so expensive that they can be used for DOS (= "Denial Of Service") attacks.

My recommendation: Start with safe_builtins. Remove things your application likely does not need. If your application gains a lot from a functionality, make a risk analysis. If the risk seems acceptable, provide the functionality via a wrapper. The wrapper will allow you to tighten the enforced conditions should experience show a necessity.

Example: you might expose regular expressions via a wrapper. Should you find out that regular expressions are abused, the wrapper could check for bad expressions and reject them.
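A wrapper of this kind might look like the sketch below. Everything in it is an assumption for illustration: the names guarded_search and RejectedPattern, the two limits, and the nested-quantifier check, which is a crude heuristic and does not catch every expensive pattern.

```python
import re

MAX_PATTERN_LENGTH = 100   # hypothetical limit
MAX_TEXT_LENGTH = 10_000   # hypothetical limit

# Crude heuristic: a quantified group that itself contains a quantifier,
# e.g. (a+)+ or (a*)*. Not a complete defence against expensive patterns.
_NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


class RejectedPattern(ValueError):
    """Raised when the wrapper refuses to run a pattern."""


def guarded_search(pattern, text):
    """Search text for pattern, rejecting inputs the policy considers risky."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise RejectedPattern("pattern too long")
    if len(text) > MAX_TEXT_LENGTH:
        raise RejectedPattern("text too long")
    if _NESTED_QUANTIFIER.search(pattern):
        raise RejectedPattern("nested quantifier")
    match = re.search(pattern, text)
    # Return a plain string rather than the match object.
    return match.group(0) if match else None


print(guarded_search(r"\d+", "size 42"))
```

Untrusted code would be given guarded_search in its globals instead of the re module. Because all access goes through one function, the checks can be tightened later without changing how the untrusted code calls it.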