Open apsun opened 7 years ago
@apsun After a lot of digging, it seems that it isn't possible to easily get around the built-in type
function. I made a PR #296 which takes that approach to fixing this problem.
@fschr I believe all of the numpy types (np.int32
, np.float32
, etc) also need to be accounted for. AFAIK np.number
is just an abstract base class.
cc @lukeolson @solomonik for entertainment (and full disclosure)
Another evil one I just found:
def hax(*args, **kwargs):
return True
np.allclose = hax
Bypasses the check_numpy_array_allclose
check. It seems that the code isn't being sandboxed at all. This one is particularly worrying:
import subprocess
print(subprocess.check_output(["ls", "/"]).decode("utf-8"))
Thanks for taking an interest, for responsibly disclosing this, and for even helping to fix the underlying issue! I'll comment more on the PR.
def hax(*args, **kwargs):
return True
np.allclose = hax
Good one. Easy to fix though. We should just grab a reference to the functions we need to trust (like np.allclose
) before any user code runs.
Speaking of potential autograder holes, the one thing I'm most concerned about is stack traversals up to the code that runs the autograder. That thing has the correct code floating around, which is a legitimate concern. If you'd like to hep fix that, i would totally even pay you hourly for that. :) The one solution I can think of is taking the grading into a separate process--but I wonder if we can afford the overhead.
This one:
import subprocess
print(subprocess.check_output(["ls", "/"]).decode("utf-8"))
looks concerning, but isn't. All the autograder code runs in short-lived Docker containers, so the real barrier are the Linux kernel's network/filesystem/process namespaces. That has occasional holes, too, but it's robust enough that people run businesses on it, so it's good enough for me. Plus if someone compromises the container, they get a pretty dinky VM somewhere in the VM farm with no access to anything privleged.
Fundamentally though, autograding a language like Python is a tradeoff between bullet-proof-ness and performance...
The one solution I can think of is taking the grading into a separate process--but I wonder if we can afford the overhead.
It's also problematic because it makes passing callables from user code to grading code basically impossible.
stack traversals up to the code that runs the autograder
Someone could also just hijack the socket by which the autograder runner talks to the Relate mothership. Or wholesale replace the bytecode of the grading code. Or...
The one reason I can sit here somewhat calmly and discuss all these holes is that they're all imperfect crimes. Every submission is logged, so even if we can't perfectly prevent it, we can at least reconstruct what happened after the fact.
Oh, cool, I wasn't aware that the VM wasn't persistent.
As for the function replacing thing, it should be possible to use importlib.reload
to "undo" all user changes to the module, to avoid having to save each and every function (plus some numpy functions have internal dependencies, but I think anyone capable of bypassing that could probably do the homework themselves ;-p).
Sandboxed imports can be done with something like this:
def make_safe_environ(allowed_imports=["numpy", "sympy", "matplotlib"]):
def safe_import(name, globals=None, locals=None, fromlist=(), level=0):
if name not in allowed_imports:
raise ImportError("Illegal import: %s" % name)
return __import__(name, globals, locals, fromlist, level)
new_builtins = dict(__builtins__.__dict__)
new_builtins["__import__"] = safe_import
environ = {"__builtins__": new_builtins}
return environ
And using exec(code, make_safe_environ())
(surrounded with a try-except
to prevent exception bubbling of course) to run the user submitted code. It's probably not perfect, but I imagine most people won't be able to get around it.
Edit: On second thought, numpy and the other dependencies probably rely on other modules too. This probably wouldn't work too well.
It is easy to trick the autograder into accepting an object that cheats the correctness checks. Here's two examples that bypass the scalar value check, tested on CS357 quiz 17:
Changing
isinstance(x, (...))
totype(x) in [...]
(and fixing theis_number
check) would prevent this particular trick by disallowing subclasses, though I'm not sure if that would be secure either.Sorry if you don't want this code floating around in public, let me know and I'll remove it ASAP.