openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License
2.31k stars, 330 forks

execution.py bug request #3

Closed: rainmaker712 closed this issue 3 years ago

rainmaker712 commented 3 years ago

bug request for execution.py

This part (https://github.com/openai/human-eval/blob/77b90b8f70e2553ba720c3d24156acfd28104ec4/human_eval/execution.py#L48)

            try:
                exec_globals = {}
                with swallow_io():
                    with time_limit(timeout):
# WARNING
# This program exists to execute untrusted model-generated code. Although
# it is highly unlikely that model-generated code will do something overtly
# malicious in response to this test suite, model-generated code may act
# destructively due to a lack of model capability or alignment.
# Users are strongly encouraged to sandbox this evaluation suite so that it 
# does not perform destructive actions on their host or network. For more 
# information on how OpenAI sandboxes its code, see the accompanying paper.
# Once you have read this disclaimer and taken appropriate precautions, 
# uncomment the following line and proceed at your own risk:
#                         exec(check_program, exec_globals)
                result.append("passed")
            except TimeoutException:
                result.append("timed out")
            except BaseException as e:
                result.append(f"failed: {e}")

            # Needed for cleaning up.
            shutil.rmtree = rmtree
            os.rmdir = rmdir
            os.chdir = chdir

should be fixed, since it causes an indentation error: the block under with time_limit(timeout): contains only comments, so it has no body.

heewooj commented 3 years ago
# WARNING
# ...
# Once you have read this disclaimer and taken appropriate precautions, 
# uncomment the following line and proceed at your own risk:
#                         exec(check_program, exec_globals)

You need to uncomment the above line. Also, please read our instructions/warning more carefully to understand what the reliability guard can and cannot do.
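For readers who hit the same error, here is a minimal sketch of why the shipped file fails to parse and why uncommenting the line resolves it. It is not the repository's code: `guard` is a hypothetical stand-in for `swallow_io()` and `time_limit(timeout)`, and `check_program` is a harmless placeholder string rather than model-generated code.

```python
import contextlib


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# Simplified shape of the snippet from the issue. The inner `with` block
# contains only comments, so Python finds no indented body for it.
broken = "\n".join([
    "try:",
    "    exec_globals = {}",
    "    with guard():",
    "        with guard():",
    "# uncomment the following line and proceed at your own risk:",
    "#             exec(check_program, exec_globals)",
    "    result.append('passed')",
    "except BaseException as e:",
    "    result.append(f'failed: {e}')",
])

# Uncommenting the exec line gives the inner `with` block a body.
fixed = broken.replace("#             exec(", "            exec(")

print(compiles(broken))  # the commented-out version does not parse
print(compiles(fixed))   # the uncommented version parses

# Run the fixed snippet with placeholder inputs (assumptions, not real
# HumanEval data). Only do this with real model output inside a sandbox.
namespace = {
    "guard": contextlib.nullcontext,   # hypothetical no-op context manager
    "check_program": "x = 1 + 1",      # trivial, trusted placeholder program
    "result": [],
}
exec(fixed, namespace)
print(namespace["result"])
```

The check only confirms that the file parses once the line is uncommented. It says nothing about safety: as the warning in the source states, untrusted programs should still be run inside a sandbox.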