openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License

evaluate_functional_correctness can't run #18

Closed: BoyuanJackChen closed this issue 4 months ago

BoyuanJackChen commented 1 year ago

I created a conda environment with Python 3.7 using the exact command from the doc. Then I used OpenAI's text-davinci-002 to generate a samples.jsonl file with 3 results for each problem.

When I ran evaluate_functional_correctness samples.jsonl, I got the error message below. I also tried to evaluate the example results with evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl, and got the same error.

How can I fix it?

Error message:

Reading samples...
6it [00:00, 7427.93it/s]
Running test suites...
0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/codex/bin/evaluate_functional_correctness", line 33, in <module>
    sys.exit(load_entry_point('human-eval', 'console_scripts', 'evaluate_functional_correctness')())
  File "/opt/miniconda3/envs/codex/bin/evaluate_functional_correctness", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/opt/miniconda3/envs/codex/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/opt/miniconda3/envs/codex/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluate_functional_correctness.py", line 30, in <module>
    sys.exit(main())
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluate_functional_correctness.py", line 27, in main
    fire.Fire(entry_point)
  File "/opt/miniconda3/envs/codex/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/miniconda3/envs/codex/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/miniconda3/envs/codex/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluate_functional_correctness.py", line 22, in entry_point
    results = evaluate_functional_correctness(sample_file, k, n_workers, timeout, problem_file)
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluation.py", line 75, in evaluate_functional_correctness
    result = future.result()
  File "/opt/miniconda3/envs/codex/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/opt/miniconda3/envs/codex/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/miniconda3/envs/codex/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/execution.py", line 77, in check_correctness
    p.start()
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'check_correctness.<locals>.unsafe_execute'
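
For readers landing here: the last frames of the traceback go through popen_spawn_posix.py and reduction.dump, which suggests the child process is being started with the "spawn" method, and that method pickles the Process target. A function defined inside another function cannot be pickled by the standard pickle module. The snippet below is only a sketch of that behavior; make_worker is a hypothetical stand-in for check_correctness, not code from the repository.

```python
import pickle


def make_worker():
    # Hypothetical stand-in for check_correctness: the target function is
    # defined locally, so its qualified name contains "<locals>".
    def unsafe_execute():
        return "passed"

    return unsafe_execute


def can_pickle(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Older Pythons raise AttributeError ("Can't pickle local object ...");
        # some newer versions raise PicklingError instead.
        return False


worker = make_worker()
print(worker.__qualname__)   # the qualified name includes "<locals>"
print(can_pickle(len))       # a builtin is pickled by reference
print(can_pickle(worker))    # the locally defined function is not picklable
```

This is why the suggestions further down the thread either change how the function is serialized (multiprocess) or how it is declared.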

ghost commented 1 year ago

If you are on Windows, this problem can be solved by declaring unsafe_execute as a global variable just above the function.

After this you will hit an AttributeError for unsafe_execute, which can be solved by installing the multiprocess library and replacing multiprocessing with multiprocess.

Another error on Windows is getting pass@1 of 0 instead of 0.4999. For the sample sanity check only, this can be solved by removing the timeout function in execution.py. For testing generated samples you do need a timeout, so you have to find a replacement for the setitimer() function, which is what causes the pass@1 = 0 issue.

WelliJohn commented 1 year ago

When I run on a Mac, I get the same error. Can anyone help? Thanks a lot.
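
A quick way to check whether a given machine is affected is to print the start method that multiprocessing uses by default. This is a small diagnostic sketch, not part of human-eval; it relies only on the standard library. On Windows, and on macOS with Python 3.8 or later, the default is "spawn", which matches the popen_spawn_posix.py frames in the traceback above.

```python
import multiprocessing
import sys

# Ask the standard library which start method it will use by default.
method = multiprocessing.get_start_method()
print(f"platform={sys.platform} start_method={method}")

if method != "fork":
    # "spawn" and "forkserver" both pickle the Process object, target included.
    print("Process targets must be picklable with this start method.")
```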

jacob1017 commented 1 year ago

Reinstall multiprocess, then replace the multiprocessing usage in human_eval/execution.py with it, and it should work.
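
In practice this suggestion comes down to changing one import in execution.py. The sketch below assumes the third-party multiprocess package (installed with pip install multiprocess) exposes the same Process and Manager names as the standard library module; the fallback branch and the BACKEND name are additions for illustration only. Note that with the fallback, the original pickling error would still occur under the "spawn" start method.

```python
try:
    # Third-party fork of multiprocessing that serializes with dill,
    # which can handle locally defined functions.
    import multiprocess as multiprocessing
except ImportError:
    # Standard library fallback (illustrative; keeps the original behavior).
    import multiprocessing

# Hypothetical helper name: records which backend was actually imported.
BACKEND = multiprocessing.__name__
print(f"using {BACKEND}")

# The rest of the module keeps calling multiprocessing.Process(...) and
# multiprocessing.Manager() exactly as before.
```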

tianzhaotju commented 1 year ago

> When I run on a Mac, I get the same error. Can anyone help? Thanks a lot.

So sad! I have the same problem. Did you solve it, my friend? Thanks.

davide221 commented 1 year ago

After installing multiprocess and changing the import from multiprocessing to multiprocess, what worked for me was the following.

Windows machine:

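import contextlib
import threading

class TimeoutException(Exception):
    pass

@contextlib.contextmanager
def time_limit(seconds: float):
    # Caveat: the lambda raises TimeoutException inside the timer thread, not in
    # the thread running the `with` body, so the body itself is not interrupted.
    timer = threading.Timer(seconds, lambda: (_ for _ in ()).throw(TimeoutException("Timed out!")))
    timer.start()
    try:
        yield
    finally:
        # Always stop the timer once the body finishes.
        timer.cancel()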

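Linux machine:

import contextlib
import signal

@contextlib.contextmanager
def time_limit(seconds: float):
    # Uses the TimeoutException class defined above.
    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")
    # Install the handler before arming the timer so SIGALRM is never unhandled.
    signal.signal(signal.SIGALRM, signal_handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer.
        signal.setitimer(signal.ITIMER_REAL, 0)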
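
The two variants can also live in one function that picks its behavior at runtime. The following is a minimal sketch under stated assumptions, not the repository's code: it assumes that on platforms without signal.setitimer (such as Windows) it is acceptable to enforce no in-process limit and to rely on the parent process to stop a hung child. It must be used from the main thread, since signal handlers can only be installed there.

```python
import contextlib
import signal
import time


class TimeoutException(Exception):
    pass


@contextlib.contextmanager
def time_limit(seconds: float):
    if not hasattr(signal, "setitimer"):
        # No interval timers here (e.g. Windows): run the body without an
        # in-process limit. Assumption: the caller enforces a limit elsewhere.
        yield
        return

    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")

    # Install the handler first, then arm the timer.
    previous = signal.signal(signal.SIGALRM, signal_handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


# Usage: a body that finishes well inside the limit runs normally.
with time_limit(1.0):
    time.sleep(0.01)
```

Restoring the previous handler in the finally block is a design choice so that nested or repeated use does not leave a stale handler installed.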

pnewhook commented 11 months ago

On a Mac with Python 3.10, installing multiprocess and removing the imports and usages of multiprocessing worked for me.

RyanLoil commented 5 months ago

For Windows users, a few more things need to be done in addition to the time_limit change posted above:

  1. Add if __name__ == "__main__": before sys.exit(main()); otherwise you will get an error that recommends using freeze_support().
  2. Install multiprocess and remove the imports and usages of multiprocessing, or you will get a "Ran out of input" error. Ignore the IDE's "can't find **()" warning: the multiprocess package doesn't declare these names in __init__.py, but it still works.
  3. Changing the time_limit function is necessary, as the original one doesn't work properly.
  4. Defining the TimeoutException class is optional.

NA-Wen commented 4 months ago

> Reinstall multiprocess, then replace the multiprocessing usage in human_eval/execution.py with it, and it should work.

This works well for me on Windows 11. Thank you so much!

BoyuanJackChen commented 4 months ago

Thanks, all! I have resolved the issue with the advice given here.