openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License

why use ThreadPoolExecutor with GIL in background? #36

Open johnmclain opened 7 months ago

johnmclain commented 7 months ago

During evaluation, the code first uses ThreadPoolExecutor, and each thread then uses the multiprocessing package. Why not use ProcessPoolExecutor from the start? Is there a performance consideration behind this?

rjarun8235 commented 2 months ago

@johnmclain

The ThreadPoolExecutor is used to run the validation tasks for multiple generated code samples concurrently. Each validation is then wrapped in its own process via multiprocessing to contain and isolate it. This ensures the samples do not conflict with one another, and it also serves as a system security measure when executing and validating unsafe code.

Now, why not use a process pool directly?

Threads are lightweight and scale well with available resources. Using a process pool directly is a slightly less secure way to validate unsafe code.

If the code is safe, using a process pool directly makes more sense.
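
For illustration, the pattern described above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual code: the names `_run_sample`, `check_sample`, and `evaluate_samples`, the result strings, and the timeout values are all hypothetical, and the sketch assumes a Unix-like system where the `fork` start method is available. Each thread spends nearly all of its time blocked in `join()` waiting on its child process, so the GIL is not a bottleneck for the threads themselves.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor

# Assumption: a Unix-like OS. The "fork" start method is not available on Windows.
_CTX = mp.get_context("fork")


def _run_sample(program: str, conn) -> None:
    """Child process: execute one untrusted program and report the outcome."""
    try:
        exec(program, {})  # runs only inside the disposable child process
        conn.send("passed")
    except BaseException as exc:
        conn.send(f"failed: {type(exc).__name__}")
    finally:
        conn.close()


def check_sample(program: str, timeout: float = 3.0) -> str:
    """Run one sample in a fresh process and kill it if it exceeds the timeout."""
    parent_conn, child_conn = _CTX.Pipe(duplex=False)
    proc = _CTX.Process(target=_run_sample, args=(program, child_conn))
    proc.start()
    child_conn.close()  # the parent only reads from the pipe
    proc.join(timeout)  # the calling thread blocks here without holding the GIL
    if proc.is_alive():
        proc.kill()
        proc.join()
        return "timed out"
    if parent_conn.poll():
        return parent_conn.recv()
    return "failed: no result"


def evaluate_samples(programs, n_workers: int = 4, timeout: float = 3.0):
    """Fan samples out over threads; each thread supervises one child process."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(lambda p: check_sample(p, timeout), programs))
```

Calling `evaluate_samples(["x = 1 + 1", "while True: pass"], timeout=1.0)` gives one result per sample, in input order. Because every sample gets a fresh process that is discarded afterwards, a sample that hangs or corrupts its interpreter state cannot affect the next one. A plain ProcessPoolExecutor reuses its worker processes across tasks by default, which may be the isolation concern raised above.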