Benchmark with disabled GIL build and -Xgil=1 isn't enabling GIL

tonybaloney commented 5 months ago

It seems that when you run PYTHON_GIL=1 python benchmark.py the benchmarked functions are being run with the GIL disabled and ignoring the flag.

Benchmark Code

```python import pyperf from multiprocessing import Process from threading import Thread try: import _xxsubinterpreters as interpreters except ImportError: import _interpreters as interpreters import itertools DEFAULT_DIGITS = 2000 icount = itertools.count islice = itertools.islice def gen_x(): return map(lambda k: (k, 4 * k + 2, 0, 2 * k + 1), icount(1)) def compose(a, b): aq, ar, as_, at = a bq, br, bs, bt = b return (aq * bq, aq * br + ar * bt, as_ * bq + at * bs, as_ * br + at * bt) def extract(z, j): q, r, s, t = z return (q * j + r) // (s * j + t) def gen_pi_digits(): z = (1, 0, 0, 1) x = gen_x() while 1: y = extract(z, 3) while y != extract(z, 4): z = compose(z, next(x)) y = extract(z, 3) z = compose((10, -10 * y, 0, 1), z) yield y def calc_ndigits(n=DEFAULT_DIGITS): return list(islice(gen_pi_digits(), n)) test =""" import itertools DEFAULT_DIGITS = 2000 icount = itertools.count islice = itertools.islice def gen_x(): return map(lambda k: (k, 4 * k + 2, 0, 2 * k + 1), icount(1)) def compose(a, b): aq, ar, as_, at = a bq, br, bs, bt = b return (aq * bq, aq * br + ar * bt, as_ * bq + at * bs, as_ * br + at * bt) def extract(z, j): q, r, s, t = z return (q * j + r) // (s * j + t) def gen_pi_digits(): z = (1, 0, 0, 1) x = gen_x() while 1: y = extract(z, 3) while y != extract(z, 4): z = compose(z, next(x)) y = extract(z, 3) z = compose((10, -10 * y, 0, 1), z) yield y def calc_ndigits(n=DEFAULT_DIGITS): return list(islice(gen_pi_digits(), n)) calc_ndigits() """ def bench_threading(n): # Code to launch specific model threads = [] for _ in range(n): t = Thread(target=calc_ndigits) t.start() threads.append(t) for thread in threads: thread.join() def bench_subinterpreters(n, site=True): # Code to launch specific model def _spawn_sub(): sid = interpreters.create() interpreters.run_string(sid, test) interpreters.destroy(sid) threads = [] for _ in range(n): t = Thread(target=_spawn_sub) t.start() threads.append(t) for thread in threads: thread.join() def bench_multiprocessing(n): # Code to launch specific model processes = [] for _ in range(n): t = Process(target=calc_ndigits) t.start() processes.append(t) for process in processes: process.join() if __name__ == "__main__": runner = pyperf.Runner() runner.metadata['description'] = "Benchmark execution models" n = 10 runner.bench_func('threading', bench_threading, n) runner.bench_func('subinterpreters', bench_subinterpreters, n) runner.bench_func('multiprocessing', bench_multiprocessing, n) ```

Results when running a CPython without --disable-gil:

output_gil1_compiled

Running the benchmark with PYTHON_GIL=0:

output_gil0

Running the benchmark with a no-gil build, but PYTHON_GIL=1 uses all 4 CPU cores and gives the fastest result (faster than PYTHON_GIL=0) which is wrong--

output_gil1

tonybaloney commented 5 months ago

From https://github.com/python/cpython/issues/118874

tonybaloney commented 5 months ago

@corona10 I saw some mentions from you in the backlog on this topic

corona10 commented 5 months ago

Okay this should be fixed, since I am busy before PyCon US, if you want to submit the patch before, you can send it :) Or I will take a look at it during the PyCon US.