Closed: sweeneyde closed this issue 3 years ago
For what it's worth, I wound up using two different programs, one to generate the benchmarks, and one to run them:
...
def generate_benchmarks():
    output = []
    for m in needle_lengths:
        for n in haystack_lengths:
            if n < m:
                continue
            for s in (1, 2, 3):
                seed = (s*n + m) % 1_000_003
                needle = zipf_string(m, seed)
                haystack = zipf_string(n, seed ** 2)
                name = f"needle={m}, haystack={n}, seed={s}"
                output.append((name, needle, haystack))
    with open("_generated.py", 'w') as f:
        print("benches = [", file=f)
        for name, needle, haystack in output:
            print(f" {(name, needle, haystack)!r},", file=f)
        print("]", file=f)
...
def do_timings():
    import pyperf
    runner = pyperf.Runner()
    from _generated import benches
    for name, needle, haystack in benches:
        runner.bench_time_func(
            name,
            bench, needle, haystack,
            inner_loops=10,
        )

if __name__ == "__main__":
    # generate_benchmarks()
    do_timings()
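The bench function itself is elided above. For reference, a time function passed to bench_time_func() receives the number of loops as its first argument and must return the total elapsed time in seconds; a hypothetical stand-in (the str.find() workload here is only an assumption, not the actual benchmark) might look like:

import time

# Hypothetical time function for bench_time_func(): pyperf passes the
# loop count as the first argument, and the function must return the
# total elapsed time in seconds.
def bench(loops, needle, haystack):
    t0 = time.perf_counter()
    for _ in range(loops):
        haystack.find(needle)
    return time.perf_counter() - t0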
Closing as a duplicate of https://github.com/psf/pyperf/issues/38
When I run the benchmark below, I notice that most of the time is spent in the zipf_string() function, which I believe is re-creating the relevant strings in each and every spawned process. Putting a print() inside zipf_string() prints continuously. This is significant, especially when running 20*16*5 == 1600 benchmarks. Is there a recommended way to cache expensively generated parameters so that each process does not have to create them from scratch?

I am also generally confused about the multiprocessing model of pyperf: I would expect zipf_string() to be called exactly twice (not hundreds of times) per benchmark. I couldn't find anything in the documentation about this. Am I missing something?

I am running Windows 10.