networkx / nx-parallel

Inconsistent heatmaps and the current timing script #51

Open Schefflera-Arboricola opened 5 months ago

Schefflera-Arboricola commented 5 months ago

Current heatmap in the repo (it says it was run on 2 cores):

[heatmap image: current timings in the repo (2 cores)]

I tried re-running these and got the following results (using 8 cores):

[heatmap image: re-run timings (8 cores)]

Potential cause: other background processes running, or the machine going into 'sleep mode' in between; this can increase or decrease the time a function appears to take.

Potential solution: running the heatmaps on a VM and using a timing function that reports the time the CPU core(s) actually spent running the process, instead of just time.time().
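
For illustration, a minimal sketch (not the actual heatmap script; the workload function is just a stand-in) of the difference between `time.time()` and a CPU-time clock like `time.process_time()`:

```python
import time

def stand_in_workload():
    # placeholder for whatever graph function is being timed
    sum(i * i for i in range(10**6))

wall_start = time.time()
cpu_start = time.process_time()
stand_in_workload()
wall_elapsed = time.time() - wall_start        # includes sleep and other processes
cpu_elapsed = time.process_time() - cpu_start  # only this process's CPU time

print(f"wall clock: {wall_elapsed:.4f}s, CPU: {cpu_elapsed:.4f}s")
```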

dschult commented 5 months ago

Can you look into Python's std library timeit.Timer() tool? I believe that is what IPython uses for its %timeit and %%timeit functions (which are really nice because they time the code multiple times, report the stdev of the results, give warnings if one run is way off from the others, etc.). Also, timeit.timeit can work well for one-liners, but timeit.Timer is the more complete interface.

https://docs.python.org/3/library/timeit.html
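
For example, a rough sketch of how timeit.Timer could be used here (the graph, algorithm, and backend name are illustrative assumptions, not the existing benchmark code):

```python
import timeit
import networkx as nx

G = nx.fast_gnp_random_graph(500, 0.1, seed=42)

timer = timeit.Timer(
    stmt="nx.betweenness_centrality(G, backend='parallel')",
    globals={"nx": nx, "G": G},
)
number, _ = timer.autorange()                  # pick a loop count that runs long enough
times = timer.repeat(repeat=5, number=number)
best = min(times) / number                     # best-of-5, per call
print(f"{best:.4f}s per call")
```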

Schefflera-Arboricola commented 3 months ago

I was looking into how to time processes that create other parallel processes.

time.process_time() only times the parent process and ignores the child processes' times. Also, it wasn't very clear to me what exactly it means to time processes that run in parallel. Do we take the process that took the maximum time, or take an average? And we don't just have to time the parallel part, but also the non-parallel part of an algorithm.
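
A small sketch (not nx-parallel code) of the behaviour described above, where the CPU work happens in child processes and time.process_time() in the parent misses it:

```python
import time
from multiprocessing import Pool

def busy(_):
    return sum(i * i for i in range(10**6))

if __name__ == "__main__":
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    with Pool(4) as pool:
        pool.map(busy, range(8))  # the actual work runs in the worker processes
    print("parent CPU time:", time.process_time() - cpu_start)   # close to zero
    print("wall clock time:", time.perf_counter() - wall_start)  # the real duration
```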

So, I started looking into how ASV times its benchmarks: it uses timeit.default_timer(), which doesn't remove the inconsistencies caused by the laptop going into sleep mode, heavier processes running in the background, etc. But ASV runs each benchmark multiple times to minimize the effect of such inconsistencies, and I think we can do the same for the heatmaps. (ref. asv timing docs)

The timeit docs also suggest running the program multiple times: "default_timer() measurements can be affected by other programs running on the same machine, so the best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time. The -r option is good for this; the default of 5 repetitions is probably enough in most cases. You can use time.process_time() to measure CPU time."
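
A sketch of that "repeat and take the best" approach with timeit.repeat (the workload is again just a stand-in for the functions timed in the heatmaps):

```python
import timeit

def stand_in_workload():
    sum(i * i for i in range(10**6))

# timeit uses default_timer() internally; repeating and taking the minimum
# damps out noise from background processes.
times = timeit.repeat(stand_in_workload, repeat=5, number=3)
best_per_call = min(times) / 3
print(f"best of 5 repeats: {best_per_call:.4f}s per call")
```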

PS: Before implementing all this and re-generating all the heatmaps, I want to dig a bit deeper and figure out how exactly timeit.default_timer() works internally, whether that aligns with what we are trying to do here, and whether there are better alternatives. I also want to look into how running and timing parallel processes on my local machine differs from running and timing them on a VM over ssh.

dschult commented 3 months ago

It seems like you want some approximation of "wall clock time" -- that is, how long between starting and finishing the call. I don't think we want to somehow compute the total cpu-time of the call. So timeit.default_timer seems good. It looks like it calls time.perf_counter() which relies on the OS clock values.
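
A quick way to confirm this (on CPython 3.3+ timeit.default_timer is literally time.perf_counter):

```python
import time
import timeit

print(timeit.default_timer is time.perf_counter)  # True on current CPython
```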

Running it multiple times and taking the best time also seems reasonable.

Using your own machine and comparing to a VM is expected to give different values, but hopefully the factor of speedup is similar. Timing is difficult to get exactly right. But a consistent set of hardware without distractions from other processes will be a good start.

eriknw commented 2 months ago

fyi, graphblas-algorithms uses timeit.Timer. It runs the specified number of times, or tries to run for the target number of seconds. If running for a target number of seconds, it estimates the number of times to run based on the first run. See: https://github.com/python-graphblas/graphblas-algorithms/blob/35dbc90e808c6bf51b63d51d8a63f59238c02975/scripts/bench.py#L172 and https://github.com/python-graphblas/graphblas-algorithms/blob/35dbc90e808c6bf51b63d51d8a63f59238c02975/scripts/bench.py#L202
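
For reference, a rough sketch of that strategy (use the first call to estimate how many runs fit in a target duration); this is an illustration, not the actual bench.py code:

```python
from timeit import default_timer

def time_with_target(func, target_seconds=3.0, max_runs=100):
    """Time `func` repeatedly, aiming to spend about `target_seconds` in total."""
    start = default_timer()
    func()
    first = default_timer() - start
    times = [first]
    # Use the first run to estimate how many more runs fit in the budget.
    n_more = min(max_runs - 1, max(0, int((target_seconds - first) / max(first, 1e-9))))
    for _ in range(n_more):
        start = default_timer()
        func()
        times.append(default_timer() - start)
    return times
```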

Would there be any interest in making common utilities that make it easy to benchmark different backends? Maybe we could use what we have for graphblas-algorithms as a starting point. I'd be happy to talk about what it does and to discuss what y'all want to do for nx-parallel.

nx-cugraph uses pytest-benchmark.