python / pyperformance

Python Performance Benchmark Suite
http://pyperformance.readthedocs.io/
MIT License

Use more than one core #267

Open Eclips4 opened 1 year ago

Eclips4 commented 1 year ago

Currently, pyperformance uses only a single core. Is there any reason for this?

rockdrilla commented 1 year ago

I'd suggest implementing a -j N option (a fixed number of parallel jobs) rather than running on all available cores. I'm using pyperformance to gather profiling data for my own (somewhat unusual) Python PGO+LTO build.
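As far as this thread shows, pyperformance has no such option. The sketch below is a hypothetical illustration of what a "-j N" run could look like: `run_parallel` and `jobs` are invented names, not pyperformance API. It uses `concurrent.futures` to cap the number of benchmark subprocesses running at once. Running benchmarks concurrently will likely skew timings, so this only makes sense for workloads such as PGO profile collection, where the timings themselves are not the point.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, jobs):
    """Run each command (an argv list) as a subprocess, with at most
    `jobs` of them running at the same time (hypothetical helper).

    Returns the exit codes in the same order as `commands`.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")

    def run_one(argv):
        # Each worker thread only waits on its child process, so threads
        # are cheap here; the actual work happens in the subprocesses.
        return subprocess.run(argv, check=False).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map() yields results in input order.
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in "benchmarks": short Python snippets, not real pyperformance runs.
    workload = [[sys.executable, "-c", f"sum(range({n} * 10**5))"] for n in range(1, 9)]
    print(run_parallel(workload, jobs=4))
```

Choosing an explicit `jobs` value rather than defaulting to every core keeps the run predictable on shared machines and in containers.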

corona10 commented 1 year ago

@rockdrilla

I'm using pyperformance to gather profiling data for my own (somewhat unusual) Python PGO+LTO build.

If you don't mind, could you describe your use case? I'm currently interested in increasing profiling coverage for PGO.

rockdrilla commented 1 year ago

@corona10 it's a pretty weird solution. 😄

In short: build Python with a shared library, install pyperformance "somewhere" using that shared Python, then reconfigure Python for a static binary and build it; the PGO training step of that build runs pyperformance while gathering profile data.
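The two-stage workflow can be sketched as a dry-run command plan. This is a hypothetical reconstruction, not the actual debian/rules script mentioned below. The configure options (--enable-shared, --enable-optimizations, --with-lto) and the Makefile's PROFILE_TASK variable exist in CPython. The rpath linker flag, the function name `pgo_build_plan`, and the decision to leave the training command as a parameter are assumptions. The thread does not say how the static build locates the pyperformance installed in stage 1.

```python
import shlex


def pgo_build_plan(shared_prefix, profile_task):
    """Hypothetical two-stage plan, run from a CPython source checkout:
    1. build and install a shared-library Python under `shared_prefix`
       and install pyperformance into it;
    2. reconfigure the same tree for a static PGO+LTO build whose
       PGO training step runs `profile_task`.
    Returns a list of argv lists; nothing is executed here.
    """
    shared_python = f"{shared_prefix}/bin/python3"
    return [
        # Stage 1: shared build; the rpath lets the installed binary find libpython.
        ["./configure", "--enable-shared", f"--prefix={shared_prefix}",
         f"LDFLAGS=-Wl,-rpath,{shared_prefix}/lib"],
        ["make", "install"],
        [shared_python, "-m", "pip", "install", "pyperformance"],
        # Stage 2: wipe the previous configuration, then build statically with PGO+LTO.
        ["make", "distclean"],
        ["./configure", "--disable-shared", "--enable-optimizations", "--with-lto"],
        # PROFILE_TASK holds the arguments the Makefile passes to the
        # instrumented interpreter during the training run.
        ["make", f"PROFILE_TASK={profile_task}"],
    ]


if __name__ == "__main__":
    # "-m test --pgo" is only a placeholder, similar to CPython's default training task.
    for argv in pgo_build_plan("/opt/python-shared", "-m test --pgo"):
        print(shlex.join(argv))
```

Returning argv lists instead of running them keeps the sketch inspectable; feeding each list to `subprocess.run(..., check=True)` would turn it into an actual build driver.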

Applied patches (relevant to this case):

Build script: debian/rules from the package template.

upd: benchmark results.

corona10 commented 1 year ago

upd: benchmark results.

Wow, super cool!

I'm still hesitant to directly support a profiling workload based on the pyperformance suite, but I'm open to improving the current PGO and LTO setup for better performance. For example, ThinLTO is fast, but GCC's full LTO is slow unless you pass a core count or the auto flag (-flto=N or -flto=auto). Alternatively, we could add a new configure option that designates a pre-generated profile directory, which could then be used for externally collected profile data.

rockdrilla commented 1 year ago

I'd suggest using a fixed core count rather than "auto" in -flto=X, because I've often seen (GNU) make launch up to N gcc jobs (with make -j N) and each (!) gcc then spawn up to N processes of its own. That confuses me a lot, but a container runtime confuses the whole build process even more when running in a container with a limited core count.
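GCC's documentation says -flto=auto uses GNU make's jobserver when it is available and otherwise falls back to the number of CPU threads. That fallback would explain the N×N behaviour described above whenever gcc cannot see the jobserver. The sketch below shows one possible policy for picking fixed numbers instead. Its helper names and the "M × N ≤ CPUs" budget rule are assumptions, not anything CPython does today. It also counts CPUs via `os.sched_getaffinity`, which respects container CPU pinning where `os.cpu_count()` does not.

```python
import os


def available_cpus():
    """Number of CPUs this process may run on.

    os.sched_getaffinity() honours CPU pinning such as Docker's
    --cpuset-cpus, whereas os.cpu_count() reports the host's CPUs.
    Neither one reflects a CFS quota such as Docker's --cpus.
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is unavailable on macOS/Windows
        return os.cpu_count() or 1


def split_jobs(cpus, make_jobs):
    """Choose fixed values for `make -jM` and `-flto=N` so that, in the
    worst case of M concurrent LTO links each running N LTRANS jobs,
    M * N stays within `cpus` (hypothetical policy)."""
    make_jobs = max(1, min(make_jobs, cpus))
    lto_jobs = max(1, cpus // make_jobs)
    return make_jobs, lto_jobs


if __name__ == "__main__":
    m, n = split_jobs(available_cpus(), make_jobs=4)
    print(f"make -j{m}  (compile and link with -flto={n})")
```

In practice the final link of the interpreter usually runs alone, so a more generous -flto value may be fine. The fixed split simply guarantees the worst case cannot oversubscribe the machine.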

upd: you can use GCC's -fprofile-dir=path flag to keep PGO data separate from the build directory.
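A small sketch of the flags involved. -fprofile-generate, -fprofile-use, -fprofile-correction and -fprofile-dir are documented GCC options. The helper name `profile_dir_flags` is hypothetical, and so is the idea of wiring these lists in by overriding CPython's PGO_PROF_GEN_FLAG and PGO_PROF_USE_FLAG make variables. Check that against the Makefile of your CPython version. Both stages must point at the same directory, or the profile-use build will not find the data.

```python
import os


def profile_dir_flags(profile_dir):
    """Return (generate, use) GCC flag lists that keep PGO profile data
    in `profile_dir` instead of next to the object files."""
    profile_dir = os.path.abspath(profile_dir)  # same absolute path for both stages
    generate = ["-fprofile-generate", f"-fprofile-dir={profile_dir}"]
    use = [
        "-fprofile-use",
        # Smooth out counter inconsistencies from multithreaded training runs.
        "-fprofile-correction",
        f"-fprofile-dir={profile_dir}",
    ]
    return generate, use


if __name__ == "__main__":
    gen, use = profile_dir_flags("/var/tmp/python-pgo")
    print("generate:", " ".join(gen))
    print("use:     ", " ".join(use))
```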

corona10 commented 1 year ago

I'd suggest using a fixed core count rather than "auto" in -flto=X, because I've often seen (GNU) make launch up to N gcc jobs (with make -j N) and each (!) gcc then spawn up to N processes of its own.

Okay, I agree with you. Let's file an issue on CPython and discuss the best way to solve it. I'd prefer a seamless way to support it.