python / cpython

The Python programming language
https://www.python.org
Other
62.43k stars 29.97k forks source link

Enhanced cPython profiler with high-resolution timer #46534

Closed 5996bd1d-e940-4043-9218-45ba363184f6 closed 5 years ago

5996bd1d-e940-4043-9218-45ba363184f6 commented 16 years ago
BPO 2281
Nosy @gpshead, @abalkin, @pitrou, @kristjanvalur, @vstinner, @avassalotti, @methane
Files
  • hires_lsprof.tgz
  • hires_lsprof2.tgz
  • _lsprof2.6.diff: diff vs _lsprof.c rev 59564 for 2.6a1
  • _lsprof3.0.diff: diff vs _lsprof.c rev 59564 for 3.0a3
  • hires_lsprof3.tgz: use hires timer also on MacOS X, Solaris, HP-UX
  • hires_lsprof4.tgz: use hires timer on Linux as well
  • hpTime.c: standalone test program
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['extension-modules', 'performance'] title = 'Enhanced cPython profiler with high-resolution timer' updated_at = user = 'https://bugs.python.org/MrJean1' ``` bugs.python.org fields: ```python activity = actor = 'methane' assignee = 'none' closed = True closed_date = closer = 'methane' components = ['Extension Modules'] creation = creator = 'MrJean1' dependencies = [] files = ['9665', '9668', '9812', '9813', '14000', '14011', '14012'] hgrepos = [] issue_num = 2281 keywords = ['patch'] message_count = 23.0 messages = ['63486', '63487', '63511', '64314', '64332', '83551', '88004', '88006', '88012', '88018', '88047', '88048', '88622', '95728', '105966', '105971', '105972', '105977', '113842', '170942', '170944', '170975', '339976'] nosy_count = 9.0 nosy_names = ['gregory.p.smith', 'belopolsky', 'pitrou', 'kristjan.jonsson', 'vstinner', 'alexandre.vassalotti', 'MrJean1', 'nirai', 'methane'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'performance' url = 'https://bugs.python.org/issue2281' versions = ['Python 3.2'] ```

    5996bd1d-e940-4043-9218-45ba363184f6 commented 16 years ago

    Attached is a modified version of the cPython profiler file Modules/_lsprof.c using a high-resolution timer where available.

    The enhancement has been tested on 32- and 64-bit Linux (x86 and x86_64) and on 32-bit MacOS X Tiger (Intel) and Panther (PPC).

    No other platforms have been tested but as before the profiler will fallback to using gettimeofday() on non-Windows version, except the 64- bit PPC build will issue a compile-time warning.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 16 years ago

    This enhancement applies to Python 2.5.2 only.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 16 years ago

    Attached are 2 Modules/_lsprof.c files, one for Python 2.5.2 and 2.6a1 and the other for Python 3.0a3. Discard the previously posted one.

    Both contain the same enhancements to use the high-resolution timer where available.

    These versions catch wrap around of the timer and clock and adjust accordingly. In the hpTimerUnit function only and not for profile times.

    Lastly, instead of malloc and free functions PyObject_MALLOC and PyObject_FREE are called making profiler memory usage the same as for other objects created.

    avassalotti commented 16 years ago

    I don't think this should be added to 2.5. Only bugfixes are admissible to the backporting process (see PEP-6).

    Finally, could you post the diff of your changes as described at http://www.python.org/dev/patches/. Thanks!

    5996bd1d-e940-4043-9218-45ba363184f6 commented 16 years ago

    Here are 2 forward diff files against _lspprof.c rev 59564. One _lsprof2.6.diff for Python 2.6a1 and _lsprof3.0.diff for 3.0.

    pitrou commented 15 years ago

    Interesting patch, but there are lots of spurious #if's and #ifdef's which could be simplified. Also, some changes look slightly unrelated (e.g. the switch from malloc/free to PyObject_Malloc/PyObject_Free).

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    Attached is another version of the Modules/_lsprof.c source file.

    This _lsprof3.c uses the hires time base on Mac OS X, Solaris and HP-UX (plus MS Windows like before) and does not require any asm code.

    Also, there are no changes other than the very top of the file. The forward diff with rev 66700 from the trunk is included.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    Attached is yet another version of the Modules/_lsprof.c source, called _lsprof4.c.

    This one adds support for the hires timer on 32- and 64-bit Intel and PowerPC processors iff compiled with GNU C.

    With _lsprof4.c, the high precision profile is available on Linux, Mac OS X, Solaris, HP-UX and MS Windows.

    Included is the forward diff with _lsprof.c rev 66700 from the trunk.
    The changes are only at the very top and in the initialization function at the bottom.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    There were two problems with _lsprof4.c. Attached is the corrected file and the forward diff with _lsprof.c rev 6670.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    Here a small, standalone test program of the hires timer code from _lsprof4.c.

    The results from 10 different builds are included in the comment.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    Both the hires_lsprof4.tgz and hpTime.c files have been updated with a change in the PowerPC asm code.

    The previous implementation used a loop: label and that may cause duplicate symbols when compiled at optimization levels of -O3 and above with older GNU C compilers.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    Another test build and result has been added to the hpTime.c comment.

    5996bd1d-e940-4043-9218-45ba363184f6 commented 15 years ago

    Another thought on the hires timer to make the hires time and hires time units available as 2 other functions in the time module.

    For example, function time.ticks() returns the hires time stamp as an int. Function time.ticks2secs(t) converts a given number of ticks to seconds.

    To avoid duplicating the hires time code in both the time and profile modules, it would be necessary to move the hpTimer and hpTimerUnit functions to some place inside the Python core accessible for the time and profile modules. Perhaps to a new file, say Python/gethptime.c.

    That new file can handle other platform-specific idiosyncrasies with respect to hires time. In particular, it could implement a different (and better) way to determine the resolution of a hires tick, e.g. on Linux and BSD Unix.

    pitrou commented 14 years ago

    One issue with using timestamp counters is that their resolution varies with CPU speed, which is not necessarily constant during a whole run (because of power management). On the other hand I'm not sure it's really a problem.

    pitrou commented 14 years ago

    Nir, would you be interested in looking at this?

    pitrou commented 14 years ago

    Regarding the calibrating loop when using ASM timestamp counters, there's one problem if the thread gets preempted in the middle of the loop: the calibration will return incorrect results. Perhaps a way of avoiding this would be to call nanosleep(1 ns) just before the loop, which should relinquish the current timeslice.

    Another issue can appear when threads migrate from one CPU core to another, according to Wikipedia: “AMD processors up to the K8 core always incremented the time-stamp counter every clock cycle.[5] Thus, power management features were able to change the number of increments per second, and the values could get out of sync between different cores or processors in the same system.”

    Perhaps using clock_gettime(CLOCK_REALTIME or CLOCK_MONOTONIC) is better than using raw ASM timestamp counters, when available.

    pitrou commented 14 years ago

    Sorry, the calibration loop actually looks correct in case of preemption, since it simply divides the TSC delta by the elapsed physical time (and both continue increasing monotonously when another thread runs). It is still vulnerable to the thread migration problem, though.

    7b124e7b-b61d-49f6-9762-fd36b627e237 commented 14 years ago

    Nir, would you be interested in looking at this?

    yes, I'll take a look, but will take me a few days.

    abalkin commented 14 years ago

    Can someone post a diff against current py3k? I would like to take a look, but the files attached to this issue seem to be more than a year old.

    vstinner commented 12 years ago

    Python 3.3 has a new time.perf_counter() function. It may be possible to expose this function in C and then use it in _lsprof. See the PEP for the list of (OS/hardware) clocks used by this function:

    http://www.python.org/dev/peps/pep-0418/#time-perf-counter http://www.python.org/dev/peps/pep-0418/#time-monotonic

    vstinner commented 12 years ago

    Reading directly the CPU timestamp counter is not a good idea.

    One issue with using timestamp counters is that their resolution varies with CPU speed, which is not necessarily constant during a whole run (because of power management). On the other hand I'm not sure it's really a problem.

    There is another major issue: you have to synchronize the counter of all CPU cores. Windows XP didn't synchronize the counters, and it was a major issue on QueryPerformanceCounter. http://support.microsoft.com/?id=896256

    It is really better to rely on the OS rather than querying directly the hardward. The OS (well, most OS except Windows) handles all these minor "nits" and provide the best possible resolution.

    --

    Python 3.3 has a new time.perf_counter() function. It may be possible to expose this function in C ...

    Oh, I now remember why I didn't do this before: when clock_gettime() function is used in perf_counter(), the program or library must be linked to librt on most Linux distro. It was easier to only handle this correctly for the time module. We can do something similar on the _lsprof module.

    84401114-8e59-4056-83cb-632106c0b648 commented 12 years ago

    There is another major issue: you have to synchronize the counter of all >CPU cores. Windows XP didn't synchronize the counters, and it was a major >issue on QueryPerformanceCounter. http://support.microsoft.com/?id=896256

    I don't understand why you keep bashing Windows for this. It was actually a problem with CPU drivers. We encountered this problem for a number of our players and the fix was to get the latest intel/amd drivers. The issue you quote is a separate one. This appears to be the correct one: http://support.microsoft.com/kb/895980

    methane commented 5 years ago

    I think https://bugs.python.org/issue36575 fixed this.