
OpenHPC Component Submissions Project

LIKWID #22

Status: closed (TomTheBear closed this issue 6 years ago)

TomTheBear commented 7 years ago

Software Name

LIKWID - Like I knew what I am doing


Public URL

https://github.com/RRZE-HPC/likwid


Technical Overview

LIKWID is a set of tools for performance engineering and benchmarking.

Some of LIKWID's functionality is already available in OpenHPC through other components: querying and printing the system topology (hwloc), controlling thread affinity (hwloc), hardware performance monitoring (PAPI), and energy monitoring (PAPI with RAPL). Although PAPI already provides hardware performance monitoring, LIKWID offers a different feature set.

Features not yet covered by OpenHPC include assembly micro-benchmarking, manipulating CPU features such as the hardware prefetchers, and controlling the frequencies of the CPU cores and the Uncore.
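To illustrate what manipulating prefetchers means at the register level, the sketch below decodes and updates the prefetcher-disable bits of Intel's MSR 0x1A4 (MISC_FEATURE_CONTROL). The register address and bit layout are an assumption taken from Intel's public disclosure for several Core-family microarchitectures; they are not taken from LIKWID, do not apply to every CPU, and the helper names are hypothetical.

```python
# Hypothetical helpers for illustration only. Register address and bit layout
# are assumed from Intel's MISC_FEATURE_CONTROL disclosure (MSR 0x1A4) and do
# not hold on every microarchitecture (e.g. not on AMD CPUs).
MSR_MISC_FEATURE_CONTROL = 0x1A4

# A set bit DISABLES the corresponding prefetcher.
PREFETCHER_BITS = {
    "L2 hardware prefetcher": 0,
    "L2 adjacent cache line prefetcher": 1,
    "DCU prefetcher": 2,
    "DCU IP prefetcher": 3,
}


def prefetcher_state(msr_value: int) -> dict[str, bool]:
    """Map each prefetcher name to True if the register value leaves it enabled."""
    return {name: ((msr_value >> bit) & 1) == 0 for name, bit in PREFETCHER_BITS.items()}


def with_prefetcher(msr_value: int, name: str, enabled: bool) -> int:
    """Return a new register value with one prefetcher enabled or disabled."""
    mask = 1 << PREFETCHER_BITS[name]
    # Enabling clears the disable bit; disabling sets it.
    return msr_value & ~mask if enabled else msr_value | mask
```

For example, prefetcher_state(0x5) reports the L2 hardware prefetcher and the DCU prefetcher as disabled, because bits 0 and 2 are set. Actually writing the new value back requires one of the MSR access methods discussed later in this thread.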


Latest stable version number

4.2.0


Open-source license type

GPLv3


Relationship to component?

If other, please describe: make

Does the current build system support staged path installations? For example: make install DESTDIR=/tmp/foo (or equivalent)


Does component run in user space or are administrative credentials required?


Does component require post-installation configuration?


If component is selected, are you willing and able to collaborate with OpenHPC maintainers during the integration process?


Does the component include test collateral (e.g. regression/verification tests) in the publicly shipped source?

The source package contains a folder named "test" with test applications for the different measurement modes, tests for the internal functions, command-line argument checks, and verification tests for hardware performance monitoring.


Does the component have additional software dependencies (beyond compilers/MPI) that are not part of standard Linux distributions?

If Lua is installed on the system, LIKWID can be built against it instead of the bundled version. Currently it is not possible to use a system hwloc, because the bundled version includes features that are not available in the standard hwloc release. This should change this summer, when the next major hwloc release comes out.


Does the component include online or installable documentation?

If available online, please provide URL:
- Wiki: https://github.com/RRZE-HPC/likwid/wiki
- Doxygen documentation: http://rrze-hpc.github.io/likwid/Doxygen/index.html
- Quick reference (PDF): https://raw.githubusercontent.com/wiki/RRZE-HPC/likwid/quick-reference/likwidquickreference.pdf


[Optional]: Would you like to receive additional review feedback by email?

- [x] yes
- [ ] no

koomie commented 6 years ago

Question for @TomTheBear, is there a plan to add ARM architecture support in the future?

TomTheBear commented 6 years ago

An ARMv8 branch with an alpha version already exists. Since the last changes, I haven't had time to get it into a state that would allow a merge. For POWER8, there is a pull request in a similar state.

koomie commented 6 years ago

Great, thx for the pointer. The Technical Steering Committee is reviewing this request and I should hopefully be able to follow up in a few weeks.

dmjacobsen commented 6 years ago

Hello,

The application indicates that LIKWID does not require administrative privileges, but my understanding is that LIKWID requires read and write access to privileged MSRs. Could you please expand on the different methods you suggest for making these MSRs accessible?

Thanks, Doug

TomTheBear commented 6 years ago

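Hi,

There are three methods to access the MSR/PCI counters:

- direct access: LIKWID reads and writes the MSR and PCI device files itself, which requires elevated privileges
- access daemon: a setuid-root helper performs the register accesses on LIKWID's behalf
- perf_event: the Linux kernel's performance monitoring interface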

Both direct access and the access daemon restrict access to the architecture's performance monitoring registers, and within that set both provide access to all units and all registers. With perf_event, some units or events might not work because the kernel has no support for them. The main disadvantage of perf_event is that the kernel must support the architecture's performance monitoring units, and HPC systems often combine recent architectures with an older, well-tested kernel that lacks support for them.
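To make the direct access method concrete: on Linux, the msr kernel module exposes one device file per CPU, /dev/cpu/N/msr, and an 8-byte read at a file offset equal to the register address returns that register's value. Opening the file requires root, and the kernel additionally checks for the CAP_SYS_RAWIO capability, which is why changing file permissions alone is not enough. The sketch below is a minimal illustration of that kernel interface, not LIKWID's code; the helper names are hypothetical, and MSR 0x10 (the time stamp counter) is used only as an example register.

```python
import os
import struct

IA32_TIME_STAMP_COUNTER = 0x10  # example register present on x86 CPUs


def msr_device(cpu: int) -> str:
    """Device file that the Linux msr driver creates for one CPU."""
    return f"/dev/cpu/{cpu}/msr"


def decode_msr(raw: bytes) -> int:
    """MSRs are 64 bits wide; on x86 the driver returns them little-endian."""
    if len(raw) != 8:
        raise ValueError("an MSR read must return exactly 8 bytes")
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu: int, register: int) -> int:
    """Read one MSR directly (needs the msr module, root and CAP_SYS_RAWIO)."""
    fd = os.open(msr_device(cpu), os.O_RDONLY)
    try:
        # The file offset selects the register address.
        return decode_msr(os.pread(fd, 8, register))
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        print(hex(read_msr(0, IA32_TIME_STAMP_COUNTER)))
    except OSError as err:
        # Typical for unprivileged users or when the msr module is not loaded.
        print(f"direct access unavailable: {err}")
```

Run as an ordinary user, this fails at os.open, which is the situation the access daemon is meant to solve.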

craiggardner commented 6 years ago

I have additional questions about the three access methods you described. Which of the three is the default? To enable the higher privileges needed by the "direct access" method, does the admin need to perform manual steps? If so, is that error-prone? Likewise, what needs to happen to set the suid-root bit for the "access daemon" approach? It seems that a mistake there could be a serious security problem.

TomTheBear commented 6 years ago

Currently the default is the access daemon, because in most cases it provides access to all performance monitoring units without any additional effort. The suid-root bit is set during sudo make install, so no additional work by the admins is required. There are very few pitfalls (for example, NFSv4 does not support suid-root applications), but in most cases it works out of the box. This is also the access method used in the Debian packages of LIKWID.
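The setuid pitfalls mentioned above can be checked on a node before users run into them. The following sketch is not part of LIKWID: it verifies that a given daemon binary is owned by root, has the setuid bit set, and lives on a filesystem that is not mounted nosuid (a common reason why setuid has no effect on network filesystems). The daemon path is passed in because it depends on the installation prefix; the location used in the example is an assumption.

```python
import os
import stat


def setuid_problems(path: str) -> list[str]:
    """Return reasons why `path` would not run with root privileges (empty = OK).

    The daemon's location is installation-specific, so callers pass it in,
    e.g. an assumed /usr/local/sbin/likwid-accessD for a /usr/local prefix.
    """
    problems = []
    st = os.stat(path)
    if st.st_uid != 0:
        problems.append("not owned by root")
    if not st.st_mode & stat.S_ISUID:
        problems.append("setuid bit not set")
    # Mount-level nosuid silently disables the setuid bit.
    if os.statvfs(path).f_flag & os.ST_NOSUID:
        problems.append("filesystem is mounted nosuid")
    return problems
```

For a packaged build in which files are staged as an unprivileged user, the ownership and setuid bit cannot be set by make install; in an RPM they are usually declared in the spec file instead, for example with %attr(4755, root, root) on the daemon's %files entry.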

The direct access mode is difficult to make available to users and requires manual steps. It is usually not enough to change the owner and permissions of the msr and pci device files (with udev rules), because the kernel performs additional checks (a capabilities check). In general, giving users direct access to the msr device files is not recommended for security reasons. It is possible to use sudo for likwid-perfctr so that users or groups can access the device files only through LIKWID. One way to make the direct access method safe for users is the msr-safe and csr-safe kernel modules developed at LLNL (GitHub project). They do not perform the capabilities check and provide a whitelist specifying which registers may be accessed, which restricts access to security-sensitive registers.
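The whitelist idea can be sketched as follows: each entry names a register address and a write mask, reads are allowed only for listed registers, and a write may change only the bits covered by the mask. This is a simplified model, not msr-safe's implementation; the two-column hex text format parsed here is only assumed to resemble msr-safe's whitelist files, and all names are hypothetical.

```python
def parse_allowlist(text: str) -> dict[int, int]:
    """Parse 'ADDRESS WRITEMASK' lines in hex; '#' starts a comment (assumed format)."""
    allowed = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, write_mask = line.split()[:2]
        allowed[int(address, 16)] = int(write_mask, 16)
    return allowed


def can_read(allowlist: dict[int, int], address: int) -> bool:
    """Reads are allowed for every listed register."""
    return address in allowlist


def apply_write(allowlist: dict[int, int], address: int, old: int, new: int) -> int:
    """Return the value that would be written; bits outside the mask keep `old`."""
    mask = allowlist.get(address, 0)
    if mask == 0:
        # Unlisted or read-only registers cannot be written in this model.
        raise PermissionError(f"MSR {address:#x} is not writable under this allowlist")
    return (old & ~mask) | (new & mask)


# Illustrative entries only: a read-only TSC and a partially writable register.
EXAMPLE = """
0x00000010 0x0000000000000000  # IA32_TIME_STAMP_COUNTER, read-only
0x00000186 0x00000000FFFFFFFF  # IA32_PERFEVTSEL0, low 32 bits writable (example)
"""
```

Masking writes instead of rejecting them outright lets users program counter-control bits while the remaining bits of the same register stay untouched.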

koomie commented 6 years ago

Thank you for the submission and extra details on the follow up questions. The TSC has recommended acceptance of LIKWID via usage of the access daemon (with recipe info that it requires setuid).