hpsfoundation / tac

High Performance Software Foundation TAC
Apache License 2.0

[Project Proposal] HPCToolkit #18

Open blue42u opened 1 week ago

blue42u commented 1 week ago

1. Name of Project

HPCToolkit

2. Project Description

HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to GPU-accelerated supercomputers. By using statistical sampling of timers and hardware performance counters on CPUs, HPCToolkit collects accurate measurements of a program's CPU work, resource consumption, and inefficiency and attributes them to the full calling context in which they occur. By monitoring GPU operations, gathering instruction-level metrics within GPU kernels, and attributing the costs of GPU work to heterogeneous calling contexts, HPCToolkit provides insight into the performance of GPU-accelerated codes. HPCToolkit works with multilingual, fully optimized applications that are dynamically linked. HPCToolkit is designed for use on large parallel systems. HPCToolkit's presentation tools enable rapid analysis of a program's execution costs, inefficiency, and scaling characteristics both within and across nodes of a parallel system. HPCToolkit supports measurement and analysis of serial codes, threaded codes (e.g., pthreads, OpenMP), MPI, and hybrid (MPI+threads) parallel codes, as well as GPU-accelerated codes that offload computation to AMD, Intel, or NVIDIA GPUs.
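The idea of sampling a timer and charging each sample to its full calling context can be illustrated with a toy Python sketch. It is not HPCToolkit's implementation: HPCToolkit measures native, fully optimized applications, whereas this sketch simply walks Python interpreter frames. It assumes a Unix system where signal.setitimer and SIGPROF are available and that it runs in the main thread; all names (SamplingProfiler, calling_context, inner, outer) are hypothetical. A CPU-time timer fires periodically, and each sample is attributed to the entire call path that was executing, so time spent in inner is charged to the context outer -> inner rather than to inner alone.

```python
import signal
from collections import Counter


def calling_context(frame):
    """Return the call path ending at `frame` as function names, outermost first."""
    path = []
    while frame is not None:
        path.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(path))


class SamplingProfiler:
    """Toy statistical profiler (hypothetical; not HPCToolkit's API).

    A CPU-time interval timer (ITIMER_PROF) delivers SIGPROF periodically,
    and each sample is charged to the full calling context it interrupted.
    Unix-only; signal handlers can only be installed from the main thread.
    """

    def __init__(self, interval=0.001):
        self.interval = interval   # seconds of CPU time between samples
        self.samples = Counter()   # call path -> number of samples
        self._old_handler = None

    def _on_sample(self, signum, frame):
        self.samples[calling_context(frame)] += 1

    def __enter__(self):
        self._old_handler = signal.signal(signal.SIGPROF, self._on_sample)
        signal.setitimer(signal.ITIMER_PROF, self.interval, self.interval)
        return self

    def __exit__(self, exc_type, exc, tb):
        signal.setitimer(signal.ITIMER_PROF, 0)  # disarm the timer first
        old = self._old_handler if self._old_handler is not None else signal.SIG_DFL
        signal.signal(signal.SIGPROF, old)       # restore the previous handler
        return False


# Hypothetical workload: nearly all CPU time is spent in inner(), called from outer().
def inner(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer():
    total = 0
    for _ in range(20):
        total += inner(200_000)
    return total


with SamplingProfiler() as prof:
    outer()

# Print the three most frequently sampled call paths.
for path, count in prof.samples.most_common(3):
    print(f"{count:5d}  {' -> '.join(path)}")
```

Flat per-path counts are a simplification; a real tool would typically merge these paths into a calling context tree that shares common prefixes.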

3. Statement on Alignment with High Performance Software Foundation's Mission

HPCToolkit is, and aims to remain, a best-in-class performance tool for leadership supercomputers. It is one of the few performance tools able to run at leadership scale while providing detailed instruction-level performance attribution. Its functionality rivals that of the performance tools NVIDIA, AMD, Intel, and Cray provide for their own hardware. These capabilities make HPCToolkit a necessary piece of a future HPC ecosystem dominated by cloud and AI at scale.

HPCToolkit is committed to providing quality performance analysis for a wide range of languages and platforms, particularly targeting developers of large-scale HPC applications. HPSF provides HPCToolkit with a neutral home and safe stewardship for our stakeholders in government and academia, and opens HPCToolkit to future collaboration opportunities.

4. Project Website (please provide a link)

Project Website

5. Open Source License (please provide a link)

SPDX identifier: BSD-3-Clause (a relicense to Apache-2.0 is under consideration); see LICENSE.md.

Data artifacts are licensed under the CDLA Permissive 2.0 license (SPDX: CDLA-Permissive-2.0).

6. Code of Conduct (please provide a link)

We adopt the generic LF Code of Conduct.

7. Governance Practices (please provide a link)

Project Governance

8. Two Sponsors from the High Performance Software Foundation's Technical Advisory Committee

Todd Gamblin and Christian Trott

9. What is the project's solution for source control?

Git repositories on GitLab.com under the @hpctoolkit group (e.g., hpctoolkit/hpctoolkit).

10. What is the project's solution for issue tracking?

GitLab issues

11. Please list all external dependencies and their license

C/C++:

Java:

12. Please describe your release methodology and mechanics

HPCToolkit is released roughly twice a year (summer and winter), although the schedule is often adjusted to meet customer needs. Releases are made as Git tags with corresponding GitLab releases and are subsequently published as Spack package versions. Binary artifacts are produced automatically using Continuous Deployment practices, with few exceptions.

13. Please describe Software Quality efforts (CI, security, auditing)

All changes to the mainline must pass a series of automated tests and linter-style checks run via GitLab CI. These tests cover major releases of four common Linux distributions (Ubuntu, RHEL, Fedora, SUSE Leap), multiple CPU architectures (amd64, aarch64, ppc64le), and multiple GPU platforms (NVIDIA/CUDA, AMD/HIP). Builds use multiple versions of the GCC and Clang compilers.

We do not have security screening in place. This is an area we would like to improve under HPSF.

14. Please list the project's leadership team

The HPCToolkit Technical Steering Committee (@hpctoolkit/tsc) consists of the following members:

15. Please list the project members with access to commit to the mainline of the project

16. Please describe the project's decision-making process

Decisions are made by consensus among our maintainers/committers; when consensus is not reached, we resort to a fair vote of the TSC. These discussions happen primarily in GitLab issues and MRs or internally among the team.

Merge requests (MRs) must be approved and merged by a committer with sufficient access, although the review itself may be delegated to another contributor or conducted in an informal meeting.

17. What is the maturity level of your project?

We aim to join the HPSF as an Established stage project.

The Established stage describes our project well. We are looking to create a plan for continued support of our users. We have a very small developer community and wish to expand it by drawing on the experience of the LF and HPSF. We are also working toward the eventual goal of Core project status.

18. Please list the project's official communication channels

19. Please list the project's social media accounts

N/A

20. Please describe any existing financial sponsorships

Development of HPCToolkit is primarily funded by DOE grants and industry collaboration contracts via Rice University. The full list of sponsors is available on our website.

21. Please describe the project's infrastructure needs or requests

Criteria for Sandbox Stage

Criteria for Established Stage