PGIS Integration into CaPI

This document is meant to track the decisions and issues arising from the PGIS migration to CaPI.

Background

CaPI enables the specification of composable selection pipelines for performance instrumentation. Like PIRA, it uses MetaCG as the backend for global call graph (CG) analysis and metadata collection.

The idea is to migrate the PGIS selection mechanism into the CaPI project. Benefits:

Reduction of code duplication between the two projects.
Enabling coherent composition of PGIS with other selection mechanisms.

Migration Phases

Adding PGIS estimator phases into CaPI as selection modules

Estimators: As a first step, focus on the static statement aggregation selection. Add support for ExtraP, load imbalance etc. later.

Explicit modeling of analysis requirements: Selection modules should be able to specify required analysis passes. CaPI needs to make sure that the results are available and dependencies are resolved.
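To make the dependency resolution idea concrete, here is a minimal sketch in Python (not CaPI's actual C++ code). The pass names and the `ANALYSIS_DEPS` registry are made up for illustration; the only assumption is that each analysis pass declares which other passes it needs.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: analysis pass name -> passes it depends on.
ANALYSIS_DEPS = {
    "statement_count": set(),
    "inclusive_statement_count": {"statement_count"},
    "loop_depth": set(),
}


def resolve_analyses(required, deps=ANALYSIS_DEPS):
    """Return the required passes plus their transitive dependencies,
    ordered so that every pass runs after the passes it depends on."""
    needed = {}
    stack = list(required)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in deps:
            raise KeyError(f"unknown analysis pass: {name}")
        needed[name] = deps[name]
        stack.extend(deps[name])
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(needed).static_order())


# A selection module that only asks for the inclusive count still gets
# the plain statement count scheduled first; unrelated passes are skipped.
order = resolve_analyses(["inclusive_statement_count"])
print(order)
```

A selection module would then only list what it needs, and the driver would run the passes in the returned order before the selector executes.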

Storage of analysis results: We need to make sure that the analysis results are stored in MetaCG in a consistent manner. Instead of storing some of the information into the nodes directly in special member variables, everything should be modeled as metadata.
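A rough illustration of the "everything is metadata" idea, again as a Python sketch rather than the MetaCG API. `CGNode`, `attach` and the metadata keys are hypothetical names chosen for this example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CGNode:
    """Hypothetical call graph node: no analysis-specific member variables."""
    name: str
    callees: list = field(default_factory=list)
    # All analysis results live here, keyed by a metadata name.
    metadata: dict = field(default_factory=dict)


def attach(node, key, value):
    """Attach an analysis result under a unique key; reject silent overwrites."""
    if key in node.metadata:
        raise ValueError(f"metadata '{key}' already present on {node.name}")
    node.metadata[key] = value


main = CGNode("main", callees=["solve"])
attach(main, "statement_count", 12)
attach(main, "load_imbalance", {"flagged": False})

# Since results are plain keyed entries, every analysis is serialized the same way.
serialized = json.dumps(asdict(main), sort_keys=True)
print(serialized)
```

The point of the sketch is that a new estimator adds a new key, not a new field on the node type, so storage and serialization stay uniform.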

CLI and selection options: CaPI currently provides a very simple interface with limited options. To support better configurability, we need a more sophisticated option parsing system. We also need to figure out which options should be passed inside the selection specification and which over the CLI.

CaPI interface for PIRA

Instead of using PGIS directly, PIRA will interface with CaPI. This enables better control of the selection mechanism used in each iteration.

Selection DSL: The selection DSL is currently focused on static selection without refinement. We may need to adjust the DSL to express modifications of the instrumentation configuration (IC) over multiple iterations.

Instrumentation: Currently, CaPI and PIRA use separate instrumentation backends. In the end, we aim to have a unified backend, possibly supporting multiple measurement APIs and instrumentation paradigms (e.g. static vs. dynamic).

Thanks @sebastiankreutzer for opening this issue. I went over your notes and they all make sense to me.

Regarding the migration, I agree that, for now, we should focus on enabling the static statement aggregation within CaPI. That should already give us a good sense of the requirements moving forward. As we discussed, I think we should pay close attention when implementing the mechanism that models dependencies between phases. This is somewhat of a pain point in the current PGIS design.

Something we need to think about down the road is that certain PGIS estimators, e.g., the load imbalance detection, are designed to work in an iterative fashion. They would not work as a one-shot experiment selector; they need to run over multiple iterations. This is different from, e.g., the static statement aggregation, which is not iterative in itself.
