intel / p3-analysis-library

A library simplifying the collection and interpretation of P3 data.

https://intel.github.io/p3-analysis-library/

MIT License

Add draft of per-component example #58

Closed: Pennycook closed this 1 week ago.

Pennycook commented 1 month ago

This new example demonstrates how to use per-component (per-kernel) timings to calculate per-component application efficiencies, and how to accurately calculate overall application efficiency when per-component timings are available.
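The two calculations described above can be written as follows. This is a sketch that assumes the best achievable time on a platform is the sum of the fastest observed time for each component. That corresponds to a hypothetical implementation that mixes and matches the fastest version of every component. The symbols are introduced only for illustration.

```latex
% t_{a,c}(p): time taken by application a's implementation of component c on platform p

% Per-component application efficiency:
e_{a,c}(p) = \frac{\min_{a'} t_{a',c}(p)}{t_{a,c}(p)}

% Overall application efficiency, measured against a hypothetical
% mix-and-match implementation built from the fastest version of each component:
e_{a}(p) = \frac{\sum_{c} \min_{a'} t_{a',c}(p)}{\sum_{c} t_{a,c}(p)}
```

For every application a*, the numerator satisfies \sum_c \min_{a'} t_{a',c}(p) \le \sum_c t_{a^*,c}(p). This assumes every application reports every component. Under that assumption, this overall efficiency is never higher than the one obtained by comparing whole-application totals alone.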

Related issues

Blocked by #55, which is why this is in draft state. I'd like to add cross-references between examples once they're merged.

Proposed changes

Add a more complex example of using application efficiency, inspired by the SYCL CRK-HACC paper.

@swright87, @laserkelvin: One thing I am really unhappy about here is how complicated it is to extract the sum/min from the result of a pandas grouping and append it to the original data. I'm sure there must be a better way to do it than this, but functions like unstack, melt and pivot didn't seem to do what I wanted. If you have any suggestions for ways to simplify the sample here, please let me know.
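The sketch below shows one possible simplification. It uses made-up data and hypothetical column names (platform, application, component, time). pandas' groupby(...).transform(...) returns a result aligned with the original rows, so a per-group minimum can be assigned straight back to the DataFrame as a new column. The overall efficiency again assumes that the best achievable time is the sum of the per-component minima.

```python
import pandas as pd

# Hypothetical per-component timings (seconds): one row per
# (platform, application, component). All names and values are illustrative.
df = pd.DataFrame(
    {
        "platform": ["A"] * 4,
        "application": ["impl1", "impl1", "impl2", "impl2"],
        "component": ["kernel1", "kernel2", "kernel1", "kernel2"],
        "time": [1.0, 4.0, 2.0, 2.0],
    }
)

# transform("min") broadcasts each group's minimum back onto every row of
# that group, so no merge, unstack, melt or pivot is needed.
df["best_component_time"] = df.groupby(["platform", "component"])["time"].transform("min")
df["component_efficiency"] = df["best_component_time"] / df["time"]

# Total time of each application on each platform.
totals = df.groupby(["platform", "application"], as_index=False)["time"].sum()

# Assumed best achievable time per platform: the sum of the fastest time
# for each component (a hypothetical mix-and-match implementation).
best = (
    df.groupby(["platform", "component"])["time"].min()
    .groupby(level="platform")
    .sum()
)
totals["best_time"] = totals["platform"].map(best)
totals["app_efficiency"] = totals["best_time"] / totals["time"]

print(df)
print(totals)
```

With these numbers, comparing totals alone would give the fastest whole application (impl2, 4.0 s) an efficiency of 1.0. Measured against the 3.0 s mix of impl1's kernel1 and impl2's kernel2, impl1 and impl2 instead score 0.6 and 0.75 respectively.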

Pennycook commented 3 weeks ago

@swright87, @laserkelvin - Now that #55 has been merged, this should be ready for review.

Pennycook commented 3 weeks ago

Might be worth better signposting that, when picking and choosing implementations per component, this is all in theory. In practice, picking and choosing might not be trivial because of data structures, etc.

Good idea. I've had a go at adding something in 1733fdfe8c4b4e5bfd80f5377a9bc578e86e95fc. What do you think?

swright87 commented 2 weeks ago

Looks good to me :)

Pennycook commented 2 weeks ago

This is generally okay with me, but I think a "component" could be defined more concretely: is it a single kernel, a library, or does it actually not matter?

I tried to keep it vague because I think it really doesn't matter.

In the HACC paper, we did this analysis at a kernel level primarily because that's where we had the flexibility: we had multiple different kernel implementations that we could mix and match, and they had different P3 characteristics. But I think the same analysis would work if each "component" were something coarser-grained; for example, you could swap out an iterative solver (consisting of multiple kernels) for a surrogate model. It's an extension of the flexibility of "problem": each kernel in the iterative solver sets out to solve a sub-problem, and there may be multiple implementation choices at each stage; equally, the iterative solver itself is being used to solve a problem, and there may be multiple solutions.
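The point about granularity can be sketched by rolling kernel-level timings up into coarser components before applying the same calculation. The data, the kernel names and the kernel-to-component mapping below are all hypothetical. As before, the best achievable time is assumed to be the sum of the fastest time per component.

```python
import pandas as pd

# Hypothetical kernel-level timings (seconds) on a single platform.
# impl1 uses an iterative solver made of two kernels; impl2 replaces the
# whole solver with a (hypothetical) surrogate model.
kernels = pd.DataFrame(
    {
        "application": ["impl1", "impl1", "impl1", "impl2", "impl2"],
        "kernel": ["setup", "solver_step_a", "solver_step_b", "setup", "surrogate"],
        "time": [1.0, 2.0, 3.0, 2.0, 4.0],
    }
)

# Hypothetical mapping from kernels to coarser-grained components:
# both the solver kernels and the surrogate solve the same sub-problem.
component_of = {
    "setup": "setup",
    "solver_step_a": "solver",
    "solver_step_b": "solver",
    "surrogate": "solver",
}
kernels["component"] = kernels["kernel"].map(component_of)

# Roll kernel timings up to one time per (application, component).
components = kernels.groupby(["application", "component"], as_index=False)["time"].sum()

# The same per-component calculation, now at the coarser granularity.
components["component_efficiency"] = (
    components.groupby("component")["time"].transform("min") / components["time"]
)

# Assumed best achievable time: fastest setup plus fastest solver.
best_time = components.groupby("component")["time"].min().sum()
app_efficiency = best_time / components.groupby("application")["time"].sum()

print(components)
print(app_efficiency)
```

Here the surrogate (4.0 s) beats the two-kernel solver (5.0 s), while impl1 has the faster setup. Both implementations therefore score 5/6 against the hypothetical 5.0 s combination, even though their solver stages could not be compared kernel by kernel.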