haskell / vector

An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework .
Other
367 stars 139 forks source link

Benchmarks based on CPU counters #426

Closed Shimuuar closed 9 months ago

Shimuuar commented 2 years ago

We don't run benchmarks with any sort of regularity. In part because it's difficult to get repeatable result, it require dedicated PC and all measurement must be done on same hardware. So tracking performance regressions is hard. I run into this problem while working on size hints. Another way is to use CPU counters to count number of instructions. It's fast and hopefully reproducible and could possibly run on CI.

This PR is prototype. It sort of works, very rough on edges and right now isn't very deterministic. Apparently GC could kick in and skew results. I need to study this more and to figure out how to ensure repeatable measurements.

If everything works out I want to split it into separate package so far it's more convenient to develop it as part of vector.

P.S. -O1 could be up to 3× slower than -O2 in our benchmarks but sometimes it's on par.

Shimuuar commented 10 months ago

Now this PR contains benchmarks which use tasty-papi. Since they require libpapi and only work on linux they are protected by cabal flag and won't be built unless requested specifically. I played with benchmarks a bit and it seems that comparing performance for several GHC version is quite doable.

Sadly it's not possible to run them on github CI.

Shimuuar commented 10 months ago

@lehins, @Bodigrim I'm going to merge this PR tomorrow if you're OK with it

Bodigrim commented 10 months ago

I don't have a Linux machine to test, but otherwise have no objections. Looks interesting.

lehins commented 10 months ago

I've tried it on my setup with OpenSUSE and I get bunch of errors. Looks like CPU counters are not readily available on all Linux distros without extra configuration.

I am a little skeptical about adding such benchmark directly to vector that is so hard to execute. We can't build it on CI, nor can we easily run it locally. Maybe it would be better to have it as a separate package?

All                 
  listRank:          FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  rootfix:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  leaffix:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  awshcc:            FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  hybcc:             FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  quickhull:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  spectral:          FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  tridiag:           FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  mutableSet:        FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR:        FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR_naïve:  FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  findIndexR_manual: FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  minimumOn:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
  maximumOn:         FAIL
    Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
    CallStack (from HasCallStack):
      error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI

14 out of 14 tests failed (0.02s)
Benchmark algorithms-papi: ERROR
Completed 3 action(s).
Shimuuar commented 10 months ago

First of all we can (edc9083e264f3ce2a2ee3e24cd83bbe2657eaec4) and do build these benchmarks on CI. We just can't actually run them

I think splitting benchmarks means it will become even more complicated to run. Probably to the point where only I can run them. CPU counters are security sensitive with "specter" and all. So problems with access will happen. Hopefully they could be mitigated with better documentation.

But I never encountered such problems (nixos, fedora37) so I can;t debug them What is papi_avail output?

lehins commented 10 months ago

We just can't actually run them

Same case was for me. I was able to build them, just not run them. Which makes me question usability of the suite for your average contributor.

I think splitting benchmarks means it will become even more complicated to run.

I really don't think making a core Haskell package depend on some third party C library that is only usable on some small subset of operating systems is a good idea, even if it is not buildable by default. Splitting it out into a separate package is the only thing that comes to mind as a decent solution to this. Also, I don't understand why would it be more complicated to run? Benchmarks are only useful for people who are making changes to vector, so all it would take is cloning the repo and building it. We don't need to release those benchmarks on hackage.

Just to be clear, I am not against adding the benchmarks to the repo, because I do see value in such benchmarks. I am just against adding them to the main package. I'd rather see something like vector-papi in the repo that has a good documentation on what it takes to run those benchmarks.

Probably to the point where only I can run them.

That is already the case. Out of three maintainers you are the only one who can run them.

I can;t debug

I can't debug them either since I've never used papi. Since you asked, I'll post the relevant output, but honestly, I don't have enough time to debug this issue any further.

$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
...
--------------------------------------------------------------------------------
Of 108 possible events, 0 are available, of which 0 are derived.

No events detected!  Check papi_component_avail to find out why.
$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
   \-> Disabled: Error libpfm4 no default PMU found
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
   \-> Disabled: No uncore PMUs or events found
Name:   sysdetect               System info detection component

Active components:
Name:   sysdetect               System info detection component
                                Native: 0, Preset: 0, Counters: 0

--------------------------------------------------------------------------------
Shimuuar commented 10 months ago

Or maybe I don't quite understand what exactly are you proposing. One problem is benchmarks are not accessible from outside of vector package. So we have to either copy them, or split all benchmarks into separate package

Since you asked, I'll post the relevant output Thanks! I'll look into it.

lehins commented 10 months ago

One problem is benchmarks are not accessible from outside of vector package.

We could export them as a public sub-library.

Shimuuar commented 9 months ago

This is third attempt. PAPI based benchmarks are placed into separate package. They still require specifying cabal flag in order to not break build for anyone who don't want to run them.

I also added cabal check for vector-stream