Closed Shimuuar closed 9 months ago
Now this PR contains benchmarks which use tasty-papi
. Since they require libpapi and only work on linux they are protected by cabal flag and won't be built unless requested specifically. I played with benchmarks a bit and it seems that comparing performance for several GHC version is quite doable.
Sadly it's not possible to run them on github CI.
@lehins, @Bodigrim I'm going to merge this PR tomorrow if you're OK with it
I don't have a Linux machine to test, but otherwise have no objections. Looks interesting.
I've tried it on my setup with OpenSUSE and I get bunch of errors. Looks like CPU counters are not readily available on all Linux distros without extra configuration.
I am a little skeptical about adding such benchmark directly to vector
that is so hard to execute. We can't build it on CI, nor can we easily run it locally. Maybe it would be better to have it as a separate package?
All
listRank: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
rootfix: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
leaffix: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
awshcc: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
hybcc: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
quickhull: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
spectral: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
tridiag: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
mutableSet: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
findIndexR: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
findIndexR_naïve: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
findIndexR_manual: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
minimumOn: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
maximumOn: FAIL
Exception: PAPI: Failed to add counter TOT_INS: Component containing event is disabled [-25]
CallStack (from HasCallStack):
error, called at ./Test/Tasty/PAPI.hs:267:9 in tasty-papi-0.1.2.0-LaShQvKfvJhJyvK9qqsrr2:Test.Tasty.PAPI
14 out of 14 tests failed (0.02s)
Benchmark algorithms-papi: ERROR
Completed 3 action(s).
First of all we can (edc9083e264f3ce2a2ee3e24cd83bbe2657eaec4) and do build these benchmarks on CI. We just can't actually run them
I think splitting benchmarks means it will become even more complicated to run. Probably to the point where only I can run them. CPU counters are security sensitive with "specter" and all. So problems with access will happen. Hopefully they could be mitigated with better documentation.
But I never encountered such problems (nixos, fedora37) so I can;t debug them What is papi_avail
output?
We just can't actually run them
Same case was for me. I was able to build them, just not run them. Which makes me question usability of the suite for your average contributor.
I think splitting benchmarks means it will become even more complicated to run.
I really don't think making a core Haskell package depend on some third party C library that is only usable on some small subset of operating systems is a good idea, even if it is not buildable by default. Splitting it out into a separate package is the only thing that comes to mind as a decent solution to this. Also, I don't understand why would it be more complicated to run? Benchmarks are only useful for people who are making changes to vector
, so all it would take is cloning the repo and building it. We don't need to release those benchmarks on hackage.
Just to be clear, I am not against adding the benchmarks to the repo, because I do see value in such benchmarks. I am just against adding them to the main package. I'd rather see something like vector-papi
in the repo that has a good documentation on what it takes to run those benchmarks.
Probably to the point where only I can run them.
That is already the case. Out of three maintainers you are the only one who can run them.
I can;t debug
I can't debug them either since I've never used papi. Since you asked, I'll post the relevant output, but honestly, I don't have enough time to debug this issue any further.
$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
================================================================================
PAPI Preset Events
================================================================================
Name Code Avail Deriv Description (Note)
...
--------------------------------------------------------------------------------
Of 108 possible events, 0 are available, of which 0 are derived.
No events detected! Check papi_component_avail to find out why.
$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
Compiled-in components:
Name: perf_event Linux perf_event CPU counters
\-> Disabled: Error libpfm4 no default PMU found
Name: perf_event_uncore Linux perf_event CPU uncore and northbridge
\-> Disabled: No uncore PMUs or events found
Name: sysdetect System info detection component
Active components:
Name: sysdetect System info detection component
Native: 0, Preset: 0, Counters: 0
--------------------------------------------------------------------------------
Or maybe I don't quite understand what exactly are you proposing. One problem is benchmarks are not accessible from outside of vector package. So we have to either copy them, or split all benchmarks into separate package
Since you asked, I'll post the relevant output Thanks! I'll look into it.
One problem is benchmarks are not accessible from outside of vector package.
We could export them as a public sub-library.
This is third attempt. PAPI based benchmarks are placed into separate package. They still require specifying cabal flag in order to not break build for anyone who don't want to run them.
I also added cabal check
for vector-stream
We don't run benchmarks with any sort of regularity. In part because it's difficult to get repeatable result, it require dedicated PC and all measurement must be done on same hardware. So tracking performance regressions is hard. I run into this problem while working on size hints. Another way is to use CPU counters to count number of instructions. It's fast and hopefully reproducible and could possibly run on CI.
This PR is prototype. It sort of works, very rough on edges and right now isn't very deterministic. Apparently GC could kick in and skew results. I need to study this more and to figure out how to ensure repeatable measurements.
If everything works out I want to split it into separate package so far it's more convenient to develop it as part of vector.
P.S.
-O1
could be up to 3× slower than-O2
in our benchmarks but sometimes it's on par.