psf / pyperf

Toolkit to run Python benchmarks
http://pyperf.readthedocs.io/
MIT License

Support reporting geometric mean by benchmark tags #132

Closed: mdboom closed this 2 years ago

mdboom commented 2 years ago

Addresses https://github.com/python/pyperformance/issues/208

This reports the geometric mean organized by the tag(s) assigned to each benchmark. That will allow us to include benchmarks in the pyperformance suite that we don't necessarily want rolled into "one big overall number" representing progress.
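
For reference, the per-tag value is the same statistic as the existing overall geometric mean, just restricted to the benchmarks that carry a given tag. Here is a minimal sketch of that grouping; the flat dicts, the ref/changed means, and the tag assignments are illustrative only (picked so the output matches the example comparison table below), not pyperf's actual JSON layout:

    # Geometric mean of per-benchmark ratios (changed mean / reference mean),
    # grouped by tag. Purely illustrative; not pyperf's internal code.
    from collections import defaultdict
    from statistics import geometric_mean

    def geomean_by_tag(benchmarks):
        groups = defaultdict(list)
        for bench in benchmarks:
            ratio = bench["changed_mean"] / bench["ref_mean"]
            groups["all"].append(ratio)            # overall row
            for tag in bench.get("tags", ()):      # one row per tag
                groups[tag].append(ratio)
        return {tag: geometric_mean(ratios) for tag, ratios in groups.items()}

    benchmarks = [
        {"name": "[1]*1000",     "tags": ["foo"],        "ref_mean": 2.13e-6, "changed_mean": 2.09e-6},
        {"name": "[1,2]*1000",   "tags": ["foo", "bar"], "ref_mean": 3.70e-6, "changed_mean": 5.28e-6},
        {"name": "[1,2,3]*1000", "tags": ["bar"],        "ref_mean": 4.61e-6, "changed_mean": 6.05e-6},
    ]
    print(geomean_by_tag(benchmarks))  # {'all': ~1.22, 'foo': ~1.18, 'bar': ~1.37}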

vstinner commented 2 years ago

It seems like you want to add a new metadata key. In that case, you should specify it in the doc: https://pyperf.readthedocs.io/en/latest/api.html#metadata
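
For comparison, custom metadata can already be attached through Runner.metadata (and, if I remember correctly, the metadata argument of bench_func()), so the value type accepted by the new key is exactly the kind of detail the doc needs to pin down. A rough sketch, where the list-of-strings value for 'tags' is only a guess:

    import pyperf

    runner = pyperf.Runner()
    # Existing way to attach custom metadata to a run.
    runner.metadata['description'] = 'list multiplication microbenchmarks'

    # Guessed usage of the new key: whether the value is a list of strings,
    # a single string, or something else is what the doc should specify.
    runner.bench_func('[1]*1000', lambda: [1] * 1000,
                      metadata={'tags': ['micro']})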

I dislike changing the default formatting to add "(all)". Can you try to omit "(all)"? For example, if no benchmark has tags, it's weird to display the magic "all" tag.

            +----------------------+---------------------+-----------------------+
            | Benchmark            | mult_list_py36_tags | mult_list_py37_tags   |
            +======================+=====================+=======================+
            | [1]*1000             | 2.13 us             | 2.09 us: 1.02x faster |
            +----------------------+---------------------+-----------------------+
            | [1,2]*1000           | 3.70 us             | 5.28 us: 1.42x slower |
            +----------------------+---------------------+-----------------------+
            | [1,2,3]*1000         | 4.61 us             | 6.05 us: 1.31x slower |
            +----------------------+---------------------+-----------------------+
            | Geometric mean (all) | (ref)               | 1.22x slower          |
            +----------------------+---------------------+-----------------------+
            | Geometric mean (bar) | (ref)               | 1.37x slower          |
            +----------------------+---------------------+-----------------------+
            | Geometric mean (foo) | (ref)               | 1.18x slower          |
            +----------------------+---------------------+-----------------------+

In this table, it's also not easy for me to tell which benchmarks are used to compute the geometric mean for each tag, since the benchmarks' tags are not listed. Would it make sense to list them?

vstinner commented 2 years ago

Do you have real examples of tags on benchmarks? I mean what are real tag values?

vstinner commented 2 years ago

> In this table, it's also not easy for me to tell which benchmarks are used to compute the geometric mean for each tag, since the benchmarks' tags are not listed. Would it make sense to list them?

Another option is to render one table per tag: each table would only list the benchmarks matching that tag, so its final "Geometric mean" row would summarize the table. And there would always be a last table with all benchmarks.
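
Roughly, the grouping for that layout could look like the following (plain Python over an assumed flat bench["tags"] accessor, not pyperf's real data model):

    from collections import defaultdict

    def tables_by_tag(benchmarks):
        """Partition benchmarks into one group per tag, plus a final group
        containing every benchmark: one comparison table per tag, then an
        overall table at the end."""
        groups = defaultdict(list)
        for bench in benchmarks:
            for tag in bench.get("tags", ()):
                groups[tag].append(bench)
        ordered = [(tag, groups[tag]) for tag in sorted(groups)]
        ordered.append(("all", list(benchmarks)))  # always end with the full table
        return ordered

    # Each (tag, group) pair is rendered as its own table, so a table's
    # "Geometric mean" row summarizes exactly the rows above it.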

mdboom commented 2 years ago

> Do you have real examples of tags on benchmarks? I mean what are real tag values?

We're mostly doing this work in anticipation of cleaning up the tags to be more useful. The main motivation is to avoid overoptimizing for microbenchmarks, of which the suite currently has many. There's further discussion of how we might use tags going forward.

> Another option is to render one table per tag: each table would only list the benchmarks matching that tag, so its final "Geometric mean" row would summarize the table. And there would always be a last table with all benchmarks.

I like this idea. It would also resolve your other comment about 'all' being surprising in the untagged case.

mdboom commented 2 years ago

@vstinner: Do these changes work for you?

vstinner commented 2 years ago

Sadly, the tests fail:

- FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/pyperf/pyperf/pyperf/tests/mult_list_py36_tags.json'

mdboom commented 2 years ago

> Sadly, the tests fail:
>
> - FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/pyperf/pyperf/pyperf/tests/mult_list_py36_tags.json'

Sorry, I forgot to commit the new test files. Let's see how this works.

EDIT: I guess this needs @vstinner or someone to re-approve the CI run.