Tracking test coverage and performance

nbraud commented 5 years ago

I know we kinda-discussed that in #59, but I thought it would be useful to resume that discussion & track it in a separate issue. (And if you feel it's inappropriate for me to bring it up again, please let me know and close the issue :3)

I think it would be pretty nice to have coverage and performance tracking, if only because we could answer questions like “how bad is the slowdown of #89” or “is this adequately tested” without having to reinvent a new way to get that data.

I totally agree with @pathunstrom that we should minimise the amount of tooling a user has to interact with, so it should happen automatically for them. I'd like to suggest doing it during CI, and automatically posting a message to the PR (if appropriate) with:

a link to the full report;
if coverage changed significantly, say it did (and by how much), congratulate the contributor on a positive change;
same for performance.

I would happily do the tooling & integration work, if there's consensus on it being desirable (and how it should behave). :)

AstraLuma commented 5 years ago

I know there's services that do the coverage stuff. IDK if there exists any for performance.

nbraud commented 5 years ago

@astronouth7303 I only know codecov and coveralls. I seem to recall both being somewhat annoying to integrate, but at least codecov supports a wide range of CI systems (coveralls would lock us into Travis...)

I don't know of any (free for open-source) performance-tracking software available as a service. There are self-hostable options, such as Google's Dana (which seems unmaintained since almost a year) or codespeed, but I assumed it would be preferable not to host services for this.

That's why I was suggesting making something which we can run directly in CI, and would upload a report (a human-readable one, and a machine-readable one) directly. perf already supports producing machine-readable outputs, and comparing 2 reports:

$ # Optional: apply system settings for more stable measurements
$ sudo --preserve-env=PATH,VIRTUAL_ENV python3 -m perf system tune

$ # in branch `benchmark`
$ ./tests/benchmark.py -o benchmark_master.json --rigorous
[...]
540.54s user 27.65s system 99% cpu 9:31.29 total

$ # in the target branch; first, setup a temporary branch and merge the benchmark (only necessary because the benchmark isn't merged yet)
$ git checkout de80e8d6eb5fc109264f33522fb042a487eaa8eb -b tmp_benchmark && git merge --no-edit benchmark
$ ./tests/benchmark.py -o benchmark_unstable_rotation.json --rigorous
[...]
537.98s user 27.61s system 99% cpu 9:28.60 total

$ python3 -m perf compare_to benchmark_*.json --table
python3 -m perf compare_to benchmark_*.json --table --min-speed 5   
+-----------+------------------+------------------------------+
| Benchmark | benchmark_master | benchmark_unstable_rotation  |
+===========+==================+==============================+
| __add__   | 1.87 us          | 2.00 us: 1.07x slower (+7%)  |
+-----------+------------------+------------------------------+
| __sub__   | 1.91 us          | 2.11 us: 1.11x slower (+11%) |
+-----------+------------------+------------------------------+
| __eq__    | 820 ns           | 934 ns: 1.14x slower (+14%)  |
+-----------+------------------+------------------------------+

Not significant (12): reflect; angle; dot; isclose; __neg__; convert; normalize; length; rotate; scale_by; scale_to; truncate

AstraLuma commented 5 years ago

Problem is, travis requires a place to upload to. :stuck_out_tongue:

But yeah, automated CI stuff would be a good.

nbraud commented 5 years ago

@astronouth7303 We can store the benchmark and coverage results in git notes, since they are small, which would be an extremely convenient way to track which commit they relate to. We would just need a pursuedpybot account to make an API key with push access.

For human-readable reports, we could just push those to Github pages, or S3, or whatever else is convenient. :3

nbraud commented 5 years ago

@astronouth7303 As I said when I opened the issue, if that seems like a reasonable solution, I can implement that.

nbraud commented 5 years ago

Ping?

AstraLuma commented 5 years ago

I suppose the next step is to discuss details? Or just do a quick prototype and be ready to iterate.

nbraud commented 5 years ago

Yeah, I was waiting for a confirmation it was a reasonable enough plan to go and prototype it.

ppb / ppb-vector

Tracking test coverage and performance #90