make TAP output possible in ppx_inline_test

tcoopman commented 6 years ago

I came across this bug in ppx_inline_test with the mention to create a bug here. Didn't find a bug, so creating this one.

The output of ppx_inline_test could really be improved and using TAP would open a lot of options for formatting/reporting.

rgrinberg commented 6 years ago

If you'd like to work on this, then we'd be happy to guide you.

tcoopman commented 6 years ago

I don't have a lot of experience with OCaml, but with some guidance I'm willing to take a stab on it. I also can't make any promises time related, but this sounds interesting so let's give it a try.

How do we get started with this?

rgrinberg commented 6 years ago

Actually, I'm not fully aware of what @diml had in mind when he mentioned co-operation with the build system. Do we need a flag for the ppx_inline_test runner to output .tap output? Or do we just have the .tap output in a separate file?

ghost commented 6 years ago

My mental model is that while the test is running, it needs to continuously output a stream of feedback that is interpreted by another tool. Is that correct?

Currently, dune captures the output of all commands, in order to avoid mixing up the output of commands in parallel builds. So we need the build system to cooperate in order for the TAP output to be consumable in real time by another tool.

tcoopman commented 6 years ago

My mental model is that while the test is running, it needs to continuously output a stream of feedback that is interpreted by another tool. Is that correct?

Seems correct to me. You want to have continuous output while the tests are running.

Do we need a flag for the ppx_inline_test runner to output .tap output? Or do we just have the .tap output in a separate file?

The user should be able to choose if he wants to output it through a TAP tool or not, so probably a flag in dune? (I don't know the architecture of dune and inline tests, so not sure what the cleanest way would be).

ghost commented 6 years ago

BTW, do you have examples of TAP consumers? Just to see what we would get from adding TAP support.

tcoopman commented 6 years ago

See: https://www.node-tap.org/reporting/ and https://github.com/sindresorhus/awesome-tap#reporters

ghost commented 6 years ago

Thanks. In one run of dune runtest, Dune might run multiple test executables in parallel. Each of these executables might contain several tests that are executed sequentially. At which level would expect the granularity of the reports to be?

tcoopman commented 6 years ago

It seems that tap has some support for parallel running tests: https://www.node-tap.org/parallel/. The protocol supports the idea of subtests:

A subtest is an indented TAP stream that is a child of the current set of tests. It can be used to group test points together, consume the output of a TAP-producing child process, or run tests asynchronously.

So I guess that could be used for the multiple test executables?

ghost commented 6 years ago

IIUC this page correctly, the tests are run directly by the tap command. Is that correct?

My previous understanding was that supporting tap would work this way:

$ dune runtest | tap-consumer
<pretty output>

But I realise that this might not be the case. I don't know much about TAP, could explain a bit what it provides and how it is supposed to interact with the build system?

tcoopman commented 6 years ago

$ dune runtest | tap-consumer that's certainly possible. If you look at https://github.com/scottcorgan/tap-dot for example, you'll see similar things in the screenshots.

The node-tap page is an implementation of the tap protocol (both runner and output). This page is actual a better source: http://testanything.org/ Although digging a bit deeper, it seems that subtests are not yet part of the official protocol (but are part of this PR https://github.com/TestAnything/testanything.github.io/pull/36)

I'm actually not that familiar with the TAP protocol, but I used it and the output can be a lot nicer/cleaner that what you get at the moment with dune. Maybe we could think about a first (more simple) goal of making the output of dune runtest better without looking at TAP?

ghost commented 6 years ago

Yes, that seems reasonable, thanks for the pointers. But before looking more into this, have you looked at expectation testing? It seems to me that this is solving the same problem, but in an ever nicer way. This workflow is now completely supported by Dune.

tcoopman commented 6 years ago

I have looked and used expectation testing. Some things were missing for me though:

When you have both let%test and let%expect_test in the same file and the let%test fails, the expect tests are not run?
No clean output how many success and how many failures (if I remember correctly there is even no output when all the test pass, something like: 5 tests, 0 failures would be useful)
No colors (colored diffs would be great)

Maybe some of these are already possible and I didn't dive in the documentation deep enough.

ghost commented 6 years ago

For the first point, you can consider that classic inline tests (let%test) are completely superseded by expectation tests. One thing I'm planning to do in the long run is to make all inline tests be expectation tests. You can replace let%test name = expr by let%expect_test name = assert expr, that will provide a better experience.

The second point could indeed be improved.

Regarding colors, if you install patdiff Dune will automatically use it and it uses colors. patdiff is available in opam.

tcoopman commented 6 years ago

You can replace let%test name = expr by let%expect_test name = assert expr, that will provide a better experience.

I've tried this: let%expect_test "foo" = assert false but that doesn't give me a better experience?

(cd _build/default/test && .test.inline-tests/run.exe inline-test-runner test -source-tree-root .. -diff-cmd -)
File "test/account_test.ml", line 16, characters 0-39: fooo threw "Assert_failure test/account_test.ml:16:25".
  Raised at file "test/account_test.ml", line 16, characters 25-39
  Called from file "collector/expect_test_collector.ml", line 19, characters 8-12
  Re-raised at file "collector/expect_test_collector.ml", line 21, characters 31-38
  Called from file "collector/expect_test_collector.ml", line 251, characters 11-64
  Called from file "runtime-lib/runtime.ml", line 327, characters 8-12
  Re-raised at file "runtime-lib/runtime.ml", line 330, characters 6-13
  Called from file "runtime-lib/runtime.ml", line 343, characters 15-52
  Called from file "runtime-lib/runtime.ml", line 430, characters 52-83

FAILED 1 / 2 tests

I still don't see the other failed test in the output.

Regarding colors, if you install patdiff Dune will automatically use it and it uses colors. patdiff is available in opam.

I've just read about patdiff and tried it. That's already much better! Thanks. Some improvement might be that if there is not much difference between 2 lines that it actually highlights the actual differences between the lines as well.

ghost commented 6 years ago

I've tried this: let%expect_test "foo" = assert false but that doesn't give me a better experience?

Ah , indeed. We worked on ppx_expect to improve handling of uncaught exceptions, however this work is only in the development version and hasn't been released in opam yet.

I've just read about patdiff and tried it. That's already much better! Thanks. Some improvement might be that if there is not much difference between 2 lines that it actually highlights the actual differences between the lines as well.

It might be possible to configure patdiff a bit, though I don't know the details. Feel free to open a ticket on https://github.com/janestreet/patdiff.

tcoopman commented 6 years ago

Ah , indeed. We worked on ppx_expect to improve handling of uncaught exceptions, however this work is only in the development version and hasn't been released in opam yet.

Looking forward to this!

It might be possible to configure patdiff a bit, though I don't know the details. Feel free to open a ticket on https://github.com/janestreet/patdiff.

Opened this bug: https://github.com/janestreet/patdiff/issues/10

ELLIOTTCABLE commented 5 years ago

Since the thread got a little side-tracked — I'd still like to see actual realtime-output supported, so testing tools like Alcotest and ppx_inline_test could produce TAP.

My own usecase is tying together multiple testing-systems at the shell level, merging multiple TAP streams into one coherent test-suite output. For instance, piping Jest thru jest-json-to-tap, and then merging it with the OCaml-side testing-output using multi-tap and piping it into a debouncing filesystem-watcher or pretty-output formatter or aggregate statistics on test suites or all of the above.

Anyway, it's pretty standard across a bunch of languages and ecosystems (which has been a real blessing, when you're a polyglot like I!); I'd love to see it make inroads in the OCaml ecosystem. (=

rgrinberg commented 5 years ago

I think that the dune team is definitely in favor of this feature but none of us posses the expertise to complete this project. If there's an external contributor that would like to shepherd this project, then we'd be glad to help as much as we can.

ocaml / dune

make TAP output possible in ppx_inline_test #1227