tarantool / test-run

Tarantool functional testing framework

test-run always reports a single test status for a test with TAP13 output #386

Open ligurio opened 1 year ago

ligurio commented 1 year ago

Imagine we have a test with TAP13 output:

```
[0] ~/sources/MRG/tarantool$ cat test/box-tap/sample.test.lua
#!/usr/bin/env tarantool

local report = [[
TAP version 13
1..3
ok 1
ok 2
ok 3
]]
print(report)
```

test-run reports only a single test status, even though we have a number of test cases in the test:

```
[0] ~/sources/MRG/tarantool$ ./test/test-run.py box-tap/sample.test.lua

<snipped>

======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] box-tap/sample.test.lua                                         [ pass ]
---------------------------------------------------------------------------------
Top 10 tests by occupied memory (RSS, Mb):

(Tests quicker than 0.1 seconds may be missed.)
```

```
[0] ~/sources/MRG/tarantool$ ./test/test-run.py box-tap/sample.test.lua --verbose

<snipped>

======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] box-tap/sample.test.lua
[001] TAP version 13
[001] 1..3
[001] ok 1
[001] ok 2
[001] ok 3
[001]
[001] [ pass ]
---------------------------------------------------------------------------------
Top 10 tests by occupied memory (RSS, Mb):

(Tests quicker than 0.1 seconds may be missed.)
```

It would be convenient to report all test cases in test-run's output. This would be especially useful when some TAP13 test cases pass and some do not: a detailed report would give an overview of the testing status.
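
For illustration only, here is a minimal Python sketch of what per-testcase extraction could look like. This is hypothetical code, not test-run's actual parser; the regex and names are assumptions:

```python
import re

# Hypothetical sketch: pull per-testcase statuses out of TAP13 output.
# Not test-run's actual parser.
TAP_LINE = re.compile(r'^(not ok|ok)\b\s*(\d*)\s*-?\s*(.*)$')

def parse_tap13(output):
    """Yield ('pass' or 'fail', description) for each TAP13 test line."""
    for line in output.splitlines():
        match = TAP_LINE.match(line.strip())
        if match:
            status, _num, desc = match.groups()
            yield 'pass' if status == 'ok' else 'fail', desc

report = """TAP version 13
1..3
ok 1 - initial check
not ok 2 - feedback received after reload
ok 3 - feedback received after errinj
"""
for status, desc in parse_tap13(report):
    print('box-tap/sample.test.lua:"%s"  [ %s ]' % (desc, status))
```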

By examples

How output looks now when a test passes:

```
$ ./test/test-run.py box-tap/feedback_daemon.test.lua
Detected build mode: Release

======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] box-tap/feedback_daemon.test.lua                                [ pass ]
---------------------------------------------------------------------------------
Top 10 tests by occupied memory (RSS, Mb):

* 25.8 box-tap/feedback_daemon.test.lua

(Tests quicker than 0.1 seconds may be missed.)
---------------------------------------------------------------------------------
Top 10 longest tests (seconds):

* 0.16 box-tap/feedback_daemon.test.lua
---------------------------------------------------------------------------------
Statistics:
* pass: 1
```
How output looks now when a test fails:

```
Started ./test/test-run.py box-tap/feedback_daemon.test.lua

======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] box-tap/feedback_daemon.test.lua                                [ fail ]
[001] Test failed! Output from reject file /tmp/t/rejects/box-tap/feedback_daemon.reject:
[001] TAP version 13
[001] 1..31
[001] ok - initial check
[001] ok - feedback received after reload
[001] ok - feedback received after errinj
[001] ok - feedback received after explicit sending
[001] ok - no feedback after disabling
[001] ok - feedback after start
[001] ok - no feedback after stop
[001] ok - feedback after start
[001] ok - feedback after feedback send
[001] ok - file is created
[001] ok - data is equal
[001] ok - feedback contains "os" key
[001] ok - feedback contains "arch" key
[001] ok - feedback contains "cgroup" key
[001] ok - features field is present
[001] ok - schema stats are present
[001] ok - schema stats are empty at the moment
[001] # features.schema
[001] 1..13
[001] ok - memtx engine usage gathered
[001] ok - vinyl engine usage gathered
[001] ok - temporary space usage gathered
[001] ok - local space usage gathered
[001] ok - sync space usage gathered
[001] ok - tree index gathered
[001] ok - hash index gathered
[001] ok - rtree index gathered
[001] ok - bitset index gathered
[001] ok - jsonpath index gathered
[001] ok - jsonpath multikey index gathered
[001] ok - functional index gathered
[001] ok - functional multikey index gathered
[001] # features.schema: end
[001] ok - features.schema
[001] ok - internal cache invalidates when schema changes
[001] ok - on_reload_configuration reported
[001] ok - election_mode option reported
[001] ok - replication_synchro_quorum option reported
[001] ok - memtx_use_mvcc_engine option reported
[001] ok - Time of report generation is correct
[001] 1..27
[001] ok - DELETE total is reported
[001] ok - COMMIT total is reported
[001] ok - SELECT total is reported
[001] ok - ROLLBACK total is reported
[001] ok - INSERT total is reported
[001] ok - EVAL total is reported
[001] ok - ERROR total is reported
[001] ok - CALL total is reported
[001] ok - BEGIN total is reported
[001] ok - PREPARE total is reported
[001] ok - REPLACE total is reported
[001] ok - UPSERT total is reported
[001] ok - AUTH total is reported
[001] ok - EXECUTE total is reported
[001] ok - UPDATE total is reported
[001] ok - CONNECTIONS total is reported
[001] ok - CONNECTIONS current is reported
[001] ok - REQUESTS total is reported
[001] ok - REQUESTS current is reported
[001] ok - REQUESTS_IN_PROGRESS total is reported
[001] ok - REQUESTS_IN_PROGRESS current is reported
[001] ok - SENT total is reported
[001] ok - REQUESTS_IN_STREAM_QUEUE total is reported
[001] ok - REQUESTS_IN_STREAM_QUEUE current is reported
[001] ok - STREAMS total is reported
[001] ok - STREAMS current is reported
[001] ok - RECEIVED total is reported
[001] ok - feedback operation stats
[001] ok - Time of report generation is correct
[001] 1..27
[001] ok - DELETE total is reported
[001] ok - COMMIT total is reported
[001] ok - SELECT total is reported
[001] ok - ROLLBACK total is reported
[001] ok - INSERT total is reported
[001] ok - EVAL total is reported
[001] ok - ERROR total is reported
[001] ok - CALL total is reported
[001] ok - BEGIN total is reported
[001] ok - PREPARE total is reported
[001] ok - REPLACE total is reported
[001] ok - UPSERT total is reported
[001] ok - AUTH total is reported
[001] ok - EXECUTE total is reported
[001] ok - UPDATE total is reported
[001] ok - CONNECTIONS total is reported
[001] ok - CONNECTIONS current is reported
```
How, I suppose, the output should look:

```
======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
box-tap/feedback_daemon.test.lua                                      [ pass ]
box-tap/feedback_daemon.test.lua                                      [ pass ]
box-tap/feedback_daemon.test.lua:"initial check"                      [ pass ]
box-tap/feedback_daemon.test.lua:"feedback received after re..        [ pass ]
box-tap/feedback_daemon.test.lua:"feedback received after er..        [ pass ]
box-tap/feedback_daemon.test.lua:"feedback received after ex..        [ pass ]
box-tap/feedback_daemon.test.lua:"no feedback after disablin..        [ pass ]
box-tap/feedback_daemon.test.lua:"feedback after start"               [ pass ]
box-tap/feedback_daemon.test.lua:"no feedback after stop"             [ pass ]
box-tap/feedback_daemon.test.lua:"feedback after start"               [ pass ]
box-tap/feedback_daemon.test.lua:"feedback after feedback se..        [ pass ]
box-tap/feedback_daemon.test.lua:"file is created"                    [ pass ]
box-tap/feedback_daemon.test.lua:"data is equal"                      [ pass ]
box-tap/feedback_daemon.test.lua:"feedback contains "os" key..        [ pass ]
box-tap/feedback_daemon.test.lua:"feedback contains "arch" k..        [ pass ]
box-tap/feedback_daemon.test.lua:"feedback contains "cgroup""         [ pass ]
box-tap/feedback_daemon.test.lua:"features field is present"          [ pass ]
box-tap/feedback_daemon.test.lua:"schema stats are present"           [ pass ]
box-tap/feedback_daemon.test.lua:"schema stats are empty at..         [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"memtx en..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"vinyl en..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"temporar..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"local sp..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"sync spa..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"tree ind..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"hash ind..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"rtree in..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"bitset i..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"jsonpath..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"jsonpath..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"function..        [ pass ]
box-tap/feedback_daemon.test.lua:"features.schema":"function..        [ pass ]
box-tap/feedback_daemon.test.lua:"internal cache invalidates..        [ pass ]
box-tap/feedback_daemon.test.lua:"on_reload_configuration re..        [ pass ]
box-tap/feedback_daemon.test.lua:"election_mode option repor..        [ pass ]
box-tap/feedback_daemon.test.lua:"replication_synchro_quorum..        [ pass ]
box-tap/feedback_daemon.test.lua:"memtx_use_mvcc_engine opti..        [ pass ]
...
```
Totktonada commented 1 year ago

--verbose does exactly this.

https://github.com/tarantool/test-run/blob/affa13f2740f401e540659b58ddef49e8df02607/lib/options.py#L126-L136

(The help message says 'to log', but actually the output goes to the terminal.)

When something fails, we should show the test output automatically (even without --verbose), but there are buffering problems here; see #119 and #383.
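
For reference, a minimal sketch of the parent side of such streaming, assuming we read the child's pipe line by line (illustrative only; the command is an assumption, and this alone does not fix the child-side stdio buffering those issues are about):

```python
import subprocess

# Illustrative sketch, not test-run's actual code: forward each line of a
# test's output as soon as it arrives instead of waiting for the process
# to exit. The child may still block-buffer its own writes when stdout is
# a pipe, which is the core of the buffering problems mentioned above.
proc = subprocess.Popen(
    ['tarantool', 'test/box-tap/sample.test.lua'],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    bufsize=1, text=True)
for line in proc.stdout:
    print(line, end='')  # show progress immediately
proc.wait()
```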

ligurio commented 1 year ago

Sure, I know about --verbose. That is not actually what I mean.

Imagine you run tests on CI and one of the luatest test cases fails under test-run. You need to look carefully through the log to understand which test case failed. It takes additional time.

The attached file (OcWgbW.txt) contains a real log of running a suite of luatest tests under test-run.py. The test etcd-client/etcd_client_tls_integration_test.lua has failed. Which test case failed there?

Totktonada commented 1 year ago

> Which test case failed there?

```
[001] not ok 2  integration.tls.test_basic_no_tls_client
```

The --verbose option shows the output from luatest -c.

https://github.com/tarantool/test-run/blob/c5fa909f0d926d954ed6c1ba885fc31b3380e5d0/lib/luatest_server.py#L37-L38

AFAIU, it would be convenient for you to get less output, like luatest produces without -c.

However,

  1. There is a chance that a small amount of output will trigger the buffering problem (see above) and we actually won't see anything.
  2. Sometimes the full luatest -c output is needed from CI.

There is another option, and I guess it is what you proposed initially: add a summary of the test cases if one fails (we parse them from the output anyway). I like this idea.
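
A rough sketch of the shape such a summary could take (hypothetical code; the helper name and example data are made up, and it assumes the TAP13 test lines were already collected):

```python
# Hypothetical sketch: given the TAP13 test lines already parsed from a
# failed test's output, print a short summary of only the failed cases.
def failed_case_summary(test_name, tap_lines):
    failed = [line for line in tap_lines if line.startswith('not ok')]
    for line in failed:
        print('[001] %s: %s' % (test_name, line))
    print('[001] %d of %d test cases failed' % (len(failed), len(tap_lines)))

failed_case_summary('etcd-client/etcd_client_tls_integration_test.lua', [
    'ok 1 - integration.tls.test_basic_tls_client',      # example data
    'not ok 2 - integration.tls.test_basic_no_tls_client',
])
```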

ligurio commented 1 year ago

> There is another option, and I guess it is what you proposed initially: add a summary of the test cases if one fails (we parse them from the output anyway). I like this idea.

This issue is about printing the test cases of TAP and luatest tests, not about a summary or anything else. Feel free to submit separate issues with your ideas.

Currently, test-run is just a frontend over the real tests, and it should not hide useful information from the tests it runs.

Totktonada commented 1 year ago

I asked Sergey to clarify the issue, and now there is the 'by examples' section that makes it clearer. It is actually not about failed tests, but rather about all tests that produce TAP13 output (some unit, some app, and all luatest tests).

Sergey also added more context about his user scenario. The problem is waiting for information from a long-running test when it is run alone: there is no way to understand what is going on until the test passes or a timeout is reached.

He also mentioned his idea to show verbose information about the failed test case only, not about all test cases in a given test.

(Please correct me if I distort any of your ideas. I re-read our chat log five times before sending this clarification, and I hope I made it as accurate as possible.)

My opinion is under the line.


The discussion is about a proper balance between verbosity and conciseness.

Testing frameworks usually target large test suites and report minimal information about running tests while everything is going OK.

Examples of defaults:

test-run primarily targets tarantool's test suite, which has ~1200 test runs (each usually a fraction of a second), and it prints one line per test while everything is OK. That looks reasonable.

This default shouldn't be changed in my opinion.

However, aside from the primary goal of a testing framework, there are a couple of secondary goals:

That's why fancy failure reports exist, why output verbosity options like luatest -v, pytest -v, ctest -V, ./test-run.py --verbose exist and why output capturing options like luatest -c, pytest -s exist.

There are areas for improvement in test-run in these regards.

However, as far as I understand, discussion is not welcome here. So, I'm done.