Closed: krzema12 closed this 3 years ago
@SerVB could you help me debug the issue with comparing box-tests-report.tsv? Locally it works fine, but on GitHub it fails - see here.

Could you run:

    python/experiments/generate-box-tests-reports.main.kts --box-tests-report-path=box-tests-report.tsv

(new script, you have to check out my branch) and then:

    diff -u <(less box-tests-report.tsv | cut -d$'\t' -f1-2,4) <(less python/experiments/box-tests-report.tsv | cut -d$'\t' -f1-2,4)

?
Works for me after running tests via both IDEA and command line:
I'm not sure why there are different results on CI.
Can you try running tests with --info on CI to see more info about the exceptions? Maybe it will give us a clue. Another option that comes to mind is to upload some files as build artifacts; if we already have such files, that could be easier.
Thanks for the ideas! For starters I'll just do this: https://github.com/krzema12/kotlin-python/pull/26/commits/3ba9cd4fb6d52111e049cb29ef981c34cd3d776c.
Comparing, for a case where the state calculated by CI differs from the one calculated on our machines:
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenInlineTestGenerated$CallableReference$AdaptedReferences > testInlineVararg Succeeded 0.749
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenInlineTestGenerated$CallableReference$AdaptedReferences > testInlineVararg Failed 0.676 PythonExecution TypeError: object of type 'int' has no len() 0.221
In failed-tests.txt this test is marked as failed, which is weird because the difference in state (failed/succeeded) should also be reflected there:
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenInlineTestGenerated$CallableReference$AdaptedReferences > testInlineVararg FAILED
The new Kotlin script parses the XML test report files once and then infers each test's status (failed/succeeded). It's weird that one output file says failed while another says passed. Let me check what the parsed collection contains: https://github.com/krzema12/kotlin-python/pull/26/commits/af9e05387bee22cc8cd2b03a0edb5019ae2de827
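As a sanity check of the status-inference step, here is a minimal Python sketch of the same idea. It assumes conventional JUnit-style XML (a `<testcase>` with a `<failure>` or `<error>` child when it failed); the element names, the sample data, and the "Failed wins" merge rule are my assumptions, not the actual Kotlin script:

```python
# Sketch: infer test status from JUnit-style XML reports. Assumes the
# conventional <testsuite><testcase classname=.. name=..> layout with a
# <failure>/<error> child on failed tests; the real script may differ.
import xml.etree.ElementTree as ET

def parse_report(xml_text):
    """Return {full test name: 'Failed' | 'Succeeded'} for one report."""
    statuses = {}
    for case in ET.fromstring(xml_text).iter("testcase"):
        name = f"{case.get('classname')} > {case.get('name')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        # If the same test shows up more than once, don't let a later
        # "Succeeded" overwrite an earlier "Failed" -- exactly the kind of
        # inconsistency being debugged above.
        if statuses.get(name) != "Failed":
            statuses[name] = "Failed" if failed else "Succeeded"
    return statuses

sample = """<testsuite>
  <testcase classname="FooTest" name="testA"/>
  <testcase classname="FooTest" name="testB"><failure>boom</failure></testcase>
</testsuite>"""
print(parse_report(sample))
# {'FooTest > testA': 'Succeeded', 'FooTest > testB': 'Failed'}
```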
Comparing, for example, cases where tests fail, but for a different reason:
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Ranges$ForInIndices > testKt13241_Array Failed 1.456 PythonExecution NameError: name 'js' is not defined 0.349
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Ranges$ForInIndices > testKt13241_Array Failed 0.545 KotlinCompilation java.util.NoSuchElementException: List is empty. 0.542
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testKt7288 Failed 0.924 PythonExecution NameError: name 'js' is not defined 0.285
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testKt7288 Failed 0.28 KotlinCompilation java.util.NoSuchElementException: List is empty. 0.276
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testArrayInstanceOf Failed 1.006 PythonExecution NameError: name 'js' is not defined 0.294
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testArrayInstanceOf Failed 0.298 KotlinCompilation java.util.NoSuchElementException: List is empty. 0.29
Another idea is that you don't have the latest commit from the default branch included here: maybe GitHub Actions is automatically rebasing or something. Could you try rebasing yourself and force-pushing here?
@SerVB that was it - thank you! I needed to regenerate the box-tests-report.tsv. Apparently GitHub rebases the PRs as you said. A bit weird :shrug:
TIL :)
But it makes sense – without it, the CI would fail after we merge this PR 😅
Yeah, but on the other hand, it makes it harder to iterate within a single PR in an isolated manner. Imagine that master gets changes every hour that break your PR :laughing:
> iterate within a single PR in an isolated manner
Turns out there is a solution: https://github.com/actions/checkout#checkout-pull-request-head-commit-instead-of-merge-commit
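For reference, the fix from that link is to check out the PR's head commit instead of the auto-generated merge commit, via the workflow's checkout step (snippet as documented in the actions/checkout README):

```yaml
- uses: actions/checkout@v2
  with:
    # Check out the PR head instead of the merge commit GitHub creates.
    ref: ${{ github.event.pull_request.head.sha }}
```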
The goal is to provide more insight into how many tests failed and why, in a compact way. Having this new file, even with simple processing in a spreadsheet (sorting by "Failure general reason", then by "Failure detailed reason"), I could already see some patterns. When we see that multiple tests fail for the same reason, it will be easier to decide that this should be our next focus. Some tests have an "Unknown" reason, and I think some tests shouldn't even be run because they are coupled with Java. However, the number of these tests is so low (29) that I don't think we should bother.
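The spreadsheet step can also be sketched in a few lines of Python. The column layout (name, status, duration, general reason, detailed reason) and the sample rows are assumptions based on the report lines quoted above:

```python
# Sketch: count failures per (general reason, detailed reason), assuming
# TSV columns: name, status, duration, general reason, detailed reason.
from collections import Counter
import csv
import io

# Hypothetical report rows modeled on the examples in this thread.
report = io.StringIO(
    "a.testX\tFailed\t0.9\tPythonExecution\tNameError: name 'js' is not defined\n"
    "a.testY\tFailed\t0.3\tKotlinCompilation\tList is empty.\n"
    "a.testZ\tFailed\t0.8\tPythonExecution\tNameError: name 'js' is not defined\n"
    "a.testW\tSucceeded\t0.1\t\t\n"
)
reasons = Counter(
    (row[3], row[4])
    for row in csv.reader(report, delimiter="\t")
    if row[1] == "Failed"
)
# Most common reasons first: the same pattern-spotting the spreadsheet gives.
for (general, detailed), n in reasons.most_common():
    print(n, general, detailed)
```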
Regarding verifying this file in CI, it's not possible to check it entirely because times may vary even between different test runs on the same machine. In this PR I compare only test names, statuses, and general reasons, which should be enough to ensure that the file was updated by the PR author.
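A sketch of that restricted comparison, with the column positions assumed from the `cut -f1-2,4` invocation earlier in the thread (the sample rows are hypothetical):

```python
# Sketch: compare two box-test reports while ignoring the volatile
# duration columns -- keep only name (1), status (2), general reason (4).
import csv

def stable_view(lines):
    """Project each TSV row to (name, status, general reason)."""
    return [(r[0], r[1], r[3]) for r in csv.reader(lines, delimiter="\t")]

committed = ["t1\tFailed\t0.749\tPythonExecution", "t2\tSucceeded\t0.1\t"]
ci        = ["t1\tFailed\t0.676\tPythonExecution", "t2\tSucceeded\t0.2\t"]
# Durations differ between runs, but the stable columns agree:
print(stable_view(committed) == stable_view(ci))  # True
```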
By the way, I also simplified the way failed-tests.txt is generated; now it's also done by the Kotlin script.

I'm planning to add a Jupyter notebook with some visualizations based on the data from this new TSV file. I had some issues with running Jupyter on my machine, so let me postpone that to a separate PR.