Closed: krzema12 closed this 3 years ago
@SerVB could you help me debug the issue with comparing box-tests-report.tsv? Locally it works fine, but on GitHub it fails - see here.

Could you run:

    python/experiments/generate-box-tests-reports.main.kts --box-tests-report-path=box-tests-report.tsv

(new script, you have to check out my branch) and then:

    diff -u <(less box-tests-report.tsv | cut -d$'\t' -f1-2,4) <(less python/experiments/box-tests-report.tsv | cut -d$'\t' -f1-2,4)

?
Works for me after running tests via both IDEA and command line:
I'm not sure why there are different results on CI.
Can you try running tests with --info on CI to see more info about the exceptions? Maybe it will give us a clue. Another option that comes to mind is to upload some files as build artifacts; if we already have such files, that could be easier.
Thanks for the ideas! For starters I'll just do this: https://github.com/krzema12/kotlin-python/pull/26/commits/3ba9cd4fb6d52111e049cb29ef981c34cd3d776c.
Comparing, for a case where the state calculated by CI differs from the one calculated on our machines:
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenInlineTestGenerated$CallableReference$AdaptedReferences > testInlineVararg Succeeded 0.749
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenInlineTestGenerated$CallableReference$AdaptedReferences > testInlineVararg Failed 0.676 PythonExecution TypeError: object of type 'int' has no len() 0.221
In failed-tests.txt this test is marked as failed, which is weird because the difference in state (failed/succeeded) should also be reflected there:
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenInlineTestGenerated$CallableReference$AdaptedReferences > testInlineVararg FAILED
The new Kotlin script parses the XML test report files once and then infers each test's status (failed/succeeded). It's weird that one output file says failed while another says passed. Let me check what the parsed collection contains: https://github.com/krzema12/kotlin-python/pull/26/commits/af9e05387bee22cc8cd2b03a0edb5019ae2de827
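As a sanity check of the status-inference step, here is a minimal Python sketch of the same idea. It assumes conventional JUnit-style XML (a `<testcase>` with a `<failure>` or `<error>` child when it failed); the element names, the sample data, and the "Failed wins" merge rule are my assumptions, not the actual Kotlin script:

```python
# Sketch: infer test status from JUnit-style XML reports. Assumes the
# conventional <testsuite><testcase classname=.. name=..> layout with a
# <failure>/<error> child on failed tests; the real script may differ.
import xml.etree.ElementTree as ET

def parse_report(xml_text):
    """Return {full test name: 'Failed' | 'Succeeded'} for one report."""
    statuses = {}
    for case in ET.fromstring(xml_text).iter("testcase"):
        name = f"{case.get('classname')} > {case.get('name')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        # If the same test shows up more than once, don't let a later
        # "Succeeded" overwrite an earlier "Failed" -- exactly the kind of
        # inconsistency being debugged above.
        if statuses.get(name) != "Failed":
            statuses[name] = "Failed" if failed else "Succeeded"
    return statuses

sample = """<testsuite>
  <testcase classname="FooTest" name="testA"/>
  <testcase classname="FooTest" name="testB"><failure>boom</failure></testcase>
</testsuite>"""
print(parse_report(sample))
# {'FooTest > testA': 'Succeeded', 'FooTest > testB': 'Failed'}
```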
Comparing, for example, cases where tests fail, but for a different reason:
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Ranges$ForInIndices > testKt13241_Array Failed 1.456 PythonExecution NameError: name 'js' is not defined 0.349
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Ranges$ForInIndices > testKt13241_Array Failed 0.545 KotlinCompilation java.util.NoSuchElementException: List is empty. 0.542
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testKt7288 Failed 0.924 PythonExecution NameError: name 'js' is not defined 0.285
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testKt7288 Failed 0.28 KotlinCompilation java.util.NoSuchElementException: List is empty. 0.276
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testArrayInstanceOf Failed 1.006 PythonExecution NameError: name 'js' is not defined 0.294
    org.jetbrains.kotlin.python.test.ir.semantics.IrPythonCodegenBoxTestGenerated$Arrays > testArrayInstanceOf Failed 0.298 KotlinCompilation java.util.NoSuchElementException: List is empty. 0.29
Another idea is that you don't have the latest commit from the default branch included here: maybe GitHub Actions is automatically rebasing or something. Could you try rebasing yourself and force-pushing here?
@SerVB that was it - thank you! I needed to regenerate the box-tests-report.tsv. Apparently GitHub rebases the PRs as you said. A bit weird :shrug:
TIL :)
But it makes sense – without it, the CI would fail after we merge this PR 😅
Yeah, but on the other hand, it makes it harder to iterate within a single PR in an isolated manner. Imagine that master gets changes every hour that break your PR :laughing:
> iterate within a single PR in an isolated manner
Turns out there is a solution: https://github.com/actions/checkout#checkout-pull-request-head-commit-instead-of-merge-commit
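For reference, the fix from that link is to check out the PR's head commit instead of the auto-generated merge commit, via the workflow's checkout step (snippet as documented in the actions/checkout README):

```yaml
- uses: actions/checkout@v2
  with:
    # Check out the PR head instead of the merge commit GitHub creates.
    ref: ${{ github.event.pull_request.head.sha }}
```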
The goal is to provide more insight into how many tests failed and why, in a compact way. Having this new file, even with simple processing in a spreadsheet (sorting by "Failure general reason", then by "Failure detailed reason"), I could already see some patterns. When we see that multiple tests fail for the same reason, it will be easier to decide that this should be our next focus. Some tests have an "Unknown" reason, and I think some tests shouldn't even be run because they are coupled with Java. However, the number of these tests is so low (29) that I don't think we should bother.
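The spreadsheet step can also be sketched in a few lines of Python. The column layout (name, status, duration, general reason, detailed reason) and the sample rows are assumptions based on the report lines quoted above:

```python
# Sketch: count failures per (general reason, detailed reason), assuming
# TSV columns: name, status, duration, general reason, detailed reason.
from collections import Counter
import csv
import io

# Hypothetical report rows modeled on the examples in this thread.
report = io.StringIO(
    "a.testX\tFailed\t0.9\tPythonExecution\tNameError: name 'js' is not defined\n"
    "a.testY\tFailed\t0.3\tKotlinCompilation\tList is empty.\n"
    "a.testZ\tFailed\t0.8\tPythonExecution\tNameError: name 'js' is not defined\n"
    "a.testW\tSucceeded\t0.1\t\t\n"
)
reasons = Counter(
    (row[3], row[4])
    for row in csv.reader(report, delimiter="\t")
    if row[1] == "Failed"
)
# Most common reasons first: the same pattern-spotting the spreadsheet gives.
for (general, detailed), n in reasons.most_common():
    print(n, general, detailed)
```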
Regarding verifying this file in CI, it's not possible to check it entirely because times may vary even between different test runs on the same machine. In this PR I compare only test names, statuses, and general reasons, which should be enough to ensure that the file was updated by the PR author.
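A sketch of that restricted comparison, with the column positions assumed from the `cut -f1-2,4` invocation earlier in the thread (the sample rows are hypothetical):

```python
# Sketch: compare two box-test reports while ignoring the volatile
# duration columns -- keep only name (1), status (2), general reason (4).
import csv

def stable_view(lines):
    """Project each TSV row to (name, status, general reason)."""
    return [(r[0], r[1], r[3]) for r in csv.reader(lines, delimiter="\t")]

committed = ["t1\tFailed\t0.749\tPythonExecution", "t2\tSucceeded\t0.1\t"]
ci        = ["t1\tFailed\t0.676\tPythonExecution", "t2\tSucceeded\t0.2\t"]
# Durations differ between runs, but the stable columns agree:
print(stable_view(committed) == stable_view(ci))  # True
```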
By the way, I also simplified the way failed-tests.txt is generated; now it's also done by the Kotlin script.

I'm planning to add a Jupyter notebook with some visualizations based on the data from this new TSV file. I had some issues with running Jupyter on my machine, so let me postpone that to a separate PR.