Add a testing script and a github workflow

jarzec commented 8 months ago

Following this comment Here is my second attempt at a PR with workflow running regression tests.

It is based on a bash script that allows for running on different systems (Linux, MacOS, Windows - through git bash) using different compilers. I found a bash script to be the most flexible and easily understandable solution that adds no extra dependencies to the project. It should be fairly easy to maintain esp. given the number of people interested in the project 😉.

Below is the long story of how useful the script can be.

Running tests locally

The script should be executed where it is placed - in regression-tests dir (as shown i the YAML file). It can be used to run all the tests or only a subset specified through a command line argument as follows:

bash run-tests -c g++-10 - runs all the tests using g++12,
bash run-tests -c clang++-13 -t mixed-bounds-check.cpp2,pure2-stdio.cpp2 - runs only mixed-bounds-check.cpp2 and pure2-stdio.cpp2 using clang-13; this is very useful when working on particular test(s), e.g. adding a new one.

Running on Windows

As mentioned before, the script was written in a way to allow it to be simply run on Windows. This is shown at the very end of .github/workflows/regression-tests.yml. Here it comes for convenience:

"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvars64.bat" && ^
git config --local core.autocrlf false && ^
cd regression-tests && ^
bash run-tests.sh -c cl.exe

The first line sets up the environment. It has to be adapted for the VS version used. The second line is a convenience to avoid issues of line feeds. For the last line to work Git Bash has to be in the PATH. The above can be turned into a .bat script.

Updating tests

The script simply uses git to check for regression. In case of error it prints the output of git diff. This is very helpful for updating tests for machines you don't have access to. I am personally working on the WSL Ubuntu on my Windows. I was able to create some test update PRs, like https://github.com/hsutter/cppfront/pull/894, thanks to the script and older versions of the proposed workflow.

To update tests locally you just need to run the script and commit the modified files.
To update tests on systems you don't have access to you can rely on the GitHub workflow. E.g. At the time of writing mixed-bounds-check.cpp2 fails on main on MacOS as can be seen here. I was able to create https://github.com/hsutter/cppfront/pull/896 by copying the output of git diff printed by the script, pasting it into a patch file and applying the patch, as in:
```
$ cat > update-test.patch; git apply update-test.patch; rm update-test.patch;
<pasted diff output> + Ctrl+D
```
In fact diff outputs of multiple failing tests can be put in a single patch file. That is what I did for https://github.com/hsutter/cppfront/pull/894 😉.

Error reporting

The script reports regression errors in terms of non-empty git diff of all the respective generated output, code, stdout, ... as committed for the given compiler.
If such a file is generated during the run but is not checked into git an error is reported.
Compilation failures with errors matching the checked-in output files are not treated as error but are reported in the final summary. This allows for easier identification of compiler issues (shortcomings in terms of C++20 compliance) of each compiler.

Note: This PR adds two files regression-tests/test-results/pure2-stdio-with-raii.files and regression-tests/test-results/pure2-stdio.files. These contain the names of output files (one per line) created by the respective compiled binary. Those files allow the script to also check content of those output files that themselves need to be checked into git.

Adding/evaluating a new compiler

The script allows for easy generation of reference output for a given new compiler. To do that a new empty directory has to be added (e.g. regression-tests/test-results/clang-15), the compiler has to be included in the script (this has already been done for Clang-15 in line 87), and the script needs to be run (e.g. bash run-tests.sh -c clang++15 to continue the example). The script will report errors for every test but will generate all the files none the less. These files can be committed and a new compiler is added. Clang-15 has been used as reference in the example as all current tests compile correctly using the 15.0.7 version that can be installed on LTS Ubuntu. I will propose addition of tests for this compiler in another PR as it is the only one already available in Ubuntu 2024 that allows to correctly generate, compile and run all regression tests for C++20 (tested on WSL on Windows 11). The fact that I was able to quickly check that shows how useful the script is.

jarzec commented 8 months ago

Note that the MacOS run is indicated as failing for this PR. This is solved in https://github.com/hsutter/cppfront/pull/896, as mentioned above.

jarzec commented 8 months ago

Just a few spelling fixes in comments, and a link to a related issue at https://github.com/modern-cmake/cppfront/.

Thanks a lot, I committed all your suggestions.

hsutter commented 7 months ago

Thanks! I see the MacOS run is having trouble with a few:

Tests with failing compilation step:        7
    mixed-captures-in-expressions-and-postconditions.cpp2
    mixed-function-expression-and-std-for-each.cpp2
    mixed-function-expression-and-std-ranges-for-each-with-capture.cpp2
    mixed-function-expression-and-std-ranges-for-each.cpp2
    mixed-function-expression-with-pointer-capture.cpp2
    mixed-function-expression-with-repeated-capture.cpp2
    pure2-type-and-namespace-aliases.cpp2
Failed tests:        1
    mixed-bounds-check.cpp2

Is there some further tweaking needed for these?

jarzec commented 7 months ago

Thanks! I see the MacOS run is having trouble with a few:

Tests with failing compilation step:        7
    mixed-captures-in-expressions-and-postconditions.cpp2
    mixed-function-expression-and-std-for-each.cpp2
    mixed-function-expression-and-std-ranges-for-each-with-capture.cpp2
    mixed-function-expression-and-std-ranges-for-each.cpp2
    mixed-function-expression-with-pointer-capture.cpp2
    mixed-function-expression-with-repeated-capture.cpp2
    pure2-type-and-namespace-aliases.cpp2
Failed tests:        1
    mixed-bounds-check.cpp2

Is there some further tweaking needed for these?

In fact the last run of the github action that was shown in the PR is the one from roughly 2 weeks ago. At that stage the main branch had the issue indicated by the workflow that was fixed in https://github.com/hsutter/cppfront/pull/896. I've learned today that there is no option to ask GitHub to re-run a PR workflow when the base branch is updated. You @hsutter as the project owner can manually rerun the tests: you can click on Details above and you should see a Re-run all jobs button. I believe that will take the most recent version of the base branch.

Another way to have a rebuild is to push something into the PR branch itself, which is what I did. I have a workflow_dispatch event to the ones running the workflow. When this is in main it will be possible to run regression tests manually e.g. on a givet branch. In fact now that is another test failing on MacOS, namely pure2-return-tuple-operator.cpp2. If you look at the run details you can see that the reference files for this tests were not added for MacOS:

    Testing pure cpp2 code: pure2-return-tuple-operator.cpp2
        Generating Cpp1 code
        Compiling generated code
            The compilation message file is not tracked by git:
                test-results/apple-clang-14/pure2-return-tuple-operator.cpp.output
        Executing the compiled test binary
            Execution output file is not tracked by git:
                test-results/apple-clang-14/pure2-return-tuple-operator.cpp.execution

The trouble in this case is that the script does not print the content of the missing files so they cannot be updated just using the results of the script. I will try to fix that soon.

If in your question you were referring to Tests with failing compilation step: 7 I have added this diagnostic to keep track of issues related to compiler compliance with the standard. If i am not mistaken, currently the only compiler having reference data in regression_tests/test-results that is conformant enough to compile all the tests is GCC 13. All others have compilation error outputs committed as expected due to issues with compliance. The proposed bash script helps keeping track of such issues.

JohelEGP commented 7 months ago

If i am not mistaken, currently the only compiler having reference data in regression_tests/test-results that is conformant enough to compile all the tests is GCC 13.

No, GCC 13 fails for some UFCS tests. I think MSVC is the only one that can compile all of them.

jarzec commented 7 months ago

@hsutter The PR #919 by @filipsajdak solves the issue with the missing MacOS test files. In its current version the mentioned PR also removes some empty test files that are useful for testing. I have checked that with an updated version of the testing script and added a change request comment. When https://github.com/hsutter/cppfront/pull/919 is merged I will update this PR.

hsutter commented 7 months ago

Thanks! Merged #919

jarzec commented 7 months ago

@hsutter I have opened https://github.com/hsutter/cppfront/pull/920 to re-add the 4 files removed in https://github.com/hsutter/cppfront/pull/919. These are necessary for the automated testing to work.

jarzec commented 7 months ago

@hsutter I have updated the testing script and rebased my branch on your current main. As you can see the tests are all green.

hsutter commented 7 months ago

Thanks! Let's give it a try...

JohelEGP commented 7 months ago

This is a great start! Thank you very much, @jarzec!

hsutter commented 7 months ago

@jarzec, thanks again! Say, do you happen to know why just a couple of these show as failing on recent PRs? I can see from the details which files they are, but I haven't delved deep enough to figure out why, or how to fix it.

jarzec commented 7 months ago

@jarzec, thanks again! Say, do you happen to know why just a couple of these show as failing on recent PRs? I can see from the details which files they are, but I haven't delved deep enough to figure out why, or how to fix it.

@hsutter I could see that for most PRs one (or a few) of the dirs in test-resutls (related to the compiler(s) the author is using) would have all tests passing, while others would have tests failures. Authors would likely run and update all tests for their own compiler(s). However, it is difficult/impossible to update test for other platforms/compilers. In such a case a review of the failures reported by GItHub should still be performed as they might indicate actual problems with the PR. However, most likely the diffs for the other platforms/compilers can be just applied through patches. In this sense the GItHub workflow will be a great help for keeping all tests up to date.

The regression tests workflow is set up in such a way that tests are run for each push. This means that when you are working on branch for a new PR you can see the results of running tests on all platforms after every git push, i.e. even before a PR is open. This can be seen in the Actions section on GitHub: This way you can update tests also for other platforms and be sure that when you open a PR all tests will be green.

One last thing is that it is not possible to have GitHub re-run tests on a PR when the target branch gets updated (it only happens if the PR branch istelf is updated). For that reason I would suggest manually re-running tests on each PR just before it is merged. This can be done by clicking Details of the PR regression test build: and clicking Re-run all jobs in the action page that opens.

Maybe it would be a good idea to create a page on how to use the GihHub actions and the testing script itself. I think it can facilitate testing and test updates quite a bit.

hsutter / cppfront