riscv-collab / riscv-gnu-toolchain

GNU toolchain for RISC-V, including GCC

Questions about the test suite #1579

Open TommyMurphyTM1234 opened 1 month ago

TommyMurphyTM1234 commented 1 month ago

I still have some confusion about the test suite...

  1. In the case of a change/PR that needs to be tested what are the recommendations for running the test suite? For example - what simulator (Spike or QEMU - or maybe it depends on the nature of the changes requiring testing?), what toolchain(s) (e.g. bare-metal versus Linux/glibc, multilib or not, etc.), what arch/abi(s) (e.g. default rv64gc/lp64d versus rv32gc/ilp32d versus, say, rv32imac/ilp32 etc.)?

  2. On the modest hardware that I have access to (e.g. up to i5 gen 8) the test suite takes a very, very long time to run - so long, in fact, that it's not really practical to run it much if at all. Is there any way to accelerate the running of the test suite, or are there any other options such as cloud-based testing, GitHub Actions, etc.?

  3. Is there any friendly guide to understanding the basics of what the test suite does, how it works and how to deal with things like test results/reports and exclusion config files? Or is it simply a case of reading upstream info about the GCC test suite, DejaGnu etc.?

  4. (This may merit a separate issue?) When I cloned the latest riscv-gnu-toolchain repo master branch and tried to run the test suite I seem to have got incorrect results. Does this indicate a problem with how I am running it or are these failures "real"? See below for details.

git clone https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=`pwd`/installed-tools --with-sim=spike
make
make build-sim
make report-newlib 2>&1 | tee report-newlib.log

...

                === gcc Summary ===

# of expected passes            207055
# of unexpected failures        45
# of unexpected successes       2
# of expected failures          1438
# of unresolved testcases       4
# of unsupported tests          13175

(The test suite is still running after many, many hours so I cannot post the C++ test results summary or the full test log yet).

TommyMurphyTM1234 commented 1 month ago

Further results:

                === g++ Summary ===

# of expected passes            215739
# of unexpected failures        15
# of expected failures          1705
# of unresolved testcases       1
# of unsupported tests          11968
/home/user/spike-pk/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xg++  version 14.2.0 (g04696df0963)

make[3]: Leaving directory '/home/user/spike-pk/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc'
make[2]: Leaving directory '/home/user/spike-pk/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc'
make[1]: Leaving directory '/home/user/spike-pk/riscv-gnu-toolchain/build-gcc-newlib-stage2'
mkdir -p stamps/
date > stamps/check-gcc-newlib
/home/user/spike-pk/riscv-gnu-toolchain/scripts/testsuite-filter gcc newlib /home/user/spike-pk/riscv-gnu-toolchain/test/allowlist `find build-gcc-newlib-stage2/gcc/testsuite/ -name *.sum |paste -sd "," -`
                === g++: Unexpected fails for rv64imafdc lp64d medlow  ===
FAIL: c-c++-common/torture/builtin-clear-padding-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
UNRESOLVED: c-c++-common/torture/builtin-clear-padding-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  compilation failed to produce executable

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    2 /     1 |      - |
make: *** [Makefile:1314: report-gcc-newlib] Error 1
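
A note on reading the table above: the "2 / 1" entry for g++ looks consistent with the two result lines (one FAIL and one UNRESOLVED) belonging to a single test file. The rough sketch below reproduces that kind of tally from DejaGnu .sum files. It rests on assumptions: the helper name count_unexpected is made up for illustration, a "unique" case is assumed to be keyed on the test file name, and unlike scripts/testsuite-filter it does not consult the allowlist, so its totals will differ from the table whenever allowlisted failures are present.

```shell
# Hypothetical helper (not part of the repo): tally unexpected results in
# one or more DejaGnu .sum files as "<cases> / <unique test files>".
# Assumption: the test file name is the first field after the result prefix,
# so the same file failing under several option sets counts once as unique.
# No allowlist filtering is applied.
count_unexpected() {
    grep -hE '^(FAIL|UNRESOLVED): ' "$@" |
        awk '{ total++; if (!seen[$2]++) unique++ }
             END { printf "%d / %d\n", total, unique }'
}
```

It could be run over the same files the report step collects, e.g. count_unexpected $(find build-gcc-newlib-stage2/gcc/testsuite/ -name '*.sum').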

TommyMurphyTM1234 commented 1 month ago

FWIW - another test run...

time make report-newlib 2>&1 | tee report-newlib.log

...

                === gcc Summary ===

# of expected passes            207085
# of unexpected failures        45
# of unexpected successes       2
# of expected failures          1438
# of unresolved testcases       4
# of unsupported tests          13199

...

                === g++ Summary ===

# of expected passes            215771
# of unexpected failures        14
# of expected failures          1705
# of unsupported tests          11978

...

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    0 /     0 |      - |

real    354m48.053s
user    282m49.505s
sys     39m5.811s

TommyMurphyTM1234 commented 1 month ago

And a run with the Linux/glibc toolchain:

time make report-linux 2>&1 | tee report-linux.log

...

                === gcc Summary ===

# of expected passes            194597
# of unexpected failures        21700
# of unexpected successes       2
# of expected failures          1675
# of unresolved testcases       12
# of unsupported tests          13276

...

                === g++ Summary ===

# of expected passes            240348
# of unexpected failures        10439
# of expected failures          2625
# of unresolved testcases       28
# of unsupported tests          11858

...

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |21684 /  4163 |10455 /  2646 |18931 /  3190 |
make: *** [Makefile:1321: report-gcc-linux] Error 1

real    539m11.430s
user    424m49.504s
sys     59m47.288s

TommyMurphyTM1234 commented 1 month ago

Does anybody (@cmuellner or @kito-cheng perhaps?) know why I'm getting the results above with the latest of everything from this repo? The results do not seem to indicate "success" as far as I can tell. I built and ran everything again from scratch but got the same results as above, so I can't see that I'm doing anything wrong at my end...

lazyparser commented 1 month ago

@pz9115 would you like to check this issue and provide some inputs?

TommyMurphyTM1234 commented 1 month ago

If you need me to provide any clarification or do any additional tests please let me know.

pz9115 commented 3 weeks ago

Hi @TommyMurphyTM1234. The test suite you ran is a regression test, which is used to detect whether changes to GCC and other components have introduced new errors. Modifications to riscv-gnu-toolchain itself usually have no additional negative effects, since you do not change any submodule source code directly (unless you update the gcc submodule).

If you try some GCC modification, then regression testing is necessary. GCC's regression testing takes much longer than that of other components (such as Binutils). The more errors a regression test run has, the longer it takes (I guess it's because some of the execution tests are not responding in the simulator).

So it is a good choice to run only the RISC-V-related test cases in the GCC part. You can use RUNTESTFLAGS="riscv.exp" make report -j$(nproc) to shorten the test time. (I sometimes set RUNTESTFLAGS="rvv.exp" when changing the RVV part.)

I usually run two types of regression tests, one using glibc and one using newlib, both with --with-arch=rv64gc as the base argument. That is fine for most changes that do not depend on a new sub-extension.

Hope this helps :)
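
A minimal sketch of wrapping that reduced run in a shell helper. The function name run_subset and the DRY_RUN switch are hypothetical conveniences, not something the repo provides; the make invocation itself is the one quoted above, and whether a given .exp file exists depends on the GCC version being tested.

```shell
# Hypothetical helper: run (or just print) a reduced test-suite invocation.
# Usage: run_subset <exp-file> [make-target]
#   run_subset riscv.exp             # RISC-V specific GCC tests only
#   run_subset rvv.exp report-newlib
# Set DRY_RUN=1 to print the command instead of running it.
run_subset() {
    exp_file=$1
    target=${2:-report}
    # Fall back to a single job if nproc is not available.
    njobs=$(nproc 2>/dev/null || echo 1)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "RUNTESTFLAGS=\"$exp_file\" make $target -j$njobs"
    else
        RUNTESTFLAGS="$exp_file" make "$target" -j"$njobs"
    fi
}
```

Checking the command with DRY_RUN=1 first avoids starting a multi-hour run with the wrong target.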

TommyMurphyTM1234 commented 3 weeks ago

Thanks @pz9115. I understand the general rationale for the test suite and acknowledge that my runs here are "artificial" and strictly unnecessary because I'm not actually testing before and after any changes to stuff like GCC, Binutils, C library etc.

However, my other questions still stand and have not been addressed. In particular, why does the test suite seem to fail and generate errors with the latest repo contents? Isn't that indicative of something anomalous? As far as I can see there isn't a baseline "successful" test suite run right now against which tests on a modified version of the toolchain can be compared.
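
For what it's worth, the kind of before/after comparison I have in mind could be sketched like this, assuming one .sum file saved from an unmodified build and one from a modified build. The function name new_failures and the file names in the usage note are placeholders of mine, not anything the repo provides.

```shell
# Hypothetical helper: print FAIL/UNRESOLVED lines that appear in the
# modified run but not in the baseline run.
# Usage: new_failures <baseline.sum> <modified.sum>
new_failures() {
    base_sum=$1
    new_sum=$2
    tmp_base=$(mktemp)
    tmp_new=$(mktemp)
    # comm needs sorted input; use the C locale for both sort and comm
    # so the ordering is consistent.
    grep -E '^(FAIL|UNRESOLVED): ' "$base_sum" | LC_ALL=C sort -u > "$tmp_base"
    grep -E '^(FAIL|UNRESOLVED): ' "$new_sum"  | LC_ALL=C sort -u > "$tmp_new"
    # -13 suppresses lines unique to the baseline and lines common to both.
    LC_ALL=C comm -13 "$tmp_base" "$tmp_new"
    rm -f "$tmp_base" "$tmp_new"
}
```

With something like new_failures baseline.sum modified.sum, an empty result would mean no new failures relative to the baseline, even if the baseline itself is not clean. A test that changes from UNRESOLVED to FAIL would still be listed, since whole result lines are compared.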

pz9115 commented 2 weeks ago

For the failed cases, you can check the details in the test log, which is in build-gcc-newlib-stage2/gcc/testsuite/gcc/gcc.log. I believe most of them are caused by execution faults, such as ABI incompatibility. Once we set the QEMU arguments consistent with the toolchain, they will pass the test correctly.
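
As a rough sketch, the unexpected results can be pulled out of that log with line numbers so the surrounding compiler or simulator output is easy to find. The function name list_unexpected is hypothetical; the result prefixes are the standard DejaGnu ones.

```shell
# Hypothetical helper: list unexpected DejaGnu results in a .log or .sum
# file, prefixed with their line numbers.
# Usage: list_unexpected build-gcc-newlib-stage2/gcc/testsuite/gcc/gcc.log
list_unexpected() {
    grep -nE '^(FAIL|XPASS|UNRESOLVED|ERROR): ' "$1"
}
```

The line numbers can then be opened in an editor or pager to read the output just before each result line.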

TommyMurphyTM1234 commented 2 weeks ago

For the failed cases, you can check the details in the test log, which is in build-gcc-newlib-stage2/gcc/testsuite/gcc/gcc.log.

OK - but what about stuff like this where make errors out? Surely that shouldn't happen?

/home/user/spike-pk/riscv-gnu-toolchain/scripts/testsuite-filter gcc newlib /home/user/spike-pk/riscv-gnu-toolchain/test/allowlist `find build-gcc-newlib-stage2/gcc/testsuite/ -name *.sum |paste -sd "," -`
                === g++: Unexpected fails for rv64imafdc lp64d medlow  ===
FAIL: c-c++-common/torture/builtin-clear-padding-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
UNRESOLVED: c-c++-common/torture/builtin-clear-padding-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  compilation failed to produce executable

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    2 /     1 |      - |
make: *** [Makefile:1314: report-gcc-newlib] Error 1

...

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |21684 /  4163 |10455 /  2646 |18931 /  3190 |
make: *** [Makefile:1321: report-gcc-linux] Error 1

I believe most of them are caused by execution faults, such as ABI incompatibility. Once we set the QEMU arguments consistent with the toolchain, they will pass the test correctly.

So the recommended target for testing is QEMU and not Spike? I thought that several changes/PRs were checked by running the test suite against Spike?

It seems to me that there is a severe lack of instructions/documentation on the specifics of how best/correctly to run the test suites against the riscv-gnu-toolchain and, unfortunately, by extensive experimentation I personally can't seem to clarify matters.