pulp-platform / snitch_cluster

An energy-efficient RISC-V floating-point compute cluster.
https://pulp-platform.github.io/snitch_cluster/
Apache License 2.0
52 stars 54 forks source link

Update Verilator to version 5.006 #158

Closed colluca closed 4 months ago

colluca commented 4 months ago

General

Details

zero9178 commented 4 months ago

I tested this PR in our downstream docker image and testsuite (https://github.com/opencompl/Quidditch/blob/main/runtime/toolchain/Dockerfile) in part to evaluate whether this would then also close https://github.com/pulp-platform/snitch_cluster/pull/148.

Few things sadly happened as soon as I used the newer SHA with the newer verilator release:

zero9178 commented 4 months ago

So the good news is that the new verilator model is a lot faster running a whole NN that leverages all 9 cores of a snitch cluster. Previously it took 5076 seconds; which has now been reduced to 2200 seconds, i.e. more than 50%.

I used VLT_NUM_THREADS=9, thinking that is a reasonable inherent parallism of the verilator model but a more optimal one may exist (our machine have 16 threads to play with). For GitHub actions we would more likely scale this back to 4 at most probably.

The correctness issue in https://github.com/pulp-platform/snitch_cluster/pull/148 seems to have also been fixed

colluca commented 4 months ago

@zero9178 thanks for the detailed testing and report!

The last results look promising, so I would close https://github.com/pulp-platform/snitch_cluster/pull/148 in favour of this. Did you manage to test this with GCC 11.1 in the end? Or were you able to fix the previous errors?

Regarding the log names this should be due to https://github.com/pulp-platform/snitch_cluster/pull/58, but anyways the traces shouldn't affect the correctness of any of the software tests.

zero9178 commented 4 months ago

@zero9178 thanks for the detailed testing and report!

The last results look promising, so I would close #148 in favour of this. Did you manage to test this with GCC 11.1 in the end? Or were you able to fix the previous errors?

I sadly haven't gotten around to doing this yet and I am currently short on time, but will do so as soon as possible. (Hopefully by next week when we're in Zurich :slightly_smiling_face:)

colluca commented 4 months ago

I sadly haven't gotten around to doing this yet and I am currently short on time, but will do so as soon as possible. (Hopefully by next week when we're in Zurich 🙂)

Looking forward to meeting you and we can surely discuss this in that setting as well :)

zero9178 commented 4 months ago

I believe I have found and fixed the issue that affected Clang and potentially the newer GCC versions as well. It seemed to have been a stack overflow after all. Culprit are the files/functions that verilator designates as Slow and therefore compiled with -O0. Since neither GCC nor Clang performaned any optimizations, code in these files had excessive stack usage. Adding OPT_SLOW="-O1" to my make invocation when building bin/snitch_cluster.vlt fixes it and made the clang built verilator also pass every test in the repo.

colluca commented 4 months ago

I believe I have found and fixed the issue that affected Clang and potentially the newer GCC versions as well. It seemed to have been a stack overflow after all. Culprit are the files/functions that verilator designates as Slow and therefore compiled with -O0. Since neither GCC nor Clang performaned any optimizations, code in these files had excessive stack usage. Adding OPT_SLOW="-O1" to my make invocation when building bin/snitch_cluster.vlt fixes it and made the clang built verilator also pass every test in the repo.

Great news! :tada: I will proceed with the merge then :)