p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0

Reduce the number of memory allocations in def-use #4904

Closed. asl closed this 2 months ago.

asl commented 2 months ago

As stated in https://github.com/p4lang/p4c/issues/4872, we are seeing high peak memory usage during the def-use run. def-use creates lots of temporary objects and expects the GC to clean them up. However, this does not seem to happen reliably, so peak memory usage can be very high, especially since def-use's internal state objects (AllDefinitions) can be quite large. This PR tries to improve the situation in several respects:

As a result:

  1. About 2% runtime improvement with GC on and about 1.5% with GC off:

With GC:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `gtestp4c-gc-main --gtest_filter=P4CParserUnroll.switch_20160512` | 4.506 ± 0.088 | 4.365 | 4.659 | 1.02 ± 0.03 |
| `gtestp4c-gc --gtest_filter=P4CParserUnroll.switch_20160512` | 4.418 ± 0.086 | 4.194 | 4.502 | 1.00 |

Without GC:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `gtestp4c-nogc-main --gtest_filter=P4CParserUnroll.switch_20160512` | 3.041 ± 0.024 | 3.004 | 3.099 | 1.01 ± 0.01 |
| `gtestp4c-nogc --gtest_filter=P4CParserUnroll.switch_20160512` | 2.996 ± 0.032 | 2.964 | 3.095 | 1.00 |

Note, however, that GC alone takes about 1.5 seconds of the overall 4.5-second runtime (!)

  2. We also reduced the number of memory allocations by roughly 7%: the number of allocations dropped from 27.6M to 25.7M, and peak memory consumption dropped from 3.22 GB to 3.15 GB.
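As a quick sanity check, the headline percentages can be re-derived from the means, allocation counts and peak-memory figures quoted in this comment. The sketch below is plain C++ with no p4c dependencies. It assumes that the `gtestp4c-*-main` binaries are the baseline (main-branch) builds and the other binaries include this PR, which is how the command names read. The helper names `relativeSlowdown`, `reductionPct`, `gcSeconds` and `approxEqual` are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Mean runtimes [s] from the benchmark tables.
// Assumption: "-main" = baseline build, the other binary = build with this PR.
constexpr double kGcMain = 4.506, kGcPr = 4.418;
constexpr double kNoGcMain = 3.041, kNoGcPr = 2.996;

// Allocation count [millions] and peak memory [GB], before and after the PR.
constexpr double kAllocsBefore = 27.6, kAllocsAfter = 25.7;
constexpr double kPeakBefore = 3.22, kPeakAfter = 3.15;

// Baseline mean divided by the faster mean, as in the "Relative" column.
double relativeSlowdown(double baseline, double candidate) { return baseline / candidate; }

// Percentage reduction when going from `before` to `after`.
double reductionPct(double before, double after) {
    assert(before > 0.0 && "baseline must be positive");
    return 100.0 * (before - after) / before;
}

// Rough GC cost: GC-on mean minus GC-off mean for the same build.
double gcSeconds(double withGc, double withoutGc) { return withGc - withoutGc; }

// True if `a` and `b` agree to within `tol`.
bool approxEqual(double a, double b, double tol) { return std::fabs(a - b) <= tol; }
```

With these inputs, `relativeSlowdown(kGcMain, kGcPr)` is about 1.020 and `relativeSlowdown(kNoGcMain, kNoGcPr)` about 1.015. `gcSeconds(kGcMain, kNoGcMain)` gives about 1.465 s, roughly a third of the 4.506 s baseline. `reductionPct(kAllocsBefore, kAllocsAfter)` is about 6.9% and `reductionPct(kPeakBefore, kPeakAfter)` about 2.2%. Subtracting the GC-off mean from the GC-on mean is only an estimate of collector cost, since the two builds can also differ in allocation behavior.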
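The opening argument, that temporaries left for the GC to reclaim can inflate peak memory, can be illustrated with a toy model. This is not p4c code and does not use p4c's collector. `TempState` is a hypothetical stand-in for a large per-visit state object (think of a temporary copy of something like `AllDefinitions`). "Collection" is modeled as freeing every dead temporary once per `collectEvery` allocations, a deliberate simplification of how a real collector decides when to run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large temporary state object (C++17 for inline statics).
// Static counters track how many instances are alive and the high-water mark.
struct TempState {
    static inline std::size_t live = 0;
    static inline std::size_t peak = 0;
    std::vector<int> payload;

    explicit TempState(std::size_t n) : payload(n) { peak = std::max(peak, ++live); }
    ~TempState() { --live; }
    TempState(const TempState&) = delete;             // copies would skew the counters
    TempState& operator=(const TempState&) = delete;
};

// Deferred reclamation: each visit creates a temporary and drops it, but dead
// objects are only freed when a "collection" runs every `collectEvery` allocations.
// Returns the peak number of simultaneously alive TempState objects.
std::size_t peakWithDeferredCollection(std::size_t visits, std::size_t collectEvery) {
    assert(collectEvery > 0 && "collectEvery must be positive");
    TempState::peak = TempState::live;
    std::vector<std::unique_ptr<TempState>> unreclaimed;  // dead but not yet freed
    for (std::size_t i = 0; i < visits; ++i) {
        unreclaimed.push_back(std::make_unique<TempState>(1024));
        if (unreclaimed.size() >= collectEvery) unreclaimed.clear();  // the "GC" runs
    }
    return TempState::peak;
}

// Eager reclamation: the temporary is destroyed at the end of every iteration.
std::size_t peakWithEagerRelease(std::size_t visits) {
    TempState::peak = TempState::live;
    for (std::size_t i = 0; i < visits; ++i) {
        auto state = std::make_unique<TempState>(1024);
        state->payload[0] = static_cast<int>(i);  // "use" the temporary
    }  // each `state` is freed here, before the next one is allocated
    return TempState::peak;
}
```

For example, `peakWithEagerRelease(1000)` returns 1, while `peakWithDeferredCollection(1000, 100)` returns 100: the same amount of work, but up to 100 dead objects are held at once. Multiplied by the size of each state object, that backlog is what surfaces as peak memory in this simplified model.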

asl commented 2 months ago

Overall, the pass manager is rather hostile to GC, as it forces roots to be held for a very long time. Here is why: