As reported in https://github.com/p4lang/p4c/issues/4872, we are seeing high peak memory usage during the def-use run. The def-use pass creates lots of temporary objects and expects the GC to clean them up. However, this does not seem to happen reliably, so peak memory usage can become very high, especially since the def-use internal state objects (AllDefinitions) can be quite large. This PR tries to improve the situation in several ways:
- Shrink the lifetime of transient objects (this seems to be enough to reduce memory consumption in the downstream testcase from 16+ GB to 2 GB)
- Reduce leaks: less work for the GC and, in many cases, fewer heap allocations, since transient objects are now allocated on the stack
- Improve some internal data structures as well
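A minimal C++ sketch of the first point, assuming plain RAII instead of a GC so that object lifetimes are observable. `ScratchState`, `LongLivedPass` and `ScopedPass` are hypothetical names, not p4c classes. In p4c the real objects are GC-managed, so "released" there means "no longer reachable and therefore collectable" rather than "destructor ran".

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large transient object (e.g. a per-visit
// definitions table). The static counter tracks how many are alive.
struct ScratchState {
    static int live;
    std::vector<int> payload;
    explicit ScratchState(std::size_t n) : payload(n, 0) { ++live; }
    ~ScratchState() { --live; }
    ScratchState(const ScratchState &) = delete;
    ScratchState &operator=(const ScratchState &) = delete;
};
int ScratchState::live = 0;

// Hypothetical pass, pattern to avoid: scratch data is stored in a member,
// so it stays reachable for as long as the pass object itself exists.
class LongLivedPass {
    std::unique_ptr<ScratchState> scratch_;

 public:
    std::size_t run(std::size_t n) {
        scratch_ = std::make_unique<ScratchState>(n);
        return scratch_->payload.size();
    }
};

// Hypothetical pass, preferred pattern: scratch data lives in automatic
// storage inside run() and is released as soon as run() returns.
class ScopedPass {
 public:
    std::size_t run(std::size_t n) {
        ScratchState scratch(n);
        return scratch.payload.size();
    }
};
```

After `LongLivedPass::run` the live counter stays at 1 until the pass object is destroyed; after `ScopedPass::run` it is already back to 0.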
As a result:
- Peak memory usage in #4872 dropped to 2 GB (due to AllDefinitions being cleared)
- We see noticeable improvements in the `gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512` testcase.
Overall, the pass manager is quite hostile to GC because it keeps roots alive for a very long time. Here is why:
- Consider a pass with a large internal state (e.g., def-use).
- Unless that internal state is explicitly cleared (and the clear must be "deep", with memory release and zeroing), it is only freed when the corresponding pass is destroyed.
- That only happens when the PassManager itself is deallocated, which in practice means at the very end of the frontend.
- The situation is worse for midends and backends, since their pass managers stay alive until the end of the p4c run.
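To illustrate the difference between a shallow and a "deep" clear, here is a small sketch under stated assumptions: `BigState`, `StatefulPass`, `shallowClear` and `deepClear` are hypothetical, and `std::shared_ptr` stands in for a GC-managed pointer. Our reading of "zeroing" is resetting the pointer that keeps the large object reachable; a plain `clear()` keeps both the buffer and that root alive for as long as the long-lived pass exists.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large state object such as AllDefinitions.
struct BigState {
    std::vector<int> table;
};

// Hypothetical pass that keeps state around between runs.
class StatefulPass {
 public:
    std::vector<int> scratch;         // per-run buffer
    std::shared_ptr<BigState> state;  // root to the large object

    void run(std::size_t n) {
        state = std::make_shared<BigState>();
        state->table.assign(n, 1);
        scratch.assign(n, 0);
    }

    // Shallow clear: elements are gone, but the buffer's capacity is kept
    // and `state` still points at (and keeps alive) the big object.
    void shallowClear() {
        scratch.clear();
        if (state) state->table.clear();
    }

    // Deep clear: release the buffer by swapping with an empty vector and
    // null out the root so nothing held by the pass refers to old data.
    void deepClear() {
        std::vector<int>().swap(scratch);
        state.reset();
    }
};
```

Calling something like `deepClear()` at the end of each run, rather than relying on the pass destructor, means the memory becomes reclaimable long before the PassManager goes away.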
[Benchmark results: `gtestp4c-gc-main`, `gtestp4c-gc`, `gtestp4c-nogc-main` and `gtestp4c-nogc`, each run with `--gtest_filter=P4CParserUnroll.switch_20160512`]
Note that GC takes 1.5 seconds out of the overall 4.5-second runtime (!)
After:
The number of allocations dropped from 27.6M to 25.7M, and peak memory consumption dropped from 3.22 GB to 3.15 GB.
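For scale, the relative figures implied by the numbers above (computed from them, not separately measured):

```math
\frac{1.5\,\text{s}}{4.5\,\text{s}} \approx 33\%, \qquad
\frac{27.6\text{M} - 25.7\text{M}}{27.6\text{M}} \approx 6.9\%, \qquad
\frac{3.22\,\text{GB} - 3.15\,\text{GB}}{3.22\,\text{GB}} \approx 2.2\%
```

That is, GC accounts for about a third of the runtime, allocations drop by roughly 7%, and peak memory drops by roughly 2%.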