ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License
5.73k stars 415 forks source link

Disable actor heap large chunk recycling #4534

Closed dipinhora closed 3 weeks ago

dipinhora commented 3 weeks ago

The implementation of actor heap large chunk recycling from #4531 too naive and results in actors wasting huge amounts of time related to large chunk recycling.

This commit effectively disables the large chunk recycling but doesn't undo the code changes at the moment because the expectation is that large chunk recycling will be re-enabled in the near future with an improved implementation.

dipinhora commented 3 weeks ago

@SeanTAllen this PR should resolve the stress test issue created by #4531. Let me know if you prefer to rollback all the large chunk recycling related changes instead of temporarily disabling them like this PR currently does.

SeanTAllen commented 3 weeks ago

Let's see what happens with the stress tests tonight

SeanTAllen commented 3 weeks ago

FYI @dipinhora, you can see a number of failures that appear related in the ponyup actions history from the last couple of days.

SeanTAllen commented 3 weeks ago

@dipinhora ok, so after the large change, the ponyup macos nightly still hangs for apple silicon. That was using nightlies of tools to build (rather than release), so it is possible that the corral version that appears to be hanging hand't been updated to have the large chunk recycling turned off.

That ponyup build was supposed to be using release tools so i switched it. if need be we can switch it back to using nightly tools in order to diagnose further.

building the latest corral with the last ponyc on an apple silicon mac and trying to build ponyup while matching the nightly build options is also an option.

there were problems with some of last nights stress tests but some also passed. i'm not convinced that the failures last night are related to this (might have been some weird environmental stuff). i'm rerunning the stress test failures from last night at the moment. will update here again once those are done.

SeanTAllen commented 3 weeks ago

@dipinhora windows stress tests OOMed:

I think that perhaps this recycle attempt is problematic in practice for some reasonably standard things like the stress test and we should consider rolling it back.

SeanTAllen commented 3 weeks ago

@dipinhora Additional stress test failures. I think we should roll back. Thoughts?

dipinhora commented 3 weeks ago

@SeanTAllen i finally figured out the real root causes.. two bugs in the original PR.. one which impacts only large chunk recycling and one memory leak for both small and large chunk recycling (hence the OOMs)..

i'll open a PR with the fixes shortly.. up to you whether you want to give it another go after this latest fix or not but i'm fairly configure it'll be fixed after it..

dipinhora commented 3 weeks ago

@SeanTAllen PR #4535 has been opened to fix both of the bugs mentioned.

SeanTAllen commented 3 weeks ago

@dipinhora ill merge it once it passes CI