Closed dipinhora closed 3 weeks ago
@SeanTAllen this PR should resolve the stress test issue created by #4531. Let me know if you prefer to rollback all the large chunk recycling related changes instead of temporarily disabling them like this PR currently does.
Let's see what happens with the stress tests tonight
FYI @dipinhora, you can see a number of failures that appear related in the ponyup actions history from the last couple of days.
@dipinhora ok, so after the large change, the ponyup macos nightly still hangs for apple silicon. That was using nightlies of tools to build (rather than release), so it is possible that the corral version that appears to be hanging hand't been updated to have the large chunk recycling turned off.
That ponyup build was supposed to be using release tools so i switched it. if need be we can switch it back to using nightly tools in order to diagnose further.
building the latest corral with the last ponyc on an apple silicon mac and trying to build ponyup while matching the nightly build options is also an option.
there were problems with some of last nights stress tests but some also passed. i'm not convinced that the failures last night are related to this (might have been some weird environmental stuff). i'm rerunning the stress test failures from last night at the moment. will update here again once those are done.
@dipinhora windows stress tests OOMed:
I think that perhaps this recycle attempt is problematic in practice for some reasonably standard things like the stress test and we should consider rolling it back.
@dipinhora Additional stress test failures. I think we should roll back. Thoughts?
@SeanTAllen i finally figured out the real root causes.. two bugs in the original PR.. one which impacts only large chunk recycling and one memory leak for both small and large chunk recycling (hence the OOMs)..
i'll open a PR with the fixes shortly.. up to you whether you want to give it another go after this latest fix or not but i'm fairly configure it'll be fixed after it..
@SeanTAllen PR #4535 has been opened to fix both of the bugs mentioned.
@dipinhora ill merge it once it passes CI
The implementation of actor heap large chunk recycling from #4531 too naive and results in actors wasting huge amounts of time related to large chunk recycling.
This commit effectively disables the large chunk recycling but doesn't undo the code changes at the moment because the expectation is that large chunk recycling will be re-enabled in the near future with an improved implementation.