ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License

Recycle actor heap chunks after GC instead of returning to pool #4531

Closed: dipinhora closed this pull request 1 month ago

dipinhora commented 1 month ago

Before this commit, any chunks left unused after an actor heap garbage collection were destroyed and immediately returned to the memory pool for reuse by the runtime or any other actor.

This commit changes that. Instead of destroying and returning the chunks immediately, it assumes the actor will likely need more memory as it runs more behaviors, and it keeps the recently unused chunks around in case that happens. This is generally more efficient than destroying a chunk and later getting a new one from the memory pool, because both destroying a chunk and allocating a new one involve updating the pagemap to record which actor owns the chunk. Updating the pagemap is an expensive operation, and recycling chunks avoids it. The main drawback is that actors no longer return chunks to the memory pool immediately after a GC, so the overall system may end up using more memory: freed chunks can only be reused by the actor that owns them, and neither the runtime nor other actors can reuse that memory as they previously might have been able to.
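The cost trade-off above can be illustrated with a minimal C sketch. It assumes a heavily simplified model: `chunk_t`, `actor_heap_t`, `pagemap_set_owner`, `heap_take_chunk`, and `heap_recycle_chunk` are hypothetical names, not the ponyc runtime's actual heap API. The pagemap write is modeled as a counter so the saving is visible. A recycled chunk is handed back without touching the pagemap, while a chunk from the shared pool (stood in for by `calloc`) needs one ownership update. Only the allocation side is modeled; destroying a chunk would add another pagemap update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified chunk; not ponyc's real chunk layout. */
typedef struct chunk_t {
  struct chunk_t* next;
  size_t size;
  void* owner;              /* stands in for this chunk's pagemap entry */
} chunk_t;

/* Hypothetical per-actor heap that keeps chunks for recycling. */
typedef struct actor_heap_t {
  chunk_t* recycle;         /* chunks freed by GC, still owned by this actor */
  size_t pagemap_updates;   /* counts the expensive pagemap writes */
} actor_heap_t;

/* Models the expensive step: recording which actor owns a chunk. */
static void pagemap_set_owner(actor_heap_t* heap, chunk_t* c, void* owner)
{
  c->owner = owner;
  heap->pagemap_updates++;
}

/* Get a chunk of `size` bytes, preferring a recycled one. */
static chunk_t* heap_take_chunk(actor_heap_t* heap, size_t size)
{
  chunk_t** link = &heap->recycle;

  while(*link != NULL)
  {
    chunk_t* c = *link;
    if(c->size == size)
    {
      /* Recycled: ownership is already recorded, so no pagemap update. */
      *link = c->next;
      c->next = NULL;
      return c;
    }
    link = &c->next;
  }

  /* No match: stand-in for allocating a fresh chunk from the shared pool. */
  chunk_t* c = calloc(1, sizeof(chunk_t));
  assert(c != NULL);
  c->size = size;
  /* The heap pointer stands in for the owning actor. */
  pagemap_set_owner(heap, c, heap);
  return c;
}

/* After GC, keep an unused chunk for this actor instead of destroying it. */
static void heap_recycle_chunk(actor_heap_t* heap, chunk_t* c)
{
  c->next = heap->recycle;
  heap->recycle = c;
}
```

For example, taking a 1024-byte chunk, recycling it, and then taking another 1024-byte chunk returns the same chunk with `pagemap_updates` still at 1; only a request that cannot be served from the recycle list pays for another update.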

SeanTAllen commented 1 month ago

I see this will free any recycled chunks that aren't reused after one GC pass.

Can you help me work through how this will work in practice? If an actor never gets GC'd, this won't have any impact. If an actor is GC'd only once, then some unknown amount of memory won't be freed, and if the actor is GC'd more than once, the same still applies: some amount of memory will continue to be held for recycling. Yes?

dipinhora commented 1 month ago


Correct.

That last point is where there's room to tweak things: the actor could return any chunks saved for recycling to the runtime memory pool when it blocks, instead of continuing to hold onto them.
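The retention behavior discussed above can be modeled with a second minimal C sketch. It assumes a hypothetical `actor_heap_t` that only tracks counts (`held`, `returned`) and omits reuse between passes; `heap_gc_pass`, `heap_on_block`, and `make_chunks` are illustrative names, not ponyc functions. At each GC pass, chunks still on the recycle list were not reused since the previous pass, so they go back to the pool, and the chunks freed by the current pass take their place. `heap_on_block` models the suggested tweak of handing every recycled chunk back when the actor blocks; it is a hypothetical illustration, not necessarily part of this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chunk: only the list link matters for this model. */
typedef struct chunk_t {
  struct chunk_t* next;
} chunk_t;

/* Hypothetical per-actor heap state. */
typedef struct actor_heap_t {
  chunk_t* recycle;   /* chunks kept since the previous GC pass */
  size_t held;        /* chunks currently held for recycling */
  size_t returned;    /* chunks handed back to the shared pool so far */
} actor_heap_t;

/* Stand-in for destroying a chunk and returning it to the memory pool. */
static void pool_return(actor_heap_t* heap, chunk_t* c)
{
  free(c);
  heap->returned++;
}

/* Return every chunk on the recycle list to the pool. */
static void heap_release_recycled(actor_heap_t* heap)
{
  chunk_t* c = heap->recycle;
  while(c != NULL)
  {
    chunk_t* next = c->next;
    pool_return(heap, c);
    c = next;
  }
  heap->recycle = NULL;
  heap->held = 0;
}

/* One GC pass: chunks still listed were not reused since the last pass,
 * so release them; the chunks freed by this pass become the new list. */
static void heap_gc_pass(actor_heap_t* heap, chunk_t* freed)
{
  heap_release_recycled(heap);
  for(chunk_t* c = freed; c != NULL; c = c->next)
    heap->held++;
  heap->recycle = freed;
}

/* Hypothetical tweak from the discussion: give everything back on block. */
static void heap_on_block(actor_heap_t* heap)
{
  heap_release_recycled(heap);
}

/* Helper that builds a list of `n` freshly allocated chunks. */
static chunk_t* make_chunks(size_t n)
{
  chunk_t* head = NULL;
  for(size_t i = 0; i < n; i++)
  {
    chunk_t* c = malloc(sizeof(chunk_t));
    assert(c != NULL);
    c->next = head;
    head = c;
  }
  return head;
}
```

Running two passes that free 3 and then 2 chunks leaves 2 chunks held and 3 returned; blocking afterwards returns the remaining 2. In this model an actor holds at most the chunks freed by its most recent GC, which is the "some amount of memory would continue to be held" case, and the block hook removes even that.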

ponylang-main commented 1 month ago

Hi @dipinhora,

The "changelog - changed" label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 4531.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

## Title

End user description of changes, why it's important,
problems it solves etc.

If a breaking change, make sure to include 1 or more
examples of what code would look like prior to this change
and how to update it to work after this change.

Thanks.

dipinhora commented 1 month ago

release notes added

SeanTAllen commented 1 month ago

@dipinhora first night after this was merged, all the stress tests failed.

I'm waiting to see what happens with tonight's.

dipinhora commented 1 month ago


@SeanTAllen I looked through all the logs and they all end with:

Error: The operation was canceled.

It doesn't seem like anything actionable (i.e., no crashes).

dipinhora commented 1 month ago

Scratch that. Older stress test runs all finished in under 30 minutes to 1 hour, while the new runs seem to time out after 6 hours. Something to look into.

dipinhora commented 1 month ago

@SeanTAllen #4534 has been opened to resolve the stress test issue.