Open zcbenz opened 1 week ago
I wonder why it's so slow, and whether there is a way to free resources asynchronously. Indeed, I've noticed in the past that the program can take a while to exit when it is holding a lot of RAM. Leaking would be fairly easy ... just avoid releasing buffers in the buffer cache when the allocator is destroyed. Do we lose anything by doing that?
Memory allocators are slow. Doing it asynchronously does not help here because the program still has to wait for tasks to finish before exiting.
Leaking memory on exit is safe and standard behavior; for example, when closing a Chrome tab all DOM elements are simply leaked. You can rely on the OS to safely reclaim all the RAM used.
One danger of leaking memory on exit is that in the future it makes tools like Valgrind less useful when chasing a runtime memory leak.
You can tell sanitizers that a specific leak is expected. Chromium explicitly leaks memory for static objects everywhere (base::NoDestructor) and still makes use of various sanitizers.
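For readers unfamiliar with the pattern: below is a minimal sketch of the base::NoDestructor idea (simplified, not Chromium's actual implementation, which lives in base/no_destructor.h). The wrapped object is constructed in placement storage and its destructor is never invoked, so the object is intentionally "leaked" at process exit; leak checkers like LeakSanitizer can then be taught to tolerate such known leaks via suppressions.

```cpp
#include <new>
#include <string>
#include <utility>

// Simplified sketch of Chromium's base::NoDestructor: construct T in raw
// storage and deliberately never run ~T(), so the object outlives main()
// and is reclaimed by the OS rather than by a destructor at exit.
template <typename T>
class NoDestructor {
 public:
  template <typename... Args>
  explicit NoDestructor(Args&&... args) {
    new (storage_) T(std::forward<Args>(args)...);
  }

  // Default destructor does NOT destroy the wrapped T: that is the point.
  ~NoDestructor() = default;

  T& get() { return *reinterpret_cast<T*>(storage_); }

 private:
  alignas(T) unsigned char storage_[sizeof(T)];
};

// Typical use: a function-local static that is constructed once and never
// torn down, avoiding static-destruction-order issues and exit-time cost.
std::string& shared_name() {
  static NoDestructor<std::string> name("leaked on purpose");
  return name.get();
}
```

The same shape would apply to the allocator here: keep the static instance, but make its teardown a no-op.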
I'm not opposed to this. I can mark it as an enhancement. @zcbenz I'm not sure if you are planning to send a PR. Happy to take a look if so / try it out.
I will send a PR sometime later this month if no one else has worked on it by then.
The memory allocator is declared static and frees its memory on exit when the process quits gracefully:
https://github.com/ml-explore/mlx/blob/9814a2ae120385bc903d059a317233fa1be3bcef/mlx/backend/metal/allocator.cpp#L244-L247
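A hypothetical sketch of the proposed change (class and member names here are illustrative, not MLX's actual API): instead of walking the buffer cache and releasing every entry in the destructor, the static allocator skips cleanup entirely when the process is exiting and lets the OS reclaim the whole address space at once.

```cpp
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Illustrative buffer cache, not MLX's real allocator. The interesting part
// is the destructor: freeing a large cache buffer-by-buffer is what makes
// exit slow, and skipping it is safe because the OS reclaims all memory
// when the process terminates anyway.
class BufferCache {
 public:
  void* allocate(std::size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    void* buf = std::malloc(size);
    buffers_[buf] = size;
    return buf;
  }

  ~BufferCache() {
    if (leak_on_exit_) return;  // fast path: intentionally leak everything
    for (auto& [buf, size] : buffers_) std::free(buf);
  }

  // Opt back into full cleanup, e.g. when running under a leak checker.
  void set_leak_on_exit(bool leak) { leak_on_exit_ = leak; }

 private:
  std::mutex mutex_;
  std::unordered_map<void*, std::size_t> buffers_;
  bool leak_on_exit_ = true;
};
```

Keeping a flag like this (rather than unconditionally leaking) preserves the option of clean teardown for Valgrind or sanitizer runs.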
Freeing everything this way can take more than 30 seconds after running inference with an LLM.
What do you think about just leaking the memory on exit to save that time?