Open HrvojeFER opened 3 years ago
mimalloc
is designed to have bounded worst-case (de)allocation times and has generally low latency (and it is being used by many large services that care a lot about the 99% worst-case latencies). However, it is not "real-time" as it eventually depends on the OS and threading etc. Now, having said that, if you reserve large- or huge- OS pages upfront then allocation/deallocation times should be very predictable.
Mimalloc is lock-free and all multi-threaded free
operations use only atomic operations and should not cause issues; there is no internal blocking. (Again, there might be external blocking: eventually mimalloc gets memory from the OS and now it depends on that implementation how much synchronization there is).
So, overall the current mimalloc guarantees may be enough for audio but eventually depends on other external factors as well (OS, threads etc). Hope this helps to decide :-)
(as an aside: there is a place where mimalloc does not guarantee progress: this is when decommitting segment memory, it needs to ensure that at such time, no other thread tries to reclaim abandoned pages and uses a form of locking to guarantee that and avoid the ABA problem. This essentially implements a form of spin-lock using primitive atomic operations. Now, this is a rare operation so it seems ok in practice (and definitely for your situation as huge pages cannot be decommitted) but just wanted to document it here :-))
Thank you very much for the elaborate reply. I am glad that mimalloc is lock-free so that it is worth it to at least try it in my application. I will try and create an experimental branch in my project and see how things go with huge page sizes. Maybe I can find a way to limit the total amount of memory allocated and somehow disallow deallocation since most of the systems I'm targeting should have at least 1 GB of free memory at all times. Besides, I know a lot of applications that do this by default (most Java apps), so I don't think it's going to be terribly shocking to the clients. A lot of audio hugely depends on the OS and threading, since you have one thread that has to fire off every ~3 ms and fill a small buffer that is then sent to your audio card. If this doesn't happen, the glitches are unacceptable, so there can be no locks, allocation, waiting on the OS, waiting on a low priority thread, or waiting for another thread that does anything of the things mentioned. There are a lot of techniques to avoid all of this, but at least being able to "allocate" to a location that is already allocated from the OS's point of view would reduce a lot of the complexity. I guess the only way to know is to try :).
Yes, trying seems the only way; however, if you control for externals, the heap allocation with mimalloc should be very deterministic;
You can use MIMALLOC_RESERVE_HUGE_OS_PAGES=<N>
to easily get huge OS pages but on windows there is the drawback that it needs special client permissions. If that is not acceptable, ensuring that all memory is committed may already be good enough. You can do this with MIMALLOC_EAGER_COMMIT_DELAY=0
(and MIMALLOC_EAGER_COMMIT=1
) and for v2 turning off decommit (MIMALLOC_ALLOW_DECOMMIT=0
).
Perhaps there is still something lacking from mimalloc
that you may need: I think you would like to dedicate an area of (committed or huge-os-page) memory to a particular heap so it is only used by the audio thread (which would use the mi_heap_xxx
api to allocate explicitly from that heap) -- and not shared with other threads as well. This is currently not there -- all memory is available to everyone...
Thank you for the suggestion. Getting client permissions is totally ok since we also have to get their permission to use their audio devices, microphones, and such. Can reservation of huge os pages be done in runtime? There is some configuration clients have to do in order to use the application, so including an option of allocating more memory for advanced users would be greatly appreciated. If not, it is not a big issue since restarting the application in that state would take a minimal amount of time.
The idea of creating a heap that is dedicated to the audio thread sounds awesome. My first thought before bumping into mimalloc was to write an allocator just for the audio thread. No thread safety needed this way and even in perfect conditions of never waiting for the OS, it would still be a better solution because no thread-safe checks need to be performed. However, I have been thinking now about using mimalloc for shared memory and writing a custom allocator just for the audio thread. Audio buffers could sit there as they usually take up most of the space of whatever needs to be processed by the audio thread and they are almost never touched by any other thread. So, creating data structures that are being used by the audio thread on any thread and then allocating buffers on the audio thread heap by those data structures whenever needed seems to be the ideal solution.
@HrvojeFER This is really interesting. Another option might be to pre-allocate a few buffers for audio data and never free them. Every 3 ms you could fill the next one and pass it to the driver. You probably only need 2 or 3 if you're doing real-time audio, but could use more if you don't mind the latency of queueing them. The key in real-time systems on micro controllers is to never do dynamic allocation at all, and that's not a bad approach in other contexts either. It would also free you from allocator concerns raised in this discussion. Just a thought.
@phkahler This would be a great option if I knew the exact amount of buffers I'm going to need at all times in the application There are several buffers I need to keep because of user recordings and potentially because of processing. I only know the maximum amount of buffers I'm going to be able to use which is likely going to be OS/device dependent. I could come up with a better audio graph algorithm that can process audio with a known amount of memory ahead of time or find a resource about it online, but I don't really know any, I'm currently using a simple BFS algorithm that is appropriated to audio graphs (directed acyclic graphs). If you know any resources on this, it would be greatly appreciated! So, the viable solutions seem to entail using custom allocators like mimalloc, or writing my own and for now the best solution seems to use both for different purposes.
I noticed there was a mention of mimalloc being lock-free in #265, but I don't know if that still holds. The reason is that I would like to use this in an audio application with the huge page size setting so that allocation and deallocation of my data structures happen in deterministic time. I would like to know if it is still true that mimalloc is lock-free and if it is safe to assume that allocations and deallocations happen in deterministic time with the huge page size setting. Also, it would be helpful if that was said in the readme.