mjansson / rpmalloc

Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

Interposing standard entry points broken on macOS #151

Closed michaeleisel closed 10 months ago

michaeleisel commented 4 years ago

In certain situations, my system (OS X 10.14) tries to free a pointer with rpfree that was allocated with the system allocator. This can occur depending on how rpmalloc is integrated (currently, librpmallocwrap.dylib seems broken, so I'm using a homebrewed integration). There are other situations where checking whether the allocator owns a pointer is helpful too, e.g. https://github.com/jemalloc/jemalloc/blob/7014f81e172290466e1a28118b622519bbbed2b0/src/zone.c#L135 . Is there currently some way to determine whether rpmalloc owns a given pointer? If not, it'd be great if there were.

mjansson commented 4 years ago

Short answer: no

Long answer: For performance reasons, rpmalloc does not track fully allocated spans of memory pages. The tracking is re-established when the owning thread triggers a free of a block.

This means that even walking all heaps in the process and checking each tracked span of pages (which would be very slow and completely unusable as a safety check in rpmalloc) would still not be guaranteed to determine whether a block is owned by rpmalloc.

Relying on magic numbers in headers is not good enough either IMO, as any block NOT owned by rpmalloc could at best contain random data matching the magic, or at worst trigger a segfault as you try to read an invalid memory page (since by nature, the magic number must be outside the memory block being checked).
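To make the magic-number concern concrete, here is a minimal sketch in portable C. It is not rpmalloc code: `toy_header_t`, `TOY_MAGIC`, and the `toy_*` functions are hypothetical names invented for illustration, and the header layout is an assumption. The check has to read the bytes just before the pointer, so a foreign block whose neighbouring bytes happen to match is misidentified, and a foreign pointer at the start of a mapping could fault on the read.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte header placed in front of every block (illustration only). */
#define TOY_MAGIC UINT64_C(0x746F796D61676963)

typedef struct {
    uint64_t magic;
    uint64_t size;
} toy_header_t;

/* Allocate a block with the header in front of the returned pointer. */
static void *toy_alloc(size_t size) {
    toy_header_t *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->magic = TOY_MAGIC;
    h->size = size;
    return h + 1;
}

/* Release a block that was returned by toy_alloc. */
static void toy_free(void *ptr) {
    if (!ptr) return;
    toy_header_t *h = (toy_header_t *)ptr - 1;
    h->magic = 0; /* clear the magic so a stale pointer is less likely to match */
    free(h);
}

/* Heuristic ownership check. It reads memory OUTSIDE the block being checked,
   so it can crash if those bytes are unmapped and can match by coincidence. */
static int toy_looks_owned(const void *ptr) {
    assert(ptr != NULL);
    toy_header_t h;
    memcpy(&h, (const unsigned char *)ptr - sizeof h, sizeof h);
    return h.magic == TOY_MAGIC;
}

/* Returns 1 when a pointer that never came from toy_alloc passes the check. */
static int toy_false_positive_demo(void) {
    unsigned char foreign[2 * sizeof(toy_header_t)];
    toy_header_t fake = { TOY_MAGIC, 0 };
    memset(foreign, 0, sizeof foreign);
    memcpy(foreign, &fake, sizeof fake); /* data that merely looks like a header */
    return toy_looks_owned(foreign + sizeof fake);
}
```

Here the false positive is constructed on purpose. In a real process the same thing can happen with arbitrary data, which is why the check cannot be treated as a guarantee.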

rpmalloc is not aimed at providing a safety net for mistakes, it's aimed at providing the best possible performance.

mjansson commented 4 years ago

But if you could elaborate on the brokenness of the malloc wrapper I could look into fixing/improving that.

michaeleisel commented 4 years ago

It appears that dyld interposing doesn't work for anything besides pthread_create (if you run with DYLD_PRINT_INTERPOSING=1 you can see what I mean). It seems like there are internally defined functions named malloc and free that it's interposing with, instead of libc's malloc.

In any case, the zone-based approach that, e.g., jemalloc employs might be better. They seem to have put a lot of work into making it stable, and generally I think it's good to stay away from interposing when possible. I actually used jemalloc for all (non-hardened) processes with the help of launchd, and the system ran smoothly.
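For readers unfamiliar with the zone idea, the following is a rough, portable sketch of the core mechanism, not the macOS `malloc_zone_t` API and not jemalloc's or rpmalloc's code. On macOS, each zone exposes a size callback that returns 0 for pointers it does not own, which lets a free be routed to the right allocator. Everything named `toy_*` here is hypothetical, and the bump arena with a range check is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_HEADER 16u /* bytes reserved before each block to store its size */

/* A toy "zone": a bump arena over a fixed buffer (C11 for _Alignas). */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
    size_t frees; /* counts frees routed to this zone; the arena never reclaims */
} toy_zone_t;

static _Alignas(16) unsigned char toy_storage[4096];
static toy_zone_t toy_zone = { toy_storage, sizeof toy_storage, 0, 0 };

/* Carve a block out of the arena, recording its size just before it. */
static void *toy_zone_alloc(toy_zone_t *z, size_t size) {
    assert(z != NULL);
    if (size == 0) size = 1; /* keep 0 reserved to mean "not owned" */
    size_t need = TOY_HEADER + ((size + 15u) & ~(size_t)15u);
    if (need > z->capacity - z->used) return NULL;
    unsigned char *block = z->base + z->used;
    memcpy(block, &size, sizeof size);
    z->used += need;
    return block + TOY_HEADER;
}

/* Size query in the style of a zone size callback: 0 means "not mine".
   Only exact pointers returned by toy_zone_alloc give a meaningful size. */
static size_t toy_zone_size(const toy_zone_t *z, const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)z->base + TOY_HEADER;
    uintptr_t hi = (uintptr_t)z->base + z->used;
    if (p < lo || p >= hi) return 0; /* outside the arena: some other allocator */
    size_t size;
    memcpy(&size, (const unsigned char *)ptr - TOY_HEADER, sizeof size);
    return size;
}

/* Route a free: returns 1 if the zone handled it, 0 if it went to free(). */
static int toy_routed_free(toy_zone_t *z, void *ptr) {
    if (!ptr) return 0;
    if (toy_zone_size(z, ptr) != 0) {
        z->frees++;
        return 1;
    }
    free(ptr); /* not ours, so hand it back to the system allocator */
    return 0;
}
```

The range check is cheap only because this toy owns one contiguous buffer. An allocator that maps spans all over the address space needs some other lookup structure to answer the same question.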

michaeleisel commented 4 years ago

But just FYI, here is how I was able to replace the allocator and get it working reasonably well: https://github.com/michaeleisel/jemalloc/blob/master/src/zone.c . This is in turn based on the address sanitizer's way of doing it.

mjansson commented 2 years ago

Could you check the interposing in the latest develop branch and see if it works better for you?