Closed jonmeow closed 3 months ago
TCMalloc needs NumPossibleCPUsNoCache to be non-allocating, since we use it to size some data structures during our very first allocation.
In the past, we've run into issues where std::thread::hardware_concurrency might allocate depending on C++ stdlib/libc implementation details. For example, libc++'s implementation seems to call sysconf in many cases, which is documented as async-signal-unsafe because it may allocate on the heap.
If you know where sysconf is sourcing the information from, it'd certainly be possible to fetch it directly in a way that is careful not to allocate.
I think we need the _SC_NPROCESSORS_CONF case, since offlined CPUs can cause problems (see issue #188). That case reads from /sys as well: https://github.com/bminor/glibc/blob/32328a5a1461ff88c0b1e04954e9c68b3fa7f56d/sysdeps/unix/sysv/linux/getsysstats.c#L231
To confirm, are you okay with get_nprocs_conf even though it's documented as unsafe? (Note: this may reflect out-of-date documentation; it looks like the implementation changed a couple of years ago.)
Note, I'm happy to send you a change if you're okay with one of:
- get_nprocs_conf in place of current code (since it reads "possible")
- get_nprocs as a fallback (since it's marked as safe)
- /proc/stat similar to the get_nprocs_fallback code (since get_nprocs_fallback isn't an API)
Let me clarify:
The logically "correct" value is sysconf(_SC_NPROCESSORS_CONF) (or get_nprocs_conf), but this is not malloc-free, so TCMalloc cannot use it. Our hand-rolled, allocation-free implementation is very similar, though, in that we read /sys to figure out how many CPUs might exist. I don't think this addresses your sandbox's limitations.
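To make the "hand-rolled, allocation-free" idea concrete, here is a minimal sketch of reading a CPU list such as "0-7" or "0-3,8-11" using only a stack buffer and raw system calls. It is not TCMalloc's actual implementation: the names ParseCpuListUpperBound and ReadPossibleCpus are hypothetical, the 256-byte buffer is an assumed upper bound on the file size, and EINTR handling is omitted for brevity.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>
#include <optional>

// Hypothetical helper (not a TCMalloc API): parses a kernel-style CPU list
// such as "0-7\n" or "0-3,8-11" and returns the highest CPU ID plus one.
// Returns std::nullopt on empty or malformed input. Uses no heap memory.
std::optional<int> ParseCpuListUpperBound(const char* buf, std::size_t len) {
  constexpr int kMaxReasonableId = 1 << 20;  // assumed sanity limit
  int max_id = -1;
  std::size_t i = 0;
  while (i < len && buf[i] != '\n' && buf[i] != '\0') {
    // Every element (and every range endpoint) must start with a digit.
    if (buf[i] < '0' || buf[i] > '9') return std::nullopt;
    int value = 0;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
      value = value * 10 + (buf[i] - '0');
      if (value > kMaxReasonableId) return std::nullopt;
      ++i;
    }
    if (value > max_id) max_id = value;
    // A '-' or ',' separator must be followed by another number.
    if (i < len && (buf[i] == '-' || buf[i] == ',')) {
      ++i;
      if (i >= len || buf[i] < '0' || buf[i] > '9') return std::nullopt;
    }
  }
  if (max_id < 0) return std::nullopt;
  return max_id + 1;
}

// Hypothetical helper: reads the file with open/read/close into a stack
// buffer, so nothing here calls malloc. Returns std::nullopt if the file is
// missing (for example, in a sandbox that hides /sys) instead of crashing.
// Limitation: content longer than the buffer would be truncated.
std::optional<int> ReadPossibleCpus(
    const char* path = "/sys/devices/system/cpu/possible") {
  int fd = open(path, O_RDONLY | O_CLOEXEC);
  if (fd < 0) return std::nullopt;
  char buf[256];
  ssize_t n = read(fd, buf, sizeof(buf));
  close(fd);
  if (n <= 0) return std::nullopt;
  return ParseCpuListUpperBound(buf, static_cast<std::size_t>(n));
}
```

A caller would still need to decide what to do when ReadPossibleCpus returns no value; the sketch only shows that the read and parse steps can avoid the heap, not what a safe fallback would be.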
sysconf(_SC_NPROCESSORS_ONLN) / get_nprocs (which std::thread::hardware_concurrency uses) is not the correct value for this. If cores are offlined, for example, rseq might give us a CPU ID that exceeds the number of online CPUs, leading to memory corruption.
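For anyone who wants to see how these values differ on a given machine or sandbox, the following diagnostic sketch queries all three side by side. The CpuCounts struct and QueryCpuCounts function are hypothetical names for illustration. These calls may allocate, so this is only suitable for a standalone test program, not for use inside an allocator.

```cpp
#include <unistd.h>

#include <thread>

// Hypothetical diagnostic struct: the three CPU counts discussed above.
struct CpuCounts {
  long configured;                // sysconf(_SC_NPROCESSORS_CONF)
  long online;                    // sysconf(_SC_NPROCESSORS_ONLN)
  unsigned hardware_concurrency;  // std::thread::hardware_concurrency()
};

// Collects the counts. Not allocation-free: intended for diagnostics only.
CpuCounts QueryCpuCounts() {
  CpuCounts counts;
  counts.configured = sysconf(_SC_NPROCESSORS_CONF);
  counts.online = sysconf(_SC_NPROCESSORS_ONLN);
  counts.hardware_concurrency = std::thread::hardware_concurrency();
  return counts;
}
```

If CPUs have been offlined, online can be smaller than configured, so a table sized by online may be too small to index by CPU ID.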
I'm not sure whether /proc/stat provides online or all CPUs, but given the two distinct glibc implementations, I'm inclined to believe it's only the former.
I'm not sure if there is an alternative to reading /sys that provides the number of possible CPU IDs that might be available.
We're going to see if we can get an exception on the compiler-explorer side, since it's not readily apparent whether the /proc fallback done by glibc provides a compatible alternative for /sys/devices/system/cpu/possible.
We're trying to use tcmalloc in a sandboxed environment that hides /sys. This causes a crash in NumPossibleCPUsNoCache when trying to read "/sys/devices/system/cpu/possible".
It looks like tcmalloc used to use absl's implementation, which I think was calling std::thread::hardware_concurrency (here). What do you think of adding that as a fallback when the /sys info is unavailable?
For context, the specific crash is:
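The specific environment is compiler-explorer, so you can look at what's available pretty easily, e.g. here are some relevant Python queries: https://python.compiler-explorer.com/z/jT6baYxba
std::thread::hardware_concurrency returns 2 in this environment (https://cpp.compiler-explorer.com/z/r4dfbxxMs).
The issue reported to us is https://github.com/carbon-language/carbon-lang/issues/4176. It's temporarily reproducible at https://carbon.compiler-explorer.com/z/Ecfjh7bhs, but we'll be trying to fix that promptly, and then I think the Python check above is the better reference.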
returns 2 in this environment (https://cpp.compiler-explorer.com/z/r4dfbxxMs)The issue reported to us is https://github.com/carbon-language/carbon-lang/issues/4176. It's temporarily reproducible at https://carbon.compiler-explorer.com/z/Ecfjh7bhs, but we'll be trying to fix that promptly and then I think the Python check above is the better reference.