Open danielmewes opened 10 years ago
Would you avoid shipping #2130 until this is done?
@coffeemug Not for the performance aspect of it, but probably for the running-out-of-memory aspect of it.
We could do something simple at first, such as internally setting the allocator to interleaved mode on startup. We would still have to link with libnuma on Linux. @atnnn: How difficult is it to add libnuma as a dependency?
Actually, looking at the code again, it might be possible to maintain the local allocation even with the #2130 changes without any additional libraries. We should do that, preferably.
Yeah, that would be ideal.
Adding libnuma as a dependency would be easy. Versions of libnuma on the distributions we support range from 2.0.3 to 2.0.7. I don't think OS X has any support for it.
There are multiple serializer threads, aren't there? Also, the problem already exists if the balancer tries to give most of the cache's memory to tables that reside on a certain node. We also already have this problem with read-ahead buffers. Addressing it involves more than deciding which thread buffers should be allocated on.
And if we do want to worry about memory usage, the first thing to worry about is whether we should switch to jemalloc. Based on internet posts that might be out of date, it seems like we should.
There's only one serializer thread per table, but 8 cache threads. It is unlikely that just one of the caches of a single table gets most of the memory assigned by the cache balancer (unless there is a DoS attack or a highly uneven key access distribution).
Generally, allocating memory on the cache's thread is good because that's where it is accessed from most often. So it will be faster than simply switching the allocation strategy to "interleaved" (where memory will be evenly allocated from all NUMA nodes round robin or something similar). The advantage of using the interleaved strategy is that we won't have problems with corner cases such as unevenly distributed cache sizes.
@srh: Is jemalloc supposed to use less memory in general? Or does it actually do something about the problem of a single NUMA node running out of RAM while others have plenty free?
You don't need it to be just one of the caches of a single table; you can have all 8 of the caches get memory assigned by the cache balancer. They're probably all going to be on the same CPU.
@srh: Is jemalloc supposed to use less memory in general?
Yes, that's the difference. That info could be out of date. jemalloc's downside, relative to tcmalloc, is supposedly the performance you see when spawning threads on the fly, which we do not do.
I opened an issue about considering jemalloc: https://github.com/rethinkdb/rethinkdb/issues/2279.
Usually the 8 caches will (almost) all be on different CPUs. Otherwise there wouldn't be a point in having them in the first place.
Otherwise there wouldn't be a point in having them in the first place.
CPUs != cores.
Oh I see. Yeah you are right, that might already be a problem.
We should use libnuma [1] on Linux to allocate pages on the NUMA node of the cache that is going to use them. That will both improve performance and avoid problems like running out of usable RAM too early (see [2]).
Note that we are already allocating buffers on the right thread in 1.12. However, #2130 makes it difficult to maintain this property.
A work-around for allocation problems is to launch rethinkdb with `numactl --interleave=all`. However, that won't help with the (potential) performance regression, and is annoying.

[1] http://linux.die.net/man/3/numa
[2] http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/