Open GoogleCodeExporter opened 9 years ago
I think part of the reason might be that tcmalloc disables libstdc++'s fancy
"cached" (pooling) allocator, and you seem to be a heavy STL user.
tcmalloc has this code (in malloc_extension.cc):
#ifdef __GLIBC__
// GNU libc++ versions 3.3 and 3.4 obey the environment variables
// GLIBCPP_FORCE_NEW and GLIBCXX_FORCE_NEW respectively. Setting
// one of these variables forces the STL default allocator to call
// new() or delete() for each allocation or deletion. Otherwise
// the STL allocator tries to avoid the high cost of doing
// allocations by pooling memory internally. However, tcmalloc
// does allocations really fast, especially for the types of small
// items one sees in STL, so it's better off just using us.
// TODO: control whether we do this via an environment variable?
setenv("GLIBCPP_FORCE_NEW", "1", false /* no overwrite*/);
setenv("GLIBCXX_FORCE_NEW", "1", false /* no overwrite*/);
// Now we need to make the setenv 'stick', which it may not do since
// the env is flakey before main() is called. But luckily stl only
// looks at this env var the first time it tries to do an alloc, and
// caches what it finds. So we just cause an stl alloc here.
string dummy("I need to be allocated");
dummy += "!"; // so the definition of dummy isn't optimized out
#endif /* __GLIBC__ */
So you might want to try the following experiments:
* try setting GLIBCXX_FORCE_NEW=1 when running under glibc malloc and see if it
makes your program slower
* try removing this code from tcmalloc and see if it "improves" your performance
Beyond that, we should figure out how to make tcmalloc faster for your use
case. Tuning TCMALLOC_TRANSFER_NUM_OBJ or
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES might help.
Original comment by alkondratenko
on 4 Dec 2014 at 4:00
Let me clarify. I don't think there's anything particularly broken in the lock
implementation (it's far from perfect, but it should be OK).
I think your problem is that some lock(s) are taken too frequently and/or held
for too long.
Part of investigating this case is figuring out why your "stock" malloc
is faster. I do suspect it's due to libstdc++'s fancy allocators (note that newer
gcc versions do not enable them by default).
Original comment by alkondratenko
on 5 Dec 2014 at 8:28
Original issue reported on code.google.com by
zlxde...@gmail.com
on 4 Dec 2014 at 8:54