mzhaom / gperftools

Fast, multi-threaded malloc() and nifty performance analysis tools
https://code.google.com/p/gperftools/
BSD 3-Clause "New" or "Revised" License

crash in tcmalloc 1.0 #117

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Unfortunately I've got another unreproducible one.
As far as I know, we limited the memory with 'ulimit -v 220000'.
Sometimes this led to an assertion failure. Currently we are running some
tests with tcmalloc v1.1.

Maybe you have an idea what I need to look for?

SIGNAL : 6 (SIGABRT)
SYSTEM : Linux i386
 CLASS : 32bit
  DATA : LSB

src/page_heap_allocator.h:66] assertion failed: free_area_ != NULL

#1  [0xffffe410]
#2  [0x6]
#3  [0x9c9289]   </lib/tls/libc.so.6: abort+233>
#4  [0xf6ca74b8] </.../libtcmalloc_minimal.so.0: TCMalloc_CRASH(bool, char const*, int, char const*, ...)+168>
#5  [0xf6caaa01] </.../libtcmalloc_minimal.so.0: tcmalloc::NewSpan(unsigned int, unsigned int)+193>
#6  [0xf6caa210] </.../libtcmalloc_minimal.so.0: tcmalloc::PageHeap::Carve(tcmalloc::Span*, unsigned int)+80>
#7  [0xf6caa399] </.../libtcmalloc_minimal.so.0: tcmalloc::PageHeap::AllocLarge(unsigned int)+185>
#8  [0xf6caa81a] </.../libtcmalloc_minimal.so.0: tcmalloc::PageHeap::New(unsigned int)+138>
#9  [0xf6ca8b13] </.../libtcmalloc_minimal.so.0: tcmalloc::CentralFreeList::Populate()+147>
#10 [0xf6ca8d58] </.../libtcmalloc_minimal.so.0: tcmalloc::CentralFreeList::FetchFromSpansSafe()+56>
#11 [0xf6ca8dc9] </.../libtcmalloc_minimal.so.0: tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+89>
#12 [0xf6caae05] </.../libtcmalloc_minimal.so.0: tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, unsigned int)+85>
#13 [0xf6cb11e8] </.../libtcmalloc_minimal.so.0: malloc+840>
#14 [0xa4d7bb9]
#15 [0xf6c0acfb] </.../libstdc++.so.6: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&)+107>
#16 [0xf6c0b8aa] </.../libstdc++.so.6: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned int)+58>
#17 [0xf6c0c408] </.../libstdc++.so.6: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned int)+72>
#18 [0xf6c0c87f] </.../libstdc++.so.6: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::append(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+79>
#19 ...

Original issue reported on code.google.com by mirko....@web.de on 30 Mar 2009 at 7:30

GoogleCodeExporter commented 9 years ago
It looks like there are several places still lurking in tcmalloc where we
crash when we don't have memory for internal data structures, rather than
just returning NULL from malloc or realloc or whatever. :-(  I'll look to
find them all and squash them before the next release.

Original comment by csilv...@gmail.com on 30 Mar 2009 at 11:28

GoogleCodeExporter commented 9 years ago

Original comment by csilv...@gmail.com on 31 Mar 2009 at 7:13

GoogleCodeExporter commented 9 years ago
Looking into this, I'm trying to balance the complexity of the code being
added to handle these out-of-memory situations with the benefit that we get
from it. These crashes that you're seeing are because we're out of memory
and fail to allocate some internal data structures we need to satisfy a
malloc call. Even if we changed the code to handle this case correctly (so
we raise an out-of-memory exception, in this case), what would the benefit
be? If you don't catch the exception and the program just dies anyway, maybe
that's not much better, and all we should do is print a more informative
error message. Or if you catch it and die anyway, same thing perhaps.

So, I guess my question is: what behavior are you hoping for here? What
would you do if we fixed the code to raise an exception?

Original comment by csilv...@gmail.com on 3 Apr 2009 at 1:51

GoogleCodeExporter commented 9 years ago
Throwing an exception is always clean: the application can log an error and
choose to exit gracefully. There are applications that use critical
resources, and simply crashing would be overkill.

Original comment by I.am.sum...@gmail.com on 3 Apr 2009 at 1:58

GoogleCodeExporter commented 9 years ago
Throwing an exception is clean from the user's point of view, but
implementation-wise it's difficult, because we may be in an internally
inconsistent state when the out-of-memory exception is thrown, and we'll
need to clean all that up on the way out. It's much easier for us to just
die. That's why I want to understand how *you're* behaving in the OOM
situation, in *your* application.

I understand apps can do arbitrarily interesting things when they get an OOM
exception (or malloc returns NULL), but I don't know of any that actually do
so. But that doesn't mean they don't exist. What would your app, the one
that triggered this issue, do in that situation?

Original comment by csilv...@gmail.com on 3 Apr 2009 at 3:53

GoogleCodeExporter commented 9 years ago
An app does the right thing on OOM only when the user code and all of the
library code do the right thing on OOM. That is much harder to guarantee.
There is some code in mysqld that does the right thing on OOM; the majority
of it does not. I have no idea how the library code (SSL, zlib, libc) on
which it depends behaves.

I would rather have the server die on OOM than run on and possibly corrupt
internal state.

Original comment by mdcal...@gmail.com on 3 Apr 2009 at 4:03

GoogleCodeExporter commented 9 years ago
I think that's probably a pretty typical attitude. I'm thinking that if we
make tcmalloc's malloc return NULL when we can't allocate the bytes the user
asked for, but crash if we can't allocate the bytes we need internally, it
will probably serve the needs of malloc users just fine.

Given that, I'm tempted to close this bug WillNotFix. What I'd probably
really do is change the assert to a more user-decipherable error message.
Would that fit your needs?

Original comment by csilv...@gmail.com on 3 Apr 2009 at 5:18

GoogleCodeExporter commented 9 years ago
Usually we handle OOM and continue after cleanup. If tcmalloc cannot
guarantee to work properly when it could not allocate internal data
structures, then it does not make sense to continue anyway.

I guess it's OK to have an error message saying that tcmalloc is dying. But
tcmalloc still needs to return NULL if the allocation the user asked for
fails.

So to answer your question: yes, that would fit my needs.

Original comment by mirko....@web.de on 15 Apr 2009 at 8:52

GoogleCodeExporter commented 9 years ago
OK, I've changed the code to better document failures involving internal
data structures. As always, if we can't allocate because the user is asking
for too much memory, we'll return NULL. We'll only crash when the machine is
*almost* out of memory and the user's request is small enough to fit in the
remaining memory, but the user's request plus the overhead tcmalloc needs to
satisfy it is not. I expect that to be quite rare in normal usage.

Original comment by csilv...@gmail.com on 18 Apr 2009 at 12:13

GoogleCodeExporter commented 9 years ago
I'm a tcmalloc user and I rely heavily on its ability to return NULL instead
of crashing. I hope that, as a library aiming at rock-solid industry
standards, this property will not be loosened. IMO it should be strengthened
into a total guarantee that tcmalloc will never crash.

Original comment by gall.c...@gmail.com on 2 Feb 2010 at 2:44

GoogleCodeExporter commented 9 years ago
PS. Why not preallocate 500 bytes of memory right after tcmalloc starts,
just to have that memory around in case of the described situation, when the
assert is activated? This way tcmalloc would be able to guarantee that it
will always return NULL and never crash.

Original comment by gall.c...@gmail.com on 3 Feb 2010 at 5:13

GoogleCodeExporter commented 9 years ago
Have you actually run into this problem in practice, or are you just
concerned about it in theory? The situations where it triggers are very
specific, and it's hard for me to imagine a program working well in that
situation in any case. If you're that close to being out of memory, you
can't depend on anything to really work. (Note that just making a really big
allocation isn't enough to cause tcmalloc to die.)

There's a real cost, in terms of complexity and performance, in making
tcmalloc behave better in these very unlikely scenarios, and I'd want to
understand how and why these scenarios arise before deciding how to handle
them.

} PS. Why not preallocate 500 bytes of memory right after tcmalloc start,
} just to have that memory around in case of described situation, when the
} assert is activated?

That would work the first time the situation arose, but if you did another
malloc after that, you'd be back in the same trouble again.

Original comment by csilv...@google.com on 3 Feb 2010 at 9:33

GoogleCodeExporter commented 9 years ago
I ran into such a situation. I pre-allocate a block of memory myself in
order to free it after the first failed allocation reported by malloc(), and
then immediately shut the program down smoothly. If tcmalloc did the same
thing, the shutdown mechanism would be guaranteed to be stable. As it is
now, there is randomness in it.

Original comment by gall.c...@gmail.com on 17 Feb 2010 at 4:41

GoogleCodeExporter commented 9 years ago
Have you ever seen tcmalloc crash because of an out-of-memory situation?
This is the kind of problem that is, I believe, more theoretical than
actual. If you are having specific problems caused by this behavior, let's
look at the specifics of it and figure out the best way forward.

Original comment by csilv...@gmail.com on 17 Feb 2010 at 8:02