ipapapa opened 7 years ago
If any process is killed by the OS, you can find crash log info in syslog.
This seems like an OOM exception; do you have any logs of memory usage during the test?
This is what I see in dmesg, which indicates a segfault:
[115645.998696] traps: rocksdb:bg5[13107] general protection ip:715db0 sp:7f0ef07fcc30 error:0 in ardb-server[400000+415000]
[479008.662810] ardb-server[11000]: segfault at 0 ip 000000000072dc97 sp 00007fb1bcdfddb0 error 4 in ardb-server[400000+415000]
[700896.165801] rocksdb:bg4[11261]: segfault at 7f78a5000780 ip 00000000005748aa sp 00007f78a75fdee0 error 6 in ardb-server[400000+415000]
and here is what I see for the second error:
addr2line -e /apps/ardb/bin/ardb-server 00000000005748aa
/src/src/db/rocksdb/rocksdb_engine.cpp:222
The error seems to be consistent for the redis-benchmark run above, so it is reproducible. You just need to run the test a little longer.
There is a fixed issue (log crash fix): https://github.com/yinqiwen/ardb/commit/98c9973bb35eedd61234ad645080407aef15761e. Is this the same problem?
We still have this issue while testing. Typically the data size reaches 28G when this occurs.
[45563.075717] show_signal_msg: 9 callbacks suppressed
[45563.075730] ardb-server[13099]: segfault at 7f0003ca2a00 ip 00007f5c05069404 sp 00007f5bea5fdda0 error 4 in libpthread-2.19.so[7f5c0505f000+19000]
[80006.812341] rocksdb:bg4[4377]: segfault at 7f006e21d748 ip 00000000005a9ec3 sp 00007f246c9fdc00 error 4 in ardb-server[400000+415000]
I do not have a core dump apparently.
I'm using ardb with rocksdb. Currently my DB size is 37GB, ~78M keys. No crashes so far. Here is my image: https://hub.docker.com/r/expert/ardb/
@unoexperto That's a very useful comment! Thank you.
@unoexperto Did you run the redis-benchmark test that I mentioned above and get 37GB of data? Or are you using a different load?
@ipapapa Nope, I didn't use it. I simply copied my production database from SSDB to see what happens. I guess the load pattern in my case is different from what redis-benchmark generates.
@yinqiwen I pulled the latest version and managed to collect the core dump. I am seeing the following segmentation fault using the redis-benchmark above. It runs for a couple of hours and then ardb gets terminated with SIGSEGV. The size of the data seems to be the same as in my tests above:
du -sh rocksdb/
38G rocksdb/
gdb:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 atomic_get_and_set_vptr_t (v=0x7f7462969140, var=0x58) at ./common/util/concurrent_queue.hpp:77
Debugging:
(gdb) bt
#0 atomic_get_and_set_vptr_t (v=0x7f7462969140, var=0x58) at ./common/util/concurrent_queue.hpp:77
#1 ardb::MPSCQueue<ardb::ChannelAsyncIOContext>::Push (value=..., this=0x58) at ./common/util/concurrent_queue.hpp:271
#2 ardb::ChannelService::AsyncIO (this=0x0, ctx=...) at common/channel/channel_service.cpp:563
#3 0x00000000004817f9 in ardb::ChannelService::AsyncIO (this=<optimized out>, id=id@entry=0,
cb=cb@entry=0x5684e0 <ardb::ReplicationBacklog::WriteWALCallback(ardb::Channel*, void*)>, data=data@entry=0x7f7469818540) at common/channel/channel_service.cpp:663
#4 0x0000000000567c2c in ardb::ReplicationBacklog::WriteWAL (this=0xa7e7d8 <ardb::_repl_singleton+1240>, ns=..., cmd=...) at repl/repl.cpp:253
#5 0x000000000054093a in ardb::Ardb::DoCall (this=this@entry=0x7ffc77bf20b0, ctx=..., setting=..., args=...) at db/db.cpp:1412
#6 0x0000000000546037 in ardb::Ardb::Call (this=0x7ffc77bf20b0, ctx=..., args=...) at db/db.cpp:1557
#7 0x00000000004796bd in ardb::RedisRequestHandler::MessageReceived (e=..., ctx=..., this=0x7f7484015c00) at network.cpp:130
#8 ardb::ChannelUpstreamHandler<ardb::codec::RedisCommandFrame>::HandleStreamEvent (e=..., ctx=..., this=0x7f7484015c00)
at ./common/channel/channel_upstream_handler.hpp:133
#9 ardb::ChannelPipeline::SendUpstream<ardb::codec::RedisCommandFrame> (ctx=<optimized out>, e=..., this=<optimized out>) at ./common/channel/all_includes.hpp:89
#10 0x000000000047e420 in ardb::ChannelHandlerContext::SendUpstream<ardb::MessageEvent<ardb::codec::RedisCommandFrame> > (this=0x7f74840e4e10, this=0x7f74840e4e10,
e=...) at ./common/channel/all_includes.hpp:164
#11 ardb::fire_message_received<ardb::codec::RedisCommandFrame> (destructor=0x0, message=0x7f746a5fea00, ctx=...) at ./common/channel/channel_helper.hpp:91
#12 ardb::codec::StackFrameDecoder<ardb::codec::RedisCommandFrame>::CallDecode (this=0x7f74840e9900, context=..., channel=0x7f746b5ed800, cumulation=...)
at ./common/channel/codec/stack_frame_decoder.hpp:102
#13 0x000000000047ef68 in ardb::codec::StackFrameDecoder<ardb::codec::RedisCommandFrame>::MessageReceived (this=0x7f74840e9900, ctx=..., e=...)
at ./common/channel/codec/stack_frame_decoder.hpp:157
#14 0x000000000048a8a3 in ardb::ChannelUpstreamHandler<ardb::Buffer>::HandleStreamEvent (e=..., ctx=..., this=<optimized out>)
at ./common/channel/channel_upstream_handler.hpp:133
#15 ardb::ChannelPipeline::SendUpstream<ardb::Buffer> (ctx=<optimized out>, e=..., this=0x7f746b5ed838) at ./common/channel/all_includes.hpp:89
#16 0x000000000048bb62 in ardb::ChannelPipeline::SendUpstream<ardb::MessageEvent<ardb::Buffer> > (event=..., this=0x7f746b5ed838)
at ./common/channel/all_includes.hpp:128
#17 ardb::fire_message_received<ardb::Buffer> (destructor=0x0, message=0x7f746b5ed868, channel=0x7f746b5ed800) at ./common/channel/channel_helper.hpp:83
#18 ardb::Channel::OnRead (this=0x7f746b5ed800) at common/channel/channel.cpp:560
#19 0x000000000048a90d in ardb::Channel::IOEventCallback (eventLoop=<optimized out>, fd=<optimized out>, clientData=0x7f746b5ed800, mask=1)
at common/channel/channel.cpp:53
#20 0x00000000004c1377 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f74840e5bd8, flags=flags@entry=3) at common/channel/redis/ae.c:429
#21 0x00000000004c17cb in aeMain (eventLoop=0x7f74840e5bd8) at common/channel/redis/ae.c:485
#22 0x00000000004af3f0 in ardb::Thread::ThreadFunc (data=0x7f748413a290) at common/thread/thread.cpp:38
#23 0x00007f74857946ba in start_thread (arg=0x7f746a5ff700) at pthread_create.c:333
#24 0x00007f74845f782d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
In frame #1, `this` appearing as the second argument (`this=0x58`) looks suspicious. Was a `shutdown` command received by ardb?
In the stack trace, ChannelService is null in ReplicationService; it would only be freed after the whole server shuts down.
I do not believe redis-benchmark has a `shutdown` command integrated. Hypothetically speaking, if there were a `shutdown`, ardb would not have been terminated by SIGSEGV.
Btw, at a later point in time I also got some rocksdb core dumps (core_rocksdb:bg13.32454.1496361099). Is ardb generating different segmentation faults? I used ardb-server as the executable to open it in gdb.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000577e9a in ardb::RocksDBLogger::Logv (this=<optimized out>, log_level=<optimized out>, format=0x7cda12 "%s", ap=0x7fa0eb9feec0)
at db/rocksdb/rocksdb_engine.cpp:222
222 db/rocksdb/rocksdb_engine.cpp: No such file or directory.
[Current thread is 1 (Thread 0x7fa0eba00700 (LWP 6126))]
(gdb) bt
#0 0x0000000000577e9a in ardb::RocksDBLogger::Logv (this=<optimized out>, log_level=<optimized out>, format=0x7cda12 "%s", ap=0x7fa0eb9feec0)
at db/rocksdb/rocksdb_engine.cpp:222
#1 0x0000000000655618 in rocksdb::Log (log_level=log_level@entry=rocksdb::WARN_LEVEL, info_log=std::shared_ptr (count 3, weak 0) 0x7fa0f4947f40,
format=format@entry=0x7cda12 "%s") at util/env.cc:199
#2 0x0000000000583522 in rocksdb::DBImpl::MaybeDumpStats (this=this@entry=0x7fa0eee77000) at db/db_impl.cc:653
#3 0x00000000005a86e6 in rocksdb::DBImpl::BackgroundCallCompaction (this=0x7fa0eee77000, arg=0x0) at db/db_impl.cc:3331
#4 0x000000000069827d in std::function<void ()>::operator()() const (this=0x7fa0eb9ffbe0) at /usr/include/c++/5/functional:2267
#5 rocksdb::ThreadPoolImpl::Impl::BGThread (this=this@entry=0x7fa0f49c50e0, thread_id=thread_id@entry=12) at util/threadpool_imp.cc:235
#6 0x000000000069846d in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper (arg=0x7fa0f4844140) at util/threadpool_imp.cc:259
#7 0x00007fa0f5709c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007fa0f600c6ba in start_thread (arg=0x7fa0eba00700) at pthread_create.c:333
#9 0x00007fa0f4e6f82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Is it related to https://github.com/yinqiwen/ardb/issues/316?
Is the last stack trace generated by the latest code?
Yes, I am using the latest code, up to the latest commit for the log crash fix. I am concerned that your fix is not related to this crash. Why would Buffer be NULL after it was created a couple of lines above?
I now also see a different core dump after I restart ARDB from the previous crash (i.e. with hot data) and use redis-benchmark:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000006eada2 in std::default_delete<rocksdb::SliceTransform>::operator() (this=<optimized out>, __ptr=0x7fc08d004040) at /usr/include/c++/5/bits/unique_ptr.h:76
76 /usr/include/c++/5/bits/unique_ptr.h: No such file or directory.
[Current thread is 1 (Thread 0x7fc088dff700 (LWP 410))]
(gdb) bt
#0 0x00000000006eada2 in std::default_delete<rocksdb::SliceTransform>::operator() (this=<optimized out>, __ptr=0x7fc08d004040) at /usr/include/c++/5/bits/unique_ptr.h:76
#1 std::unique_ptr<rocksdb::SliceTransform, std::default_delete<rocksdb::SliceTransform> >::~unique_ptr (this=0x7fc08d645378, __in_chrg=<optimized out>) at /usr/include/c++/5/bits/unique_ptr.h:236
#2 rocksdb::BlockBasedTable::Rep::~Rep (this=0x7fc08d645200, __in_chrg=<optimized out>) at table/block_based_table_reader.cc:423
#3 rocksdb::BlockBasedTable::~BlockBasedTable (this=<optimized out>, __in_chrg=<optimized out>) at table/block_based_table_reader.cc:509
#4 0x00000000006eaff1 in rocksdb::BlockBasedTable::~BlockBasedTable (this=0x7fc08d61e320, __in_chrg=<optimized out>) at table/block_based_table_reader.cc:510
#5 0x000000000066e71b in rocksdb::LRUHandle::Free (this=0x7fc08d672f50) at ./util/lru_cache.h:108
#6 rocksdb::LRUCacheShard::Erase (this=0x7fc08eb76b30, key=..., hash=3052707373) at util/lru_cache.cc:411
#7 0x000000000060b648 in rocksdb::TableCache::Evict (cache=<optimized out>, file_number=23942) at db/table_cache.cc:406
#8 0x000000000059d939 in rocksdb::DBImpl::PurgeObsoleteFiles (this=this@entry=0x7fc08907f000, state=..., schedule_only=schedule_only@entry=false) at db/db_impl.cc:1113
#9 0x00000000005a8a35 in rocksdb::DBImpl::BackgroundCallCompaction (this=0x7fc08907f000, arg=<optimized out>) at db/db_impl.cc:3386
#10 0x000000000069827d in std::function<void ()>::operator()() const (this=0x7fc088dfebe0) at /usr/include/c++/5/functional:2267
#11 rocksdb::ThreadPoolImpl::Impl::BGThread (this=this@entry=0x7fc08ebc50e0, thread_id=thread_id@entry=8) at util/threadpool_imp.cc:235
#12 0x000000000069846d in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper (arg=0x7fc08ea47050) at util/threadpool_imp.cc:259
#13 0x00007fc08f8d8c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007fc0901db6ba in start_thread (arg=0x7fc088dff700) at pthread_create.c:333
#15 0x00007fc08f03e82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Again, reproducing the above should be fairly simple. Let the following run until the data size reaches ~40GB:
redis-benchmark -p 22122 -c 10 -d 1024 -n 500000000 -t set -r 500000000
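To avoid babysitting the run, a small wrapper can poll the database directory until it crosses the size threshold (the port and the rocksdb/ path are the ones used above; the helper name is mine):

```shell
# Poll a directory's size until it exceeds the given number of gigabytes.
# usage: watch_until_gb <dir> <gb>
watch_until_gb() {
  local dir=$1 limit_kb=$(( $2 * 1024 * 1024 ))
  while [ "$(du -sk "$dir" | cut -f1)" -lt "$limit_kb" ]; do
    du -sh "$dir"      # progress report, same output as the du above
    sleep 60
  done
}

# Reproduction (uncomment to run against a local ardb instance):
# redis-benchmark -p 22122 -c 10 -d 1024 -n 500000000 -t set -r 500000000 &
# watch_until_gb rocksdb/ 40
```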
I am trying to stress test ARDB. It seems that after heavy use of redis-benchmark, ARDB dies. However, I cannot find the core dump to debug. I am assuming there is some leak. Any advice would be highly appreciated. The log files are not informative about any ERROR. I am using /tmp for the core dumps, but the folder does not have any core dumps.
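For what it's worth, an empty /tmp usually means core dumps are disabled or redirected rather than absent. A quick checklist (common Linux defaults, nothing ardb-specific):

```shell
# 1. The process must be allowed to dump core; set this in the shell
#    (or service definition) that starts ardb-server:
ulimit -c unlimited

# 2. Check where the kernel writes cores. If the pattern starts with '|',
#    cores are piped to a handler (apport, systemd-coredump) and will not
#    land in /tmp -- inspect them with coredumpctl or disable the handler.
cat /proc/sys/kernel/core_pattern

# 3. To write cores to /tmp with program name, pid, and timestamp
#    (requires root):
#      echo '/tmp/core_%e.%p.%t' | sudo tee /proc/sys/kernel/core_pattern
```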