yinqiwen / ardb

A Redis-protocol-compatible NoSQL database that supports multiple storage engines as backends, such as Google's LevelDB, Facebook's RocksDB, OpenLDAP's LMDB, PerconaFT, WiredTiger, and ForestDB.
BSD 3-Clause "New" or "Revised" License
1.83k stars, 278 forks

ARDB stops responding to commands. #450

Closed by solanoepalacio 6 years ago

solanoepalacio commented 6 years ago

Hey there,

I'm having a problem with ARDB, and I was wondering if you could help.

Let me explain what is going on: I'm using ARDB as a queue to process binary data transfers/ingest. There are two services connected to the ARDB instance (~32 connections each). One of the services receives binary data over HTTP and pushes it to the db (using lists, hashes, and plain keys), and the other service uploads the data to S3 (using lists, hashes, and sorted sets to implement reliable queues).

Eventually (I'm not sure what triggers it) the db stops responding to all clients, even redis-cli. No errors are logged; it just stops responding. Both services work fine when using Redis.

Here are the statistics dump logs of a fresh install of ardb using leveldb as the storage engine, from the moment I start the services until it stops responding: https://gist.github.com/solanoepalacio/b8d8dfb3b64be2655ccfc2e65979035d

The services are written in Node.js and I'm using the 'redis' NPM library.

It might be worth noting that I get some warnings during the ardb install. I've tried the forestdb, leveldb and rocksdb storage engines. I've created this gist, which contains the install logs for the three storage engines in case they are useful. The warnings are close to the end of each file. https://gist.github.com/solanoepalacio/de95b96966d388ac812ff85773e26da2

I've already written the two services trusting ardb for persistence, so hopefully I can sort this out.

Thanks a ton in advance !!

yinqiwen commented 6 years ago

Can you attach gdb to the process and get a stack trace dump of all threads when the server stops responding?

solanoepalacio commented 6 years ago

Hey yinqiwen, thanks a lot for responding! I'll get the trace dump and post it later today.

solanoepalacio commented 6 years ago

Hey yinqiwen.

Here you'll find the logs. It's a fresh install of ardb with rocks-db as the storage engine: https://gist.github.com/solanoepalacio/11ea0847d18542b5f1376278bf664ad6

I have very little experience with C++ and gdb, so I hope these are the logs you meant. What I did was:

$ gdb ./src/ardb-server |& tee gdb.log
(gdb) bt
(gdb) run

And then started the node processes.

If there's anything else I can do to help you help me, please let me know!

Thanks again, regards.

yinqiwen commented 6 years ago

@solanoepalacio it seems the server is locked for a long time by rocksdb/leveldb. You can debug it with the following steps:

  1. Start the server and wait until the ardb-server process stops responding to commands from any client.
  2. Use gdb <ardb-server path> <pid> to attach to the running process.
  3. Run thread apply all bt in the gdb shell to get a stack trace dump of all threads.
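
The steps above can also be run without an interactive gdb session. This is a minimal sketch, assuming gdb is installed and you have permission to attach to the process. The helper name dump_stacks and the output file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: dump the stack traces of all threads of a running process to a file.
# dump_stacks is a hypothetical helper name, not part of ardb or gdb.
dump_stacks() {
    pid="$1"
    out="${2:-stacks.log}"
    # -p PID   attach to the running process
    # -batch   exit once the given commands have run
    # -ex CMD  run a gdb command, here the one from step 3
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Example usage (assumes exactly one ardb-server process and that pidof exists):
# dump_stacks "$(pidof ardb-server)" ardb-stacks.log
```

Run it only once the server has stopped responding, so the dump shows where each thread is stuck.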

solanoepalacio commented 6 years ago

Thanks a lot for explaining how to get the stack trace. It was actually quite useful to understand :).

Here are the logs you asked for:

https://gist.github.com/solanoepalacio/556b273c867c98fa5e2dc6066c01fa08

yinqiwen commented 6 years ago

@solanoepalacio it's a lock bug in blpop/brpop; it will be fixed later.

yinqiwen commented 6 years ago

@solanoepalacio actually, this issue can be reproduced with the BRPOPLPUSH command. It is fixed by the latest commit.
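
For context, BRPOPLPUSH is the blocking pop-and-push that reliable-queue consumers commonly rely on. Below is a rough sketch of exercising it from a shell. It assumes redis-cli is installed and the server listens on 127.0.0.1:16379 (adjust ARDB_HOST and ARDB_PORT otherwise). The key names queue and processing, the item job-1, and the helper names are made up for illustration, and this is not necessarily the exact sequence that triggered the bug.

```shell
#!/bin/sh
# Sketch: exercise BRPOPLPUSH against a Redis-protocol server, then check liveness.
# Assumed values: host/port below and the key names "queue" and "processing".
ARDB_HOST="${ARDB_HOST:-127.0.0.1}"
ARDB_PORT="${ARDB_PORT:-16379}"
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Hypothetical wrapper so every call targets the same server.
ardb() {
    "$REDIS_CLI" -h "$ARDB_HOST" -p "$ARDB_PORT" "$@"
}

exercise_brpoplpush() {
    # Consumer: block for up to 5 seconds waiting for an item in "queue",
    # atomically moving it to "processing" when one arrives.
    ardb BRPOPLPUSH queue processing 5 &
    consumer=$!
    sleep 1
    # Producer: push one item while the consumer is still blocked.
    ardb LPUSH queue job-1
    wait "$consumer"
    # Liveness check: a responsive server answers PING promptly.
    ardb PING
}

# Example usage:
# exercise_brpoplpush
```

If the final PING hangs, that is the moment to capture a thread stack dump with gdb.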

solanoepalacio commented 6 years ago

Great @yinqiwen !! I'll try it out later.

Thanks a lot!!

solanoepalacio commented 6 years ago

Everything is working perfectly now, @yinqiwen. You rock!