ideawu / ssdb

SSDB - A fast NoSQL database, an alternative to Redis
http://ssdb.io/
BSD 3-Clause "New" or "Revised" License

server hangs starting large database while memory fills up until OOM #1337

Closed ericgreene closed 4 years ago

ericgreene commented 4 years ago

When starting up a large (2.4 TB) database, ssdb prints the log lines below and never finishes starting. Memory usage keeps growing until RAM is exhausted, at which point the OOM killer kills the process. I cannot connect to the database at all. The server has 30 GB of RAM, and I have tried adjusting the cache_size and max_open_files settings with little success.

Ubuntu 14.04, 30 GB RAM, 8 cores

2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(53): meta_db : /var/lib/ssdb/meta
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(54): cache_size : 2000 MB
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(55): block_size : 32 KB
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(56): write_buffer : 64 MB
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(57): max_open_files : 1000
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(58): compaction_speed : 1000 MB/s
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(59): compression : yes
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(60): binlog : no
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(61): binlog_capacity : 20000000
2020-03-18 21:47:51.187 [INFO ] ssdb-server.cpp(62): sync_speed : 0 MB/s

ideawu commented 4 years ago

Hi, try using ssdb-repair to repair the database. Back up /var/lib/ssdb before doing the repair, then run:

ssdb-repair /var/lib/ssdb/data
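
The backup-then-repair order can be scripted so the repair never runs without a copy in place. The following is a minimal sketch, not part of SSDB: `backup_then_repair` and its parameters are hypothetical names, the backup path is whatever you choose, and it assumes ssdb-server is stopped first (copying a live LevelDB directory can produce an inconsistent copy) and that the backup volume has room for the full database.

```python
import shutil
import subprocess
from pathlib import Path


def backup_then_repair(ssdb_dir, backup_dir, repair_bin="ssdb-repair", dry_run=True):
    """Copy ssdb_dir to backup_dir, then run ssdb-repair on ssdb_dir/data.

    Hypothetical helper for illustration. With dry_run=True nothing is
    copied or executed; the repair command is only returned for review.
    """
    ssdb_dir = Path(ssdb_dir)
    backup_dir = Path(backup_dir)
    cmd = [repair_bin, str(ssdb_dir / "data")]

    # Never overwrite an earlier backup by accident.
    if backup_dir.exists():
        raise FileExistsError(f"backup target already exists: {backup_dir}")
    if dry_run:
        return cmd

    # Assumption: ssdb-server has been stopped before this point.
    shutil.copytree(ssdb_dir, backup_dir)
    # check=True raises CalledProcessError if the repair tool exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `backup_then_repair("/var/lib/ssdb", "/mnt/backup/ssdb")` returns the command it would run; pass `dry_run=False` to actually copy and repair.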

ericgreene commented 4 years ago

Hi, thanks, I tried that. All entries said OK, but the server still gets killed by the OOM killer before it finishes starting.

final log lines show:

2020/03/18-04:39:40.404657 7f13f08057c0 Archiving /data/MANIFEST-143552617: OK
2020/03/18-04:39:40.407168 7f13f08057c0 **** Repaired leveldb /data; recovered 83409 files; 2510581719874 bytes. Some data may have been lost. ****

ideawu commented 4 years ago

Are you using the code from the master branch? If not, please build ssdb-repair from the latest code on the master branch. Then please paste ssdb-repair's output here.

ericgreene commented 4 years ago

This time I received errors:

2020/03/21-02:59:21.401062 7f3da8bb67c0 Archiving /data/MANIFEST-143553622: NotFound: /data/MANIFEST-143553622: No such file or directory
2020/03/21-02:59:21.403902 7f3da8bb67c0 **** Repaired leveldb /data; recovered 83409 files; 2510581719874 bytes. Some data may have been lost. ****

ideawu commented 4 years ago

Please paste what ssdb-repair prints here. It should look like this:

ssdb-repair - SSDB repair tool
Copyright (c) 2013-2015 ssdb.io

writing repair log into: repair.log
leveldb repaired.
compacting data...
                               Compactions
Level  Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
  0        0        0         0        0         0
  1        1        1         0        1         1

ericgreene commented 4 years ago

The compacting data step has been running since Friday. It does not seem stuck but I don't know how to verify that.

[Wed Mar 25][22:42:09]
-> ./ssdb-repair /var/lib/ssdb/data
ssdb-repair - SSDB repair tool
Copyright (c) 2013-2015 ssdb.io

writing repair log into: repair.log
leveldb repaired.
compacting data...
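
One rough way to check that the compaction is still making progress is to watch the data directory for changes. The sketch below is a heuristic, not an SSDB feature: it assumes that an active LevelDB compaction keeps writing new table files and deleting old ones in the data directory, so the file count, total size, or newest modification time should change between snapshots. The function names and the 60-second interval are illustrative choices.

```python
import os
import time


def snapshot(data_dir):
    """Return (file_count, total_bytes, newest_mtime) for files in data_dir."""
    count = 0
    total = 0
    newest = 0.0
    with os.scandir(data_dir) as entries:
        for entry in entries:
            try:
                if not entry.is_file():
                    continue
                st = entry.stat()
            except FileNotFoundError:
                # Compaction may delete a file while we are scanning.
                continue
            count += 1
            total += st.st_size
            newest = max(newest, st.st_mtime)
    return count, total, newest


def is_progressing(before, after):
    """True if anything changed between two snapshots."""
    return before != after


def watch(data_dir, interval=60, rounds=5):
    """Print one status line per interval for a fixed number of rounds."""
    prev = snapshot(data_dir)
    for _ in range(rounds):
        time.sleep(interval)
        cur = snapshot(data_dir)
        state = "progressing" if is_progressing(prev, cur) else "no change"
        print(f"{time.strftime('%H:%M:%S')} files={cur[0]} bytes={cur[1]} {state}")
        prev = cur
```

For example, `watch("/var/lib/ssdb/data")` prints five status lines over five minutes. Several consecutive "no change" lines would suggest a stall, though a single quiet interval proves little on its own.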

ericgreene commented 4 years ago

It took a while (it finished on April 7th, so about two weeks), but the compaction finally completed and the server was then able to start. Thanks!

Level  Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
  0        0        0         0        0         0
  1  1243570  2505148   1040837  2394277   2505152
  2        2        4         0        0         0
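
As a sanity check, the level-1 row of the compaction table can be turned into a duration and an average throughput. This is plain arithmetic on the numbers reported above. It assumes the fifth-from-last column is the file count and the fourth-from-last is the level size in MB, as in the sample table earlier in the thread; the variable names are illustrative.

```python
# Level-1 row from the compaction table in this thread.
files = 1_243_570      # Files
size_mb = 2_505_148    # Size(MB)
time_sec = 1_040_837   # Time(sec)
read_mb = 2_394_277    # Read(MB)
write_mb = 2_505_152   # Write(MB)

days = time_sec / 86_400           # seconds per day
write_rate = write_mb / time_sec   # average MB written per second
read_rate = read_mb / time_sec     # average MB read per second
avg_file_mb = size_mb / files      # average table file size

print(f"compaction time : {days:.1f} days")
print(f"write throughput: {write_rate:.2f} MB/s")
print(f"read throughput : {read_rate:.2f} MB/s")
print(f"average file    : {avg_file_mb:.2f} MB")
```

The result is about 12 days of compaction at roughly 2.4 MB/s written, which is in line with the "2 weeks or so" reported above.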