lealone / Lealone

比 MySQL 和 MongoDB 快10倍的 OLTP 关系数据库和文档数据库
Other
2.44k stars 514 forks source link

数据库运行一段时间出现无法连接的情况 #215

Closed ContyChen closed 7 months ago

ContyChen commented 8 months ago

版本:5.2.1

数据库运行一段时间后无法连接,内存占用巨大 ERROR [ScheduleService-15] [2023-12-28T00:53:28,003] Failed to create session, sessionId: 5 org.lealone.common.exceptions.JdbcSQLException: Database "wms_cloud" not found [90013-1] at org.lealone.common.exceptions.DbException.getJdbcSQLException(DbException.java:342) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:173) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:150) ~[lealone-5.2.1.jar:?] at org.lealone.db.LealoneDatabase.getDatabase(LealoneDatabase.java:144) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createServerSession(ServerSessionFactory.java:49) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createSession(ServerSessionFactory.java:40) ~[lealone-5.2.1.jar:?] at org.lealone.db.ConnectionInfo.createSession(ConnectionInfo.java:652) ~[lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.createSession(TcpServerConnection.java:104) [lealone-5.2.1.jar:?] at org.lealone.server.SessionInitTask.run(SessionInitTask.java:26) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.readInitPacket(TcpServerConnection.java:96) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.handleRequest(TcpServerConnection.java:69) [lealone-5.2.1.jar:?] at org.lealone.net.TransferConnection.handle(TransferConnection.java:127) [lealone-5.2.1.jar:?] at org.lealone.net.nio.NioEventLoop.read(NioEventLoop.java:230) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.handleSelectedKeys(Scheduler.java:422) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.runEventLoop(Scheduler.java:408) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.run(Scheduler.java:111) [lealone-5.2.1.jar:?] ERROR [ScheduleService-16] [2023-12-28T00:53:31,396] Failed to create session, sessionId: 3 org.lealone.common.exceptions.JdbcSQLException: Database "wms_cloud" not found [90013-1] at org.lealone.common.exceptions.DbException.getJdbcSQLException(DbException.java:342) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:173) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:150) ~[lealone-5.2.1.jar:?] at org.lealone.db.LealoneDatabase.getDatabase(LealoneDatabase.java:144) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createServerSession(ServerSessionFactory.java:49) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createSession(ServerSessionFactory.java:40) ~[lealone-5.2.1.jar:?] at org.lealone.db.ConnectionInfo.createSession(ConnectionInfo.java:652) ~[lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.createSession(TcpServerConnection.java:104) [lealone-5.2.1.jar:?] at org.lealone.server.SessionInitTask.run(SessionInitTask.java:26) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.readInitPacket(TcpServerConnection.java:96) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.handleRequest(TcpServerConnection.java:69) [lealone-5.2.1.jar:?] at org.lealone.net.TransferConnection.handle(TransferConnection.java:127) [lealone-5.2.1.jar:?] at org.lealone.net.nio.NioEventLoop.read(NioEventLoop.java:230) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.handleSelectedKeys(Scheduler.java:422) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.runEventLoop(Scheduler.java:408) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.run(Scheduler.java:111) [lealone-5.2.1.jar:?]

ContyChen commented 8 months ago

db_0.trace.db有错误信息

2023-12-27 21:18:04 database: CREATE DATABASE IF NOT EXISTS WMS_CLOUD RUN MODE CLIENT_SERVER PARAMETERS(DATABASE_TO_UPPER='FALSE')
org.lealone.common.exceptions.JdbcSQLException: Out of memory.; SQL statement:
CREATE CACHED TABLE PUBLIC."sys_oper_log" COMMENT STRINGDECODE('\u64cd\u4f5c\u65e5\u5fd7\u8bb0\u5f55')(
    "oper_id" BIGINT NOT NULL SELECTIVITY 100 COMMENT STRINGDECODE('\u65e5\u5fd7\u4e3b\u952e'),
    "title" VARCHAR(50) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u6a21\u5757\u6807\u9898'),
    "business_type" INT DEFAULT 0 SELECTIVITY 1 COMMENT STRINGDECODE('\u4e1a\u52a1\u7c7b\u578b\uff080\u5176\u5b83 1\u65b0\u589e 2\u4fee\u6539 3\u5220\u9664\uff09'),
    "method" VARCHAR(100) DEFAULT '' SELECTIVITY 2 COMMENT STRINGDECODE('\u65b9\u6cd5\u540d\u79f0'),
    "request_method" VARCHAR(10) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u8bf7\u6c42\u65b9\u5f0f'),
    "operator_type" INT DEFAULT 0 SELECTIVITY 1 COMMENT STRINGDECODE('\u64cd\u4f5c\u7c7b\u522b\uff080\u5176\u5b83 1\u540e\u53f0\u7528\u6237 2\u624b\u673a\u7aef\u7528\u6237\uff09'),
    "oper_name" VARCHAR(50) DEFAULT '' SELECTIVITY 5 COMMENT STRINGDECODE('\u64cd\u4f5c\u4eba\u5458'),
    "dept_name" VARCHAR(50) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u90e8\u95e8\u540d\u79f0'),
    "oper_url" VARCHAR(255) DEFAULT '' SELECTIVITY 2 COMMENT STRINGDECODE('\u8bf7\u6c42URL'),
    "oper_ip" VARCHAR(128) DEFAULT '' SELECTIVITY 4 COMMENT STRINGDECODE('\u4e3b\u673a\u5730\u5740'),
    "oper_location" VARCHAR(255) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u64cd\u4f5c\u5730\u70b9'),
    "oper_param" VARCHAR(2000) DEFAULT '' SELECTIVITY 95 COMMENT STRINGDECODE('\u8bf7\u6c42\u53c2\u6570'),
    "json_result" VARCHAR(2000) DEFAULT '' SELECTIVITY 65 COMMENT STRINGDECODE('\u8fd4\u56de\u53c2\u6570'),
    "status" INT DEFAULT 0 SELECTIVITY 1 COMMENT STRINGDECODE('\u64cd\u4f5c\u72b6\u6001\uff080\u6b63\u5e38 1\u5f02\u5e38\uff09'),
    "error_msg" VARCHAR(2000) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u9519\u8bef\u6d88\u606f'),
    "oper_time" DATETIME SELECTIVITY 100 COMMENT STRINGDECODE('\u64cd\u4f5c\u65f6\u95f4')
) [90108-1]
    at org.lealone.common.exceptions.DbException.getJdbcSQLException(DbException.java:342)
    at org.lealone.common.exceptions.DbException.get(DbException.java:162)
    at org.lealone.common.exceptions.DbException.convert(DbException.java:287)
    at org.lealone.sql.executor.YieldableBase.handleException(YieldableBase.java:171)
    at org.lealone.sql.executor.YieldableBase.run(YieldableBase.java:124)
    at org.lealone.sql.StatementBase.syncExecute(StatementBase.java:507)
    at org.lealone.sql.StatementBase.executeUpdate(StatementBase.java:536)
    at org.lealone.db.MetaRecord.execute(MetaRecord.java:69)
    at org.lealone.db.MetaRecord.execute(MetaRecord.java:60)
    at org.lealone.db.Database.openMetaTable(Database.java:535)
    at org.lealone.db.Database.openDatabase(Database.java:475)
    at org.lealone.db.Database.init(Database.java:414)
    at org.lealone.sql.ddl.CreateDatabase.update(CreateDatabase.java:62)
    at org.lealone.sql.executor.YieldableLocalUpdate.executeInternal(YieldableLocalUpdate.java:23)
    at org.lealone.sql.executor.YieldableBase.run(YieldableBase.java:115)
    at org.lealone.sql.StatementBase.syncExecute(StatementBase.java:507)
    at org.lealone.sql.StatementBase.executeUpdate(StatementBase.java:536)
    at org.lealone.db.MetaRecord.execute(MetaRecord.java:69)
    at org.lealone.db.LealoneDatabase.getDatabase(LealoneDatabase.java:137)
    at org.lealone.db.session.ServerSessionFactory.createServerSession(ServerSessionFactory.java:49)
    at org.lealone.db.session.ServerSessionFactory.createSession(ServerSessionFactory.java:40)
    at org.lealone.db.ConnectionInfo.createSession(ConnectionInfo.java:652)
    at org.lealone.server.TcpServerConnection.createSession(TcpServerConnection.java:104)
    at org.lealone.server.SessionInitTask.run(SessionInitTask.java:26)
    at org.lealone.server.TcpServerConnection.readInitPacket(TcpServerConnection.java:96)
    at org.lealone.server.TcpServerConnection.handleRequest(TcpServerConnection.java:69)
    at org.lealone.net.TransferConnection.handle(TransferConnection.java:127)
    at org.lealone.net.nio.NioEventLoop.read(NioEventLoop.java:230)
    at org.lealone.server.Scheduler.handleSelectedKeys(Scheduler.java:422)
    at org.lealone.server.Scheduler.runEventLoop(Scheduler.java:408)
    at org.lealone.server.Scheduler.run(Scheduler.java:111)
Caused by: java.lang.OutOfMemoryError: Cannot reserve 307836 bytes of direct buffer memory (allocated: 1073450399, limit: 1073741824)
    at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332)
    at java.base/sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:243)
    at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:293)
    at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:273)
    at java.base/sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:839)
    at java.base/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:821)
    at org.lealone.storage.fs.impl.nio.FileNio.read(FileNio.java:60)
    at org.lealone.common.util.DataUtils.readFully(DataUtils.java:375)
    at org.lealone.storage.fs.FileStorage.readFully(FileStorage.java:470)
    at org.lealone.storage.aose.btree.chunk.Chunk.readPagePositions(Chunk.java:147)
    at org.lealone.storage.aose.btree.chunk.Chunk.read(Chunk.java:196)
    at org.lealone.storage.aose.btree.chunk.ChunkManager.readChunk(ChunkManager.java:119)
    at org.lealone.storage.aose.btree.chunk.ChunkManager.readLastChunk(ChunkManager.java:64)
    at org.lealone.storage.aose.btree.chunk.ChunkManager.init(ChunkManager.java:58)
    at org.lealone.storage.aose.btree.BTreeStorage.<init>(BTreeStorage.java:95)
    at org.lealone.storage.aose.btree.BTreeMap.<init>(BTreeMap.java:111)
    at org.lealone.storage.aose.AOStorage.openBTreeMap(AOStorage.java:84)
    at org.lealone.storage.aose.AOStorage.openMap(AOStorage.java:63)
    at org.lealone.storage.aose.AOStorage.openMap(AOStorage.java:57)
    at org.lealone.transaction.aote.AOTransaction.openMap(AOTransaction.java:140)
    at org.lealone.transaction.aote.AOTransaction.openMap(AOTransaction.java:30)
    at org.lealone.db.index.standard.StandardPrimaryIndex.<init>(StandardPrimaryIndex.java:78)
    at org.lealone.db.table.StandardTable.<init>(StandardTable.java:93)
    at org.lealone.db.schema.Schema.createTable(Schema.java:751)
    at org.lealone.sql.ddl.CreateTable.update(CreateTable.java:185)
    at org.lealone.sql.executor.YieldableLocalUpdate.executeInternal(YieldableLocalUpdate.java:23)
    at org.lealone.sql.executor.YieldableBase.run(YieldableBase.java:115)
    ... 26 more
codefollower commented 8 months ago

wms_cloud 这个数据库找不到,执行过 drop database 语句? 内存回收的策略是:超过15分钟都没访问的 page 会被回收;如果已占用内存达到 jvm 最大内存的1/3会启动 gc。

如果一直降不下来,再试试执行 checkpoint 命令。jvm 如果可用内存很多的话,jvm 进程占用的内存看上去也会很大的,只要不出现 oom 就没事的。我做性能压测的时候,如果最大内存设置 4G,此时 jvm 进程占用的内存也会一下飙到差不多4G,过了很久 jvm 才会做 gc。设置256M内存做性能压测也一样。

ContyChen commented 8 months ago

内存-Xmx1G ,没有执行drop。结束java进程再启动数据库可以正常访问

codefollower commented 8 months ago

direct buffer memory 出现 oom 的话,看看是否设置了 -XX:MaxDirectMemorySize,把它去掉或调大。错误提示是 direct buffer memory 超过 1G 了,所以 oom 了。

ContyChen commented 8 months ago

好的,我再调整试试

ContyChen commented 8 months ago

怀疑与backup有关,执行backup后 内存就不降

codefollower commented 8 months ago

那我试试 backup 看看能不能重现问题,执行 backup 时内部会先执行 checkpoint 刷脏页,然后如果一些 chunk 的数据很老了又会执行 chunk compact,这整个过程需要比较大的内存,如果当前可用内存不够了可能会 oom。

codefollower commented 8 months ago

-Xmx512m 一边执行 insert 语句一边 backup 没有重现问题。

ContyChen commented 8 months ago

backup 占用内存过大,内存释放不及时??? -Xmx512m ,连续备份两三次后出错,间隔一会儿备份就没有问题。 image

数据文件发邮件了,辛苦看一下

codefollower commented 8 months ago

看到错误了,jvm 进程占用的内存才150M就出现 Error: Out of memory,确实有点奇怪,我排查一下原因。

codefollower commented 8 months ago

如果在客户端想看到完整的异常堆栈,可以先执行 list 命令,之后执行的 sql 出错了就会打印完整的异常堆栈,再次执行 list 又会切换到简短模式。

codefollower commented 8 months ago

我找到原因了,t_6_32\c_2_2.db 这个文件有200M,然后直接读到内存中就 oom 了,我得改进一下。

codefollower commented 8 months ago

问题修复了,不过备份那个 200多M 的文件花了10几秒,ZipOutputStream 执行写操作时比较慢,跟 WinRAR 生成 zip 文件的速度差不多。

codefollower commented 8 months ago

执行 backup 需要通过 ZipOutputStream 生成 zip 文件,如果嫌弃它慢不用 backup 也是可以的。 只需要执行 checkpoint 命令,把脏页刷到硬盘后直接拷贝 db_xxx 目录到别处即可。 backup 命令的好处是支持增量备份,在 LAST DATE 指定一个日期(或日期时间)就能实现增量备份,缺点就是慢。

ContyChen commented 8 months ago

好的,我试一下