Closed fangdingjun closed 1 year ago
may be this is caused by continue to call hook methods after the hook.Stop() called. after I comment out the h.db.Close on hook.Stop(), the badger not crashed, but the db.Close never called.
may be we should wait all tasks done before we stop the hook.
@fangdingjun Just see if this change #299 can fix it, and if it is, I think I should know the reason.
I apply this change to v2.3.0 and add some log like this
if cl.StopCause() == packets.ErrServerShuttingDown {
h.Log.Error().Msg("OnDisconnect, server is shutting down...")
return
}
this change has effect, but not fix the issue, still crash on same place.
I noticed there are many errors like this
Writes are blocked, possibly due to DropAll or Close
maybe when we call h.db.Close, the prevous h.db.Delete is stll in progress and cause the problem
here is the log
9:11AM INF mochi mqtt starting version=2.3.0
9:11AM INF mochi mqtt server started
^C9:11AM INF stopping hook hook=allow-all-auth
9:11AM INF stopping hook hook=badger-db
9:11AM ERR failed to delete client data error="Writes are blocked, possibly due to DropAll or Close" data=conn_1fa50dfc8124445bba hook=badger-db
9:11AM ERR failed to upsert client data error="Writes are blocked, possibly due to DropAll or Close" data={"clean":true,"id":"conn_dde13fe3fa4d4d8b81","listener":"t1","properties":{"authenticationData":null,"authenticationMethod":"","maximumPacketSize":0,"receiveMaximum":0,"requestProblemInfo":0,"requestProblemInfoFlag":false,"requestResponseInfo":0,"sessionExpiryInterval":0,"sessionExpiryIntervalFlag":false,"topicAliasMaximum":0,"user":null},"protocolVersion":4,"remote":"192.168.56.1:53190","t":"CL","username":null,"will":{"flag":0,"payload":null,"qos":0,"retain":false,"topicName":"","user":null,"willDelayInterval":0}} hook=badger-db
9:11AM ERR failed to upsert client data error="Writes are blocked, possibly due to DropAll or Close" data={"clean":true,"id":"conn_0467947652764efe88","listener":"t1","properties":{"authenticationData":null,"authenticationMethod":"","maximumPacketSize":0,"receiveMaximum":0,"requestProblemInfo":0,"requestProblemInfoFlag":false,"requestResponseInfo":0,"sessionExpiryInterval":0,"sessionExpiryIntervalFlag":false,"topicAliasMaximum":0,"user":null},"protocolVersion":4,"remote":"192.168.56.1:53192","t":"CL","username":null,"will":{"flag":0,"payload":null,"qos":0,"retain":false,"topicName":"","user":null,"willDelayInterval":0}} hook=badger-db
9:11AM ERR failed to upsert client data error="Writes are blocked, possibly due to DropAll or Close" data={"clean":true,"id":"conn_7d717531c23e455daf","listener":"t1","properties":{"authenticationData":null,"authenticationMethod":"","maximumPacketSize":0,"receiveMaximum":0,"requestProblemInfo":0,"requestProblemInfoFlag":false,"requestResponseInfo":0,"sessionExpiryInterval":0,"sessionExpiryIntervalFlag":false,"topicAliasMaximum":0,"user":null},"protocolVersion":4,"remote":"192.168.56.1:53191","t":"CL","username":null,"will":{"flag":0,"payload":null,"qos":0,"retain":false,"topicName":"","user":null,"willDelayInterval":0}} hook=badger-db
9:11AM ERR failed to upsert client data error="Writes are blocked, possibly due to DropAll or Close" data={"clean":true,"id":"conn_4c0f947f19a442e79c","listener":"t1","properties":{"authenticationData":null,"authenticationMethod":"","maximumPacketSize":0,"receiveMaximum":0,"requestProblemInfo":0,"requestProblemInfoFlag":false,"requestResponseInfo":0,"sessionExpiryInterval":0,"sessionExpiryIntervalFlag":false,"topicAliasMaximum":0,"user":null},"protocolVersion":4,"remote":"192.168.56.1:53193","t":"CL","username":null,"will":{"flag":0,"payload":null,"qos":0,"retain":false,"topicName":"","user":null,"willDelayInterval":0}} hook=badger-db
9:11AM ERR Ondisconnect, server is shutting down... hook=badger-db
9:11AM ERR OnUnsubscribed, server is shutting down... hook=badger-db
9:11AM ERR Ondisconnect, server is shutting down... hook=badger-db
9:11AM ERR OnUnsubscribed, server is shutting down... hook=badger-db
9:11AM ERR Ondisconnect, server is shutting down... hook=badger-db
9:11AM ERR OnUnsubscribed, server is shutting down... hook=badger-db
9:11AM ERR Ondisconnect, server is shutting down... hook=badger-db
9:11AM ERR OnUnsubscribed, server is shutting down... hook=badger-db
9:11AM ERR Ondisconnect, server is shutting down... hook=badger-db
9:11AM ERR OnUnsubscribed, server is shutting down... hook=badger-db
9:11AM ERR Ondisconnect, server is shutting down... hook=badger-db
9:11AM ERR OnUnsubscribed, server is shutting down... hook=badger-db
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x865125]
goroutine 2457 [running]:
github.com/dgraph-io/badger/skl.(*Skiplist).IncrRef(...)
/home/dingjun/source/go/pkg/mod/github.com/dgraph-io/badger@v1.6.0/skl/skl.go:86
github.com/dgraph-io/badger.(*DB).getMemTables(0xc0001d6380)
/home/dingjun/source/go/pkg/mod/github.com/dgraph-io/badger@v1.6.0/db.go:481 +0xe5
github.com/dgraph-io/badger.(*DB).get(0xc0001d6380, {0xc008dd1440, 0x2c, 0x2c})
/home/dingjun/source/go/pkg/mod/github.com/dgraph-io/badger@v1.6.0/db.go:513 +0x7d
github.com/dgraph-io/badger.(*Txn).Get(0xc008a9dc80, {0xc008dd1410, 0x24, 0x30})
/home/dingjun/source/go/pkg/mod/github.com/dgraph-io/badger@v1.6.0/txn.go:407 +0x247
github.com/timshannon/badgerhold.(*Store).TxDelete(0x88b07a?, 0xc0000b4be0?, {0x9155e0, 0xc008cf8e30}, {0x9415c0, 0xc008cd5000})
/home/dingjun/source/go/pkg/mod/github.com/timshannon/badgerhold@v1.0.0/delete.go:32 +0x118
github.com/timshannon/badgerhold.(*Store).Delete.func1(0xc0001d6380?)
/home/dingjun/source/go/pkg/mod/github.com/timshannon/badgerhold@v1.0.0/delete.go:17 +0x33
github.com/dgraph-io/badger.(*DB).Update(0xc00004ac00?, 0xc008ca7320)
/home/dingjun/source/go/pkg/mod/github.com/dgraph-io/badger@v1.6.0/txn.go:696 +0x83
github.com/timshannon/badgerhold.(*Store).Delete(0xc008c1042c?, {0x9155e0?, 0xc008cf8e30?}, {0x9415c0?, 0xc008cd5000?})
/home/dingjun/source/go/pkg/mod/github.com/timshannon/badgerhold@v1.0.0/delete.go:16 +0x6f
github.com/mochi-mqtt/server/v2/hooks/storage/badger.(*Hook).OnDisconnect(0xc0001c4cc0, 0xc00865ac80, {0x0?, 0x0?}, 0x1)
/home/dingjun/source/go/src/github.com/mochi-mqtt/server/hooks/storage/badger/badger.go:201 +0x186
github.com/mochi-mqtt/server/v2.(*Hooks).OnDisconnect(0xc0000a23c0?, 0x9c2c45?, {0x0, 0x0}, 0x0?)
/home/dingjun/source/go/src/github.com/mochi-mqtt/server/hooks.go:257 +0x13c
github.com/mochi-mqtt/server/v2.(*Server).attachClient(0xc0000b45f0, 0xc00865ac80, {0x9bec2d, 0x2})
/home/dingjun/source/go/src/github.com/mochi-mqtt/server/server.go:381 +0x7c5
github.com/mochi-mqtt/server/v2.(*Server).EstablishConnection(0x0?, {0x9bec2d, 0x2}, {0xa8dbe0?, 0xc008afe4b0?})
/home/dingjun/source/go/src/github.com/mochi-mqtt/server/server.go:311 +0x65
github.com/mochi-mqtt/server/v2/listeners.(*TCP).Serve.func1()
/home/dingjun/source/go/src/github.com/mochi-mqtt/server/listeners/tcp.go:85 +0x47
created by github.com/mochi-mqtt/server/v2/listeners.(*TCP).Serve
/home/dingjun/source/go/src/github.com/mochi-mqtt/server/listeners/tcp.go:84 +0x145
@fangdingjun I made some changes in #299 , see if they are effective.
with latest commit, work fine with fresh start, the server not crash on close.
but with prevous commit's badger data store, the server can not be closed, can only be ended by kill -9, I don't know if I do something wrong.
@fangdingjun you mean the 'crash on close' issue has been resolved, but now the program cannot exit?
@fangdingjun I conducted some tests:
However, I did not encounter the issue of the program being unable to exit. Perhaps you can provide more details about your testing, preferably with steps that can reproduce this issue.
I can reproduce it by this step
if the data store not closed properly, with latest commit the server.Close unable to exit
@fangdingjun Thanks, I was able to reproduce this issue, and I've made some changes #299 to fix it. Can you please check if it's effective?
with latest commit, the "crash on close" issue is back.
@fangdingjun Could you please provide the crash logs? I merged my changes into both v2.4.0 and v2.3.0, and I tested them using the steps you mentioned, and everything was OK. Did you merge my changes into your v2.3.0 codebase? Is there anything missing?
You can directly pull my repository at werbenhu/mqserver for testing.
If you want to test the scenario after the crash in v2.3.0, just copy the 'cmd/data' directory from your crashed v2.3.0 into the 'cmd' directory of v2.4.0.
In the context of the crash you experienced in v2.3.0, I suggest checking if the port you are currently listening on is still in use. You can add the following check in your main.go file.
err := server.AddListener(listeners.NewTCP("t1", ":1883", nil))
if err != nil {
panic(err)
}
Resolved in v2.4.1, sorry for the delay!
Hi,
I use badger to store the data, when I try to restart the server, sometimes crashed.
I make a demo that can reproduce the issue.
test step
start the server
start some clients connect to server and keep sending messages I use jmeter with mqtt plugin, 10 threads, infinite loop
press CTRL+C to close the server
after ctrl+c the server crashed and print the stack
version
this is the demo code
this is the trace