Closed: vividvilla closed this issue 7 months ago.
It's not a good idea to increase the write timeout to this level. The configuration is causing some issues.
@derekcollison I have observed the same even when write_deadline was set to the default of 10s.
What version of the server, and what does the config look like for the cross account setup?
The leaf nodes run 2.8.1 and the hub cluster where this error was logged runs 2.6.4. Here is the config.
server_name: "hub-cluster-node-1"
max_payload: 1000000
max_pending: 1000000
http_port: 8222
write_deadline: "60s"
ping_max: 3
cluster {
  name: hub-cluster
  listen: 0.0.0.0:4223
  routes: [
    nats-route://x.x.x.x:4223
    nats-route://y.y.y.y:4223
  ]
}
jetstream {
  store_dir: "/home/ubuntu/nats/data"
  max_mem: 15G
  max_file: 300G
  domain: primary
}
include ./leaf.conf
include ./accounts.conf
accounts {
  # Admin account.
  SYS: {
    users: [{user: admin, password: admin}]
  },
  # Primary cluster account.
  ACC: {
    users: [{user: acc, password: acc}],
    jetstream: enabled,
    exports: [
      # Export JS service and delivery stream.
      {stream: 'deliver.acc.primary.>'}
      {service: '$JS.primary.API.>'}
      {service: '$JS.FC.>'}
      {service: '$JS.ACK.>'}
    ],
    imports: [
      {stream: {account: BRIDGE, subject: 'test.>'}}
      # JS and delivery stream.
      {stream: {account: BRIDGE, subject: 'deliver.silo.>'}}
      {service: {account: BRIDGE, subject: '$JS.silo.>'}}
      # JS flow control.
      {service: {account: BRIDGE, subject: '$JS.FC.>'}}
      {service: {account: BRIDGE, subject: '$JS.ACK.>'}}
    ]
  },
  # Account used in silo.
  SILO: {
    users: [{user: silo, password: silo}],
    jetstream: enabled,
    exports: [
      {stream: 'test.>'}
      # Delivery topics.
      {stream: 'deliver.silo.silo_a.>'}
      {stream: 'deliver.silo.silo_b.>'}
      {stream: 'deliver.silo.silo_c.>'}
      {stream: 'deliver.silo.silo_d.>'}
      # JS services for all silo.
      {service: '$JS.silo_a.API.>'}
      {service: '$JS.silo_b.API.>'}
      {service: '$JS.silo_c.API.>'}
      {service: '$JS.silo_d.API.>'}
      # JS flow control.
      {service: '$JS.FC.>'}
      {service: '$JS.ACK.>'}
    ]
  },
  # Bridge account which acts as a firewall between both SILO and AWS.
  BRIDGE: {
    users: [{user: bridge, password: bridge}],
    exports: [
      {stream: 'test.>'}
      # JS and delivery stream.
      {stream: 'deliver.>'}
      {service: '$JS.>'}
    ],
    imports: [
      {stream: {account: SILO, subject: 'test.>'}}
      # Delivery streams.
      {stream: {account: ACC, subject: 'deliver.acc.primary.>'}}
      {stream: {account: SILO, subject: 'deliver.silo.silo_a.>'}}
      {stream: {account: SILO, subject: 'deliver.silo.silo_b.>'}}
      {stream: {account: SILO, subject: 'deliver.silo.silo_c.>'}}
      {stream: {account: SILO, subject: 'deliver.silo.silo_d.>'}}
      # JS service. Renamed it to namespace based on account to avoid cyclic imports.
      {service: {account: ACC, subject: '$JS.primary.API.>'}, to: '$JS.acc.primary.API.>'}
      {service: {account: SILO, subject: '$JS.silo_a.API.>'}, to: '$JS.silo.silo_a.API.>'}
      {service: {account: SILO, subject: '$JS.silo_b.API.>'}, to: '$JS.silo.silo_b.API.>'}
      {service: {account: SILO, subject: '$JS.silo_c.API.>'}, to: '$JS.silo.silo_c.API.>'}
      {service: {account: SILO, subject: '$JS.silo_d.API.>'}, to: '$JS.silo.silo_d.API.>'}
      # Flow control.
      {service: {account: SILO, subject: '$JS.FC.>'}}
      {service: {account: SILO, subject: '$JS.ACK.>'}}
    ]
  }
}
system_account: SYS
leafnodes {
  remotes = [
    # Silo A
    {
      urls: ["tls://admin:admin@silo-a:4333"]
      account: "SYS"
    },
    {
      urls: ["tls://bridge:bridge@silo-a:4333"]
      account: "BRIDGE"
    },
    # Silo B
    {
      urls: ["tls://admin:admin@silo-b:4333"]
      account: "SYS"
    },
    {
      urls: ["tls://bridge:bridge@silo-b:4333"]
      account: "BRIDGE"
    },
    # Silo C
    {
      urls: ["tls://admin:admin@silo-c:4333"]
      account: "SYS"
    },
    {
      urls: ["tls://bridge:bridge@silo-c:4333"]
      account: "BRIDGE"
    },
    # Silo D
    {
      urls: ["tls://admin:admin@silo-d:4333"]
      account: "SYS"
    },
    {
      urls: ["tls://bridge:bridge@silo-d:4333"]
      account: "BRIDGE"
    }
  ]
}
Could it be that there is a deadlock in the leafnodes that causes messages to accumulate and not be drained? @vividvilla would it be possible for you to re-run the tests with the nightly image (synadia/nats-server:nightly) to see if that was fixed already? If not, enable the monitoring port on each server and, when you notice the memory increase (which should be happening, seeing that the hub failed to write more than 1.5 GB), get http://host:<monitoring port>/stacksz captures at, say, 5 or 10 second intervals on the leaf node servers. This would help us see if there is a deadlock on those servers. (For that specific test, you may actually want to increase the write_deadline even more, just to have enough time to see the memory increase and still be able to capture the stacks.)
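For reference, here is a minimal sketch of such a capture loop (assuming the monitoring port configured via http_port above, e.g. 8222, a 5 second interval, and 10 captures; adjust the URL and counts for your setup). It simply fetches /stacksz repeatedly and saves each dump to a numbered file:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Assumed values: host, monitoring port, interval and capture count.
	const monitorURL = "http://127.0.0.1:8222/stacksz"
	const interval = 5 * time.Second
	const captures = 10

	for i := 0; i < captures; i++ {
		if err := capture(monitorURL, i); err != nil {
			fmt.Fprintln(os.Stderr, "capture failed:", err)
		}
		time.Sleep(interval)
	}
}

// capture fetches one /stacksz dump and writes it to a numbered file.
func capture(url string, n int) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	f, err := os.Create(fmt.Sprintf("stacksz-%02d.txt", n))
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, resp.Body)
	return err
}
```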
@kozlovic will monitor stats tomorrow and let you know. If possible I will upgrade to nightly after collecting stats.
Am I reading it right that you increased max_payload, meaning how large a single message can be, to 100MB? Do you send messages that big?
@derekcollison 99.99% of messages are a few KB, and the max we send is 5MB, which happens only a handful of times and not all the time.
I increased write_deadline to 180s on all leaf nodes and hub nodes. Here is the observation:
[22493] 2022/05/04 09:40:58.744123 [INF] 123.108.38.133:4333 - lid:40 - Slow Consumer Detected: WriteDeadline of 3m0s exceeded with 49359 chunks of 3234791305 total bytes.
[22493] 2022/05/04 09:40:58.745226 [INF] 123.108.38.133:4333 - lid:40 - Slow Consumer Detected: WriteDeadline of 3m0s exceeded with 95618 chunks of 6266372121 total bytes.
[22493] 2022/05/04 09:40:58.745240 [INF] 123.108.38.133:4333 - lid:40 - Leafnode connection closed: Slow Consumer (Write Deadline) account: NEST_BRIDGE
[22493] 2022/05/04 09:40:59.758921 [INF] 123.108.38.132:4333 - lid:62 - Leafnode connection created for account: NEST_BRIDGE
Adding these stats to periscope so we can visualize what's going on in the cluster. I have captured stacks for around 10s on both the hub and a silo node. Will share those as well.
Here are the memory stats of the hub nodes (I was able to collect only 2 out of 3 nodes).
@vividvilla I don't see the stacks attached, which are the ones that will help us see if there is a deadlock happening in the leafnodes, causing the memory buildup in the hub. Please attach them at your earliest convenience.
@vividvilla would it be possible to get the stack dumps as @kozlovic mentioned so we can analyze to see if there is a deadlock?
We are nearing a release for 2.8.2.
Also just noticed that the hub is running 2.6.4, possible to upgrade that system to nightly or 2.8.1/2.8.2?
@kozlovic here are the stacks, captured every few seconds. I have done it for both the hub and a leaf node - https://gofile.io/d/4Sk8Rg Mirror - https://send.bitwarden.com/#5nvdFz61ek6Hoa6LAUqTHA/HpKPY3g3V03GaopYu-WX3g
This was captured with write_deadline set to 60s since it was crashing with 180s. If these stacks are not helpful, I can set it to 120s or so and capture again.
@derekcollison I upgraded the hub nodes to v2.8.1 yesterday, so the issue happened with v2.8.1 as well.
Update: changed the file download URL.
@vividvilla The "good" news, is that there is no deadlock, but servers are really busy doing this kind of operations:
goroutine 129130 [runnable]:
bytes.(*Buffer).WriteString(0xc0024a9140, {0xc001b36690, 0x2f})
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/bytes/buffer.go:180 +0xfa
net/http.Header.writeSubset(0xa0fe80, {0xb0f280, 0xc0024a9140}, 0x11, 0x0)
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/net/http/header.go:199 +0x484
net/http.Header.write(...)
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/net/http/header.go:87
net/http.Header.Write(...)
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/net/http/header.go:83
github.com/nats-io/nats-server/v2/server.(*client).setHeader(0xc00112c000, {0xa4432d, 0x11}, {0xc001b36690, 0x2f}, {0xc0008ddbc8, 0x2c0, 0x438})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3795 +0x225
github.com/nats-io/nats-server/v2/server.(*client).processServiceImport(0xc00112c000, 0xc0001a0ea0, 0xc00022a240, {0xc0008ddbc8, 0x2c0, 0x438})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3965 +0xa26
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImportSub.func1(0xc0010c0770, 0x40cc74, 0x7f16e79f4f18, {0xc0000dc305, 0xc0010c07a8}, {0x697851, 0x101010000980220}, {0xc0008ddbc8, 0x2c0, 0x438})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1935 +0x32
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc00112c000, 0xc00002f200, 0x5, {0xc0006236a0, 0x41482f, 0x10}, {0x0, 0x7f16c10c2fff, 0x0}, {0xc00112d480, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3194 +0xc4d
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc00112c000, 0xc00022a240, 0xc000d9be30, {0xc0008ddbc8, 0x2c0, 0x438}, {0x0, 0x0, 0x0}, {0xc0006236a0, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4189 +0xa75
github.com/nats-io/nats-server/v2/server.(*client).processServiceImport(0xc00112c000, 0xc0001a0bd0, 0xc00022a240, {0xc0008ddbc8, 0x2c0, 0x438})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4005 +0xed1
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImportSub.func1(0x0, 0x939557, 0xc0000d221e, {0x19, 0x1}, {0x0, 0x2b2}, {0xc0008ddbc8, 0x2c0, 0x438})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1935 +0x32
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc00112c000, 0xc00002ee40, 0x40, {0xc00171d740, 0xc00046e4c0, 0x40}, {0x0, 0x995d40, 0x0}, {0xc00112d480, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3194 +0xc4d
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc00112c000, 0x0, 0xc000d9be30, {0xc0008ddbc8, 0x2c0, 0x438}, {0x0, 0x0, 0x0}, {0xc0008ddbb5, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4189 +0xa75
github.com/nats-io/nats-server/v2/server.(*client).processInboundLeafMsg(0xc00112c000, {0xc0008ddbc8, 0xc0004a68c0, 0x438})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:2231 +0x3a5
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc00112c000, {0xc0008ddbc8, 0x11, 0xc8d})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3516 +0x51
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc00112c000, {0xc0008dc000, 0x2000, 0x4000})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x246a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc00112c000, {0x0, 0x0, 0x0})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1229 +0xe1f
github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:932 +0x29
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3013 +0x87
or
goroutine 129789 [runnable]:
encoding/json.Marshal({0x9a2700, 0xc0012cc000})
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/encoding/json/encode.go:158 +0x127
github.com/nats-io/nats-server/v2/server.(*client).processServiceImport(0xc00022f300, 0xc0001a0bd0, 0xc00022a240, {0xc0008211a4, 0x225, 0x2e5c})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3964 +0x9b9
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImportSub.func1(0xc00109d230, 0x0, 0xc00109d240, {0x41642f, 0x764a09}, {0x403651, 0xc00109d248}, {0xc0008211a4, 0x225, 0x2e5c})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1935 +0x32
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc00022f300, 0xc00002ee40, 0xa80, {0xc00171d7c0, 0xffffffffffffffff, 0x40}, {0x0, 0x41642f, 0x0}, {0xc000230781, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3194 +0xc4d
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc00022f300, 0x0, 0xc000d9be30, {0xc0008211a4, 0x225, 0x2e5c}, {0x0, 0x0, 0x0}, {0xc000821194, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4189 +0xa75
github.com/nats-io/nats-server/v2/server.(*client).processInboundLeafMsg(0xc00022f300, {0xc0008211a4, 0xc0006a2c40, 0x2e5c})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:2231 +0x3a5
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc00022f300, {0xc0008211a4, 0xe, 0x2e6c})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3516 +0x51
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc00022f300, {0xc000820000, 0x13c9, 0x1000})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x246a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc00022f300, {0x0, 0x0, 0x0})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1229 +0xe1f
github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:932 +0x29
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3013 +0x87
or that...
goroutine 127118 [runnable]:
sync.(*Pool).pin(0xe897e0)
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/sync/pool.go:195 +0x65
sync.(*Pool).Get(0xe897e0)
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/sync/pool.go:128 +0x25
encoding/json.newEncodeState()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/encoding/json/encode.go:305 +0x25
encoding/json.Marshal({0x9a2700, 0xc00118a3c0})
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/encoding/json/encode.go:159 +0x27
github.com/nats-io/nats-server/v2/server.(*client).processServiceImport(0xc000230c80, 0xc0001a0ea0, 0xc00022a240, {0xc001476c8c, 0x264, 0x7374})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3964 +0x9b9
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImportSub.func1(0xc0010c0770, 0x40cc74, 0x7f16e79f4f18, {0xc0000dc305, 0xc0010c07a8}, {0x697851, 0x101010000980220}, {0xc001476c8c, 0x264, 0x7374})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1935 +0x32
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc000230c80, 0xc00002f200, 0x5, {0xc001aa8bc0, 0x17, 0x10}, {0x0, 0x0, 0x0}, {0xc000232100, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3194 +0xc4d
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc000230c80, 0x400, 0xc000d9be30, {0xc001476c8c, 0x264, 0x7374}, {0x0, 0x0, 0x0}, {0xc001aa8bc0, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4189 +0xa75
github.com/nats-io/nats-server/v2/server.(*client).processServiceImport(0xc000230c80, 0xc0001a0bd0, 0xc00022a240, {0xc001476c8c, 0x264, 0x7374})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4005 +0xed1
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImportSub.func1(0xc0010c1248, 0x939557, 0xc001c46a5e, {0x19, 0x22}, {0xc00049e300, 0xc0010c1248}, {0xc001476c8c, 0x264, 0x7374})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1935 +0x32
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc000230c80, 0xc00002ee40, 0x14, {0xc0014def40, 0xffffffffffffffff, 0x40}, {0x0, 0x995d40, 0x0}, {0xc000232100, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3194 +0xc4d
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc000230c80, 0x0, 0xc000d9be30, {0xc001476c8c, 0x264, 0x7374}, {0x0, 0x0, 0x0}, {0xc001476c79, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4189 +0xa75
github.com/nats-io/nats-server/v2/server.(*client).processInboundLeafMsg(0xc000230c80, {0xc001476c8c, 0xc0001de2c0, 0x7374})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:2231 +0x3a5
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc000230c80, {0xc001476c8c, 0x11, 0x4e3331})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3516 +0x51
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc000230c80, {0xc001476000, 0x1f29, 0x1000})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x246a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc000230c80, {0x0, 0x0, 0x0})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1229 +0xe1f
github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:932 +0x29
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3013 +0x87
Which means processing service imports across different accounts and either decoding or encoding client JSON information, or setting headers on the messages. So nothing "wrong" per se, it is just that this appears to be becoming a bottleneck.
At this point, let's see if @derekcollison or @matthiashanel (out sick currently) can have a look at the topology and see if it is required to achieve what you want. If so, then we will have to see how that processing can be improved. If not, maybe some changes to the topology will avoid this performance issue.
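To make the cost concrete, here is a rough, purely illustrative sketch of the kind of per-message work those stacks show (JSON-encoding client information and serializing it into a header block). This is not the server's actual code; the struct and header name are hypothetical stand-ins:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// clientInfo is a hypothetical stand-in for the client information the
// server encodes per message crossing a service import.
type clientInfo struct {
	Server  string `json:"server"`
	Account string `json:"acc"`
	RTT     string `json:"rtt"`
}

func main() {
	ci := clientInfo{Server: "hub-cluster-node-1", Account: "BRIDGE", RTT: "1.2ms"}
	const msgs = 1_000_000

	start := time.Now()
	for i := 0; i < msgs; i++ {
		b, _ := json.Marshal(ci) // encode client info for this message

		hdr := http.Header{}
		hdr.Set("X-Client-Info", string(b)) // hypothetical header name

		var buf bytes.Buffer
		hdr.Write(&buf) // serialize the header block, again per message
	}
	fmt.Printf("encoded %d messages in %v\n", msgs, time.Since(start))
}
```

At high message rates this per-message allocation and encoding adds up quickly, which matches the "really busy, but no deadlock" picture in the stacks above.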
We have released 2.8.2, so please upgrade to that as well. We will take a deeper look at the issues above, and thanks for the updates!
@kozlovic @derekcollison thanks for the quick response. I will upgrade the cluster to v2.8.2 and see if it helps. Will wait for further updates from you.
After giving it some thought, we ditched the cross-account setup in favor of prefixing messages pushed from each silo with the silo name. This helped avoid duplicates in the merged stream and stops data flowing from one silo to another (a sketch of the prefixed publish is shown below).
Here is the memory graph after we made this change.
Memory dropped from a constant 10-13GB to 300MB, and now there is no delay on pull consumer fetches and acks. Will keep this issue open, and if there are any updates I will be happy to help with the testing.
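For illustration, a minimal sketch of what the prefixed publish looks like from a silo client (the subject and server names here are hypothetical; the real deployment may use different names):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Hypothetical URL; each silo connects to its own local server.
	nc, err := nats.Connect("nats://silo-a:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Publish under the silo's own prefix, so the hub's merged stream can
	// ingest silo_a.>, silo_b.>, ... without cross-account imports and
	// without duplicates.
	if err := nc.Publish("silo_a.test.orders", []byte(`{"id":42}`)); err != nil {
		log.Fatal(err)
	}
	if err := nc.Flush(); err != nil {
		log.Fatal(err)
	}
}
```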
Thanks, we will continue to investigate.
We no longer default to full client state encoding for system level imports, e.g. JetStream. This should help quite a bit in this case and there are more optimizations we can make here as well.
@vividvilla Let us know if the issue has been resolved. If not, please test with the latest server 2.10.12 (at the time of this comment) and re-open with new details.
Context and setup
Observation
write_deadline is set to 60s, but I'm still observing this when the traffic flowing through is really high. Here are the logs.
Since this does not seem to happen without the cross-account setup, it does not seem like a network limitation or anything of that sort. So is there a way to debug this further? Without the cross-account setup we would end up sending multiple GBs of data across the silos; we used the cross-account imports to restrict the traffic flow from silo to hub and not the other way around.
I have posted the full leaf node config and accounts config in a previous issue I created.