Closed: glymehrvrd closed this issue 4 years ago.
Here are the heap pprofs:
```
File: consul
Type: inuse_space
Time: Aug 10, 2019 at 2:59pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1197.94MB, 96.43% of 1242.33MB total
Dropped 166 nodes (cum <= 6.21MB)
Showing top 10 nodes out of 40
      flat  flat%   sum%        cum   cum%
  845.29MB 68.04% 68.04%  1038.91MB 83.63%  compress/flate.NewWriter
  190.11MB 15.30% 83.34%   190.11MB 15.30%  compress/flate.(*compressor).initDeflate
   76.28MB  6.14% 89.48%    76.28MB  6.14%  bufio.NewWriterSize
   66.25MB  5.33% 94.82%    66.25MB  5.33%  bufio.NewReaderSize
    8.50MB  0.68% 95.50%     8.50MB  0.68%  runtime.malg
    5.50MB  0.44% 95.94%        8MB  0.64%  net/http.readRequest
    3.50MB  0.28% 96.23%    12.60MB  1.01%  runtime.systemstack
       1MB 0.081% 96.31%    18.01MB  1.45%  net/http.(*conn).readRequest
       1MB  0.08% 96.39%  1213.62MB 97.69%  net/http.(*conn).serve
    0.50MB  0.04% 96.43%     6.50MB  0.52%  github.com/hashicorp/consul/agent/consul/fsm.restoreKV
```
```
File: consul
Type: inuse_space
Time: Aug 10, 2019 at 3:01pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1916.24MB, 96.53% of 1985.21MB total
Dropped 231 nodes (cum <= 9.93MB)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
 1455.24MB 73.30% 73.30%  1777.24MB 89.52%  compress/flate.NewWriter
  313.49MB 15.79% 89.10%   313.49MB 15.79%  compress/flate.(*compressor).initDeflate
   76.27MB  3.84% 92.94%    76.27MB  3.84%  bufio.NewWriterSize
   60.73MB  3.06% 96.00%    60.73MB  3.06%  bufio.NewReaderSize
       6MB   0.3% 96.30%       11MB  0.55%  net/http.readRequest
    3.50MB  0.18% 96.48%    12.60MB  0.63%  runtime.systemstack
       1MB  0.05% 96.53%  1953.46MB 98.40%  net/http.(*conn).serve
         0     0% 96.53%    60.73MB  3.06%  bufio.NewReader
         0     0% 96.53%      322MB 16.22%  compress/flate.(*compressor).init
         0     0% 96.53%  1777.24MB 89.52%  compress/gzip.(*Writer).Write
```
```
File: consul
Type: inuse_space
Time: Aug 10, 2019 at 3:00pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 163.85MB, 87.98% of 186.24MB total
Dropped 80 nodes (cum <= 0.93MB)
Showing top 10 nodes out of 79
      flat  flat%   sum%        cum   cum%
   66.75MB 35.84% 35.84%    66.75MB 35.84%  bufio.NewWriterSize
   61.23MB 32.88% 68.72%    61.23MB 32.88%  bufio.NewReaderSize
    9.70MB  5.21% 73.92%    12.36MB  6.64%  compress/flate.NewWriter
    8.50MB  4.57% 78.49%     8.50MB  4.57%  runtime.malg
    4.50MB  2.42% 80.91%        6MB  3.22%  net/http.readRequest
       3MB  1.61% 82.52%     4.50MB  2.42%  context.propagateCancel
       3MB  1.61% 84.13%    12.10MB  6.50%  runtime.systemstack
    2.66MB  1.43% 85.56%     2.66MB  1.43%  compress/flate.(*compressor).initDeflate
    2.50MB  1.34% 86.90%     2.50MB  1.34%  net/http.(*Server).newConn
       2MB  1.07% 87.98%        4MB  2.15%  github.com/hashicorp/go-immutable-radix.(*Txn).insert
```
It turns out that NYTimes/gziphandler already uses a sync.Pool to cache gzip.Writer instances, yet the CPU and memory consumption is still considerable. The general pattern is sketched below.
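For reference, a simplified sketch of that kind of writer reuse via sync.Pool (not gziphandler's exact code; the package and function names here are illustrative):

```
package compressutil

import (
	"compress/gzip"
	"io"
	"sync"
)

// A pool of gzip.Writers at one compression level. Reusing a writer via
// Reset avoids repeating the large allocations made by flate.NewWriter /
// (*compressor).initDeflate, which dominate the heap profiles above.
var gzipWriterPool = sync.Pool{
	New: func() interface{} {
		w, _ := gzip.NewWriterLevel(nil, gzip.DefaultCompression)
		return w
	},
}

// compressTo writes src through a pooled gzip.Writer into dst.
func compressTo(dst io.Writer, src []byte) error {
	gw := gzipWriterPool.Get().(*gzip.Writer)
	defer gzipWriterPool.Put(gw)

	gw.Reset(dst) // reattach the reused writer to the new destination
	if _, err := gw.Write(src); err != nil {
		return err
	}
	return gw.Close()
}
```

Even with a pool like this, a burst of concurrent responses (or the pool being emptied by a GC cycle) can still force fresh flate.NewWriter allocations, which would be consistent with the profiles above.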
Compression of HTTP API responses was introduced in #3752, in which @mkeeler mentioned this:
Update the gziphandler creation to set the minSize before flushing to 0. I think that can be done like this:
```
gzipWrapper, _ := gziphandler.GzipHandlerWithOpts(gziphandler.MinSize(0))
gzipHandler := gzipWrapper(http.HandlerFunc(wrapper))
```
But setting gziphandler.MinSize to zero seems like a bad idea, since compressing small responses gains less than the resources it costs. Keeping the default MinSize of 1400 seems the better choice.
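For comparison, keeping the default threshold would look something like the following. This is a minimal sketch assuming the NYTimes/gziphandler API (GzipHandlerWithOpts, MinSize, and, if I read the library right, an exported DefaultMinSize constant of 1400); `apiHandler` is a stand-in for Consul's real HTTP handler:

```
package main

import (
	"net/http"

	"github.com/NYTimes/gziphandler"
)

func main() {
	// Stand-in for Consul's HTTP API handler.
	apiHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	// Responses smaller than the default minimum size (1400 bytes) are sent
	// uncompressed, so tiny KV responses can skip flate.NewWriter entirely.
	gzipWrapper, err := gziphandler.GzipHandlerWithOpts(
		gziphandler.MinSize(gziphandler.DefaultMinSize),
	)
	if err != nil {
		panic(err)
	}

	http.ListenAndServe(":8080", gzipWrapper(apiHandler))
}
```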
Hey there,
We wanted to check in on this request since it has been inactive for at least 60 days. If you think this is still an important issue in the latest version of Consul or its documentation please reply with a comment here which will cause it to stay open for investigation. If there is still no activity on this issue for 30 more days, we will go ahead and close it.
Feel free to check out the community forum as well! Thank you!
Hey there,
This issue has been automatically closed because there hasn't been any activity for at least 90 days. If you are still experiencing problems, or still have questions, feel free to open a new one :+1:
Hey there,
This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.
If you are still experiencing problems, or still have questions, feel free to open a new one :+1:.
Overview of the Issue
Consul consumes a lot of memory (up to several GB) under heavy KV read load. The KV store is relatively small, ~9 kv pairs, and `state.bin` is only 3.3 MB, so the situation is different from #866, where `state.bin` is much larger.

By inspecting the `heap.pprof` generated by `consul debug`, I found that `compress/flate.NewWriter` consumes the largest flat memory, ranging from 66 MB to 1.5 GB.

What's more, I noticed that in the Consul Go API, kv.Get adds `Accept-Encoding: gzip` by default, which forces the Consul server to compress the response using the `compress` package.

Maybe Consul should pool and reuse `flate.Writer` instances to reduce temporary memory allocations?
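For context, the read load on the server comes from clients doing roughly the following with the Consul Go API (a hedged sketch; the key name and the tight loop are illustrative only):

```
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// DefaultConfig points at the local HTTP API address unless overridden.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	kv := client.KV()
	for {
		// Each Get is an HTTP request that accepts gzip by default, so the
		// server compresses every response through compress/flate.
		pair, _, err := kv.Get("some/key", nil) // "some/key" is a placeholder
		if err != nil {
			log.Fatal(err)
		}
		_ = pair
	}
}
```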
Reproduction Steps
Consul info for both Client and Server
Client info
```
no Consul agent client
```
Server info
```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 40cec984
	version = 1.5.1
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 100.75.30.48:10166
	server = true
raft:
	applied_index = 49814410
	commit_index = 49814410
	fsm_pending = 0
	last_contact = 208.144µs
	last_log_index = 49814410
	last_log_term = 1711
	last_snapshot_index = 49812505
	last_snapshot_term = 1711
	latest_configuration = [{Suffrage:Voter ID:19786835-c7e1-8dd7-766a-ee6f362e5354 Address:100.107.165.95:10050} {Suffrage:Voter ID:b6338dbd-5865-025e-e674-e184793aad3e Address:9.54.36.151:10078} {Suffrage:Voter ID:499cb466-0f73-41d0-e19e-7c0836f4481e Address:100.75.30.48:10166} {Suffrage:Voter ID:7e815aee-a463-bf0b-834d-b611e626af92 Address:100.95.11.157:10061} {Suffrage:Voter ID:86e51ae4-339b-1fd4-5998-a0038bc5b5ac Address:9.44.0.162:10089} {Suffrage:Voter ID:8bfcca90-14c8-a3c5-85e5-846889f0e3de Address:9.84.145.57:8300} {Suffrage:Voter ID:6d95b03c-a479-8b12-9457-8cf6ce6309f9 Address:9.84.148.217:8300} {Suffrage:Voter ID:7b182134-3961-e7d6-f07b-2170db023be2 Address:100.65.22.130:10093} {Suffrage:Voter ID:5aa0766f-30e3-165b-faf8-dc55f32761e3 Address:10.101.55.104:8300}]
	latest_configuration_index = 49812029
	num_peers = 8
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 1711
runtime:
	arch = amd64
	cpu_count = 48
	goroutines = 12201
	max_procs = 48
	os = linux
	version = go1.12.1
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 205
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 2
	member_time = 4623
	members = 108
	query_queue = 0
	query_time = 19
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 4
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 127
	members = 13
	query_queue = 0
	query_time = 1
```
Operating system and Environment details
Linux 3.10.107
The client (not a Consul agent client, but a client making kv.Get calls with the Consul Go API) talks to the Consul server directly over the HTTP port.
Consul runs in Docker; the container uses host mode, so the CPU and memory shown are the totals of the host machine, but the container's quota is limited to 4 CPU cores and 8 GB. I don't know whether that misleads the Go GC into thinking we still have plenty of free memory when we actually don't.
Log Fragments