hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Optimization of Consul service #12218

Open AnilChoudhury-Eaton opened 2 years ago

AnilChoudhury-Eaton commented 2 years ago


Feature Description

We are using Consul as a microservice with EdgeX Foundry. We observe that Consul uses a lot of RAM and flash storage. It also takes a long time to start (up to 5 minutes) on an A7 core when about 10 services are connected to it.

We are looking for ways to optimize this service by selecting, at compile time, only the features we need. Could feature options be provided that can be enabled or disabled during the build process?

Use Case(s)

EdgeX Foundry uses Consul as its Registry service. Our hardware targets include the BeagleBone Black (single A7 core, 512 MB RAM), the STM32MP1 (dual A7 core, 512 MB RAM), etc.

If this can't be optimized, do you have any other alternative suggestions?

dnephin commented 2 years ago

Thanks for raising this issue! I have some notes in #9074 that may be of interest to you. I guess the compression is not helpful, but the other changes may be. We haven't done much work to optimize for this use case. I suspect there are some easy wins to improve things, but I'm not sure by how much.

There are also some runtime configuration settings that can disable some components. I understand build-time flags would be preferable, but in some cases it may be difficult to use a build-time flag because of how the code is structured. Some examples are disable_coordinates, disable_update_check, and disable_check_output. You may also want to look at tuning the gossip_lan settings to make it less chatty, although as it says in those docs, that is an advanced configuration.
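As a rough illustration of the runtime settings mentioned above, the following Go sketch renders a small JSON agent configuration with two of the named options switched on. The struct name `agentConfig` and the helper `renderConfig` are made up for this example. The check-output option is left out because its exact key name should be confirmed against the agent configuration docs, and the `gossip_lan` block is left out because sensible values depend on the deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// agentConfig is a hypothetical struct for this sketch. It only models
// two of the agent options named in the comment above.
type agentConfig struct {
	DisableCoordinates bool `json:"disable_coordinates"`
	DisableUpdateCheck bool `json:"disable_update_check"`
}

// renderConfig returns the configuration as indented JSON.
func renderConfig(c agentConfig) ([]byte, error) {
	return json.MarshalIndent(c, "", "  ")
}

func main() {
	out, err := renderConfig(agentConfig{
		DisableCoordinates: true,
		DisableUpdateCheck: true,
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "render failed:", err)
		return
	}
	// The output could be saved as a .json file in the directory
	// passed to the agent with -config-dir.
	fmt.Println(string(out))
}
```

Whether these switches make a measurable difference on a single-core A7 board is not established in this thread, so any gain would need to be measured on the target hardware.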

We should be able to accept pull requests that make changes along these lines as long as it doesn't add significant complexity. It would be great to discuss any changes first either in this issue or as an early draft PR.

If you are able to get a CPU profile of the startup, or a heap profile of the memory usage, or provide a log with timestamps, we might be able to better identify areas that need optimization.
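One possible way to collect such profiles is to download them from the agent's Go pprof HTTP endpoints. The sketch below assumes the agent was started with `enable_debug` set to true in its configuration, that its HTTP API listens on 127.0.0.1:8500, and that no ACL token is required; the helper names `profileURL` and `fetchProfile` are invented for this example. It profiles a running agent, so capturing startup itself would mean issuing the request as soon as the HTTP API comes up.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// profileURL builds the URL of a pprof endpoint on the agent.
// kind is "heap" or "profile" (CPU); seconds only applies to CPU profiles.
func profileURL(agentAddr, kind string, seconds int) string {
	u := agentAddr + "/debug/pprof/" + kind
	if kind == "profile" && seconds > 0 {
		q := url.Values{}
		q.Set("seconds", strconv.Itoa(seconds))
		u += "?" + q.Encode()
	}
	return u
}

// fetchProfile downloads one profile and writes it to outPath.
func fetchProfile(client *http.Client, u, outPath string) error {
	resp, err := client.Get(u)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: unexpected status %s", u, resp.Status)
	}
	f, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	agent := "http://127.0.0.1:8500" // assumed agent HTTP address
	// The timeout must exceed the CPU sampling window below.
	client := &http.Client{Timeout: 2 * time.Minute}
	outputs := map[string]string{"heap": "heap.prof", "profile": "cpu.prof"}
	for kind, out := range outputs {
		u := profileURL(agent, kind, 30) // 30s CPU sample
		if err := fetchProfile(client, u, out); err != nil {
			fmt.Fprintln(os.Stderr, "fetch failed:", err)
			continue
		}
		fmt.Println("wrote", out)
	}
}
```

The resulting files can be inspected with `go tool pprof heap.prof` or `go tool pprof cpu.prof` on a development machine.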

AnilChoudhury-Eaton commented 2 years ago

Thank you. We will look into these suggestions and update you in a few weeks.

AnilChoudhury-Eaton commented 2 years ago

Hi @dnephin, I have tried the options for reducing the Consul binary size, but it does not go below 69MB. I am using Consul 1.6.0; is the "Shrink the binary #9074" patch still applicable?

Our total device firmware image is 120MB, and Consul alone is 69MB, so it is very large.

dnephin commented 2 years ago

Let's keep this issue open to track this specific use case.

How did you get to the 69MB? Was that by using -ldflags="-s -w"?

The changes I made can be seen here: https://github.com/hashicorp/consul/compare/dnephin/deprecate-go-discover, but they were never ready to be merged. It was just a quick experiment. Also I think the code has changed sufficiently that there will be too many conflicts.

I think we'll need to start over, but the previous change gives us some idea of what to look at. Basically what I did was add a build tag tiny that would remove the go-discover dependency, and the bindata file used for the UI.

AnilChoudhury-Eaton commented 2 years ago

Hi Daniel, thank you for your input. I am a novice in the world of Go and Consul. After stripping debug information in a similar manner with -ldflags="-s -w", the size drops to 69MB.

We want to get it below 5MB, as we are targeting firmware images under 50MB. So if possible, please look for ways to limit its features at compile time for a single-device deployment (an edge node, not a cluster).

We are also seeing the Consul data directory grow very quickly. We are running Consul in this mode: /usr/bin/consul agent -server -ui -node=server-1 -bootstrap-expect=1 -config-dir /etc/consul/config/ -data-dir /etc/consul/data/ -client 0.0.0.0 -bind 0.0.0.0 &

We found that in 6-7 hours the size of '/etc/consul/data/' increased from 9,672 bytes to 29,931 bytes. We are running it on a single edge node only. Should we remove the following flags: '-server' and '-bootstrap-expect=1' to avoid making it a raft leader and avoid running it as a server and instead run it in agent mode?

dnephin commented 2 years ago

Should we remove the following flags: '-server' and '-bootstrap-expect=1' to avoid making it a raft leader and avoid running it as a server and instead run it in agent mode?

Yes, I think you want it to be a client agent, not a server.

AnilChoudhury-Eaton commented 2 years ago

I tried that but observed error messages from Consul. I will try again. We want to run it as an edge agent where different microservices can push and read their configuration data. It can also act as a service watcher: the standard Registry & Configuration role in EdgeX.

Thank you.

max19931 commented 2 years ago

Why not switch over to Serf and run it instead of Consul?

Consul uses Serf as a building block.

It would massively reduce the amount of resources required.