Customize envoy CLI arguments --envoy-flag

jakubdyszkiewicz commented 3 years ago

Summary

Right now there is no way to customize Envoy CLI arguments. We can introduce --envoy-flag real_envoy_flag=value to kuma-dp, which will then be used when running Envoy.

bhiravabhatla commented 3 years ago

@jakubdyszkiewicz I would like to take this up. I am new to Kuma, so it will take some time to go through the code base. I hope that's OK.

jakubdyszkiewicz commented 3 years ago

Awesome, go ahead

bhiravabhatla commented 3 years ago

@jakubdyszkiewicz I have a question:

I see we already hard-code a few Envoy args when starting the Envoy proxy - https://github.com/kumahq/kuma/blob/97ebe14a3d795c53c9ab43214616c4921116bf57/app/kuma-dp/pkg/dataplane/envoy/envoy.go#L152-L165 Can we just append the args received in CLI flags to this slice?
If yes, how do we handle the scenario where someone passes an argument that is already part of the hard-coded []args slice? For example, configFile or LogLevel.
One option would be to skip those flags if passed, i.e. don't add them to the []args slice.
Another option would be to override the existing args.

Kindly share your thoughts

bartsmykla commented 3 years ago

Hmm, I think we have to decide on order of precedence. I would vote to treat --envoy* flags as more important, but maybe if there is a clash, we could log a warning? Wdyt @jakubdyszkiewicz

jakubdyszkiewicz commented 3 years ago

Definitely, the option that --envoy* overrides the one that we generate. I don't think the warning is needed.

bhiravabhatla commented 3 years ago

Cool. Then if someone tries to override the config file, we would need to check whether the config file exists before adding it to args, correct?

jakubdyszkiewicz commented 3 years ago

I'd implement it in the following way.

Construct map[string]string of default args
Add keys and values of the overrides (it will replace existing entries or create a new one if arg does not exist)
Build slice of args from the map
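
The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not the kuma-dp implementation: the function name buildEnvoyArgs, the sample path, and the choice to sort keys (Go map iteration order is not stable, so sorting keeps the resulting command line deterministic) are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// buildEnvoyArgs is a hypothetical helper. It copies the default args into a
// new map, lets overrides replace or add entries, and renders the result as
// a flat slice of "--flag", "value" pairs.
func buildEnvoyArgs(defaults, overrides map[string]string) []string {
	merged := make(map[string]string, len(defaults)+len(overrides))
	for name, value := range defaults {
		merged[name] = value
	}
	// Overrides win: they replace existing entries or create new ones.
	for name, value := range overrides {
		merged[name] = value
	}

	// Sort the flag names so the output does not depend on map iteration order.
	names := make([]string, 0, len(merged))
	for name := range merged {
		names = append(names, name)
	}
	sort.Strings(names)

	args := make([]string, 0, 2*len(names))
	for _, name := range names {
		args = append(args, name)
		// Flags without a value (boolean switches) are emitted on their own.
		if value := merged[name]; value != "" {
			args = append(args, value)
		}
	}
	return args
}

func main() {
	defaults := map[string]string{
		"--config-path": "/tmp/bootstrap.yaml", // illustrative path
		"--log-level":   "info",
	}
	overrides := map[string]string{
		"--log-level":   "debug", // replaces the default
		"--concurrency": "2",     // new entry
	}
	fmt.Println(buildEnvoyArgs(defaults, overrides))
	// prints: [--concurrency 2 --config-path /tmp/bootstrap.yaml --log-level debug]
}
```

With a map there is no need for a separate "is this flag already set" check: the override simply lands on the same key.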

subnetmarco commented 3 years ago

@bhiravabhatla any updates on this issue?

subnetmarco commented 3 years ago

Actually, I wonder about the following. Since we need a system that works in both Universal and Kubernetes, wouldn't we want kuma-cp to store the CLI settings for the data plane proxies and then pass those settings prior to invoking envoy? That way, it would work consistently both when kuma-dp is automatically injected and when we manually start it in Universal.

I am suggesting adding the following:

apiVersion: kuma.io/v1alpha1
kind: DataplaneStartupFlags
mesh: default
metadata:
  name: custom-cli-conf-1
spec:
  selectors:
    - match:
        kuma.io/service: '*'
  conf:
    envoy:
      concurrency: XYZ
      ..
    # OPA agent could be another option, etc..

Since kuma-dp connects to kuma-cp first and foremost to retrieve the bootstrap configuration, even prior to initializing envoy itself, couldn't kuma-dp fetch the list of CLI arguments that we need to inject dynamically in the envoy execution?

This has the added benefit that it could potentially configure the CLI arguments of any other process we may decide to start alongside envoy.

Note: Of course it's called DataplaneStartupFlags because these arguments will only be available at startup time, and any change would require a manual restart of kuma-dp.

jpeach commented 3 years ago

IIUC some flags are set by the mesh, so need to think carefully about which we expose. I like the idea of the DataplaneStartupFlags resource since we can expose only the flags that aren't controlled by the mesh.

jpeach commented 3 years ago

Self assigning since I plan to do some research on this.

Untagging "good first issue" until we rescope.

jpeach commented 3 years ago

Flag	Purpose	Could be Settable by Operators
--enable-core-dump	Enable core dumps	Y
--socket-mode	Hot restart socket file permission	N (Kuma doesn't support hot restarts)
--socket-path	Hot restart socket path	N (Kuma doesn't support hot restarts)
--disable-extensions	Comma-separated list of extensions to disable	Y (good for testing)
--cpuset-threads	Get the default # of worker threads from cpuset size	Y
--enable-mutex-tracing	Enable mutex contention tracing functionality	Y (only useful for Envoy dev)
--disable-hot-restart	Disable hot restart functionality	N (Kuma doesn't support hot restarts)
--mode	One of 'serve' (default; validate configs and then serve traffic normally) or 'validate' (validate configs and exit).	N
--parent-shutdown-time-s	Hot restart parent shutdown time in seconds	N (Kuma doesn't support hot restarts)
--drain-strategy	Hot restart drain sequence behaviour, one of 'gradual' (default) or 'immediate'.	Y (Kuma should manage draining)
--drain-time-s	Hot restart and LDS removal drain time in seconds	Y (Kuma should manage draining)
--file-flush-interval-msec	Interval for log flushing in msec	Y (But Kuma owns logging config)
--service-zone	Zone name	N (Managed by Kuma)
--service-node	Node name	N (Managed by Kuma)
--service-cluster	Cluster name	N (Managed by Kuma)
--hot-restart-version	hot restart compatibility version	N (Kuma doesn't support hot restart)
--restart-epoch	hot restart epoch #	N (Kuma doesn't support hot restart)
--log-path	Path to logfile	N
--enable-fine-grain-logging	Enable file-level log control (Fancy Logger) or not	Y
--log-format-escaped	Escape c-style escape sequences in the application logs	Y (Kuma should own this)
--log-format	Log message format in spdlog syntax	Y (Kuma should own this)
--component-log-level	Comma separated list of component log levels	Y (Kuma should own this)
--log-level	Log levels: [trace][debug][info][warning	Y (Kuma should own this)
--local-address-ip-version	The local IP address version (v4 or v6).	N
--admin-address-path	Admin address path	N
--ignore-unknown-dynamic-fields	ignore unknown fields in dynamic configuration	N (Kuma manages config)
--reject-unknown-dynamic-fields	reject unknown fields in dynamic configuration	N (Kuma manages config)
--allow-unknown-static-fields	allow unknown fields in static configuration	N (Kuma manages config)
--allow-unknown-fields	allow unknown fields in static configuration (DEPRECATED)	N (Kuma manages config)
--bootstrap-version	API version to parse the bootstrap config as.	N (Kuma manages bootstrapping)
--config-yaml	Inline YAML configuration, merges with the contents of --config-path	Y (but it's a footgun)
--config-path	Path to configuration file	N (Kuma manages config)
--concurrency	# of worker threads to run	Y (Kuma ought to set this correctly)
--base-id-path	path to which the base ID is written	N (Kuma doesn't support hot restart)
--use-dynamic-base-id	the server chooses a base ID dynamically	N (Kuma doesn't support hot restart)
--base-id	base ID so that multiple envoys can run on the same host if needed	N (Kuma doesn't support hot restart)
--version	Displays version information and exits.	N
--help	Displays usage information and exits.	N
--ignore_rest	Ignores the rest of the labeled arguments following this flag.	N (only useful for custom envoy builds?)

So looking at the table above

Concurrency should clearly be settable. Kuma ought to be able to automatically make a good choice, especially for sidecar injection, but operators need to be able to set this in Universal and override it. Needs to accept additional values "default" and "auto" for forward compatibility.
Setting drain policy flags seems very reasonable, though I think that Kuma should be managing traffic draining automatically. That would be a larger task that needs some design, so these flags might help operators in the interim, provided they don't prevent us from making improvements in future.
Various logging flags can usefully be set, but Kuma owns logging in general, so I'd like to double check this against other logging support.
The debugging and developer flags are unlikely to be useful in general, but are also harmless to set (as far as Kuma management goes).
Maybe we should unconditionally be setting --cpuset-threads?

In addition to the flags resource, I'm considering whether it makes sense for kuma-dp to have dedicated flags for --concurrency and --log-level. Those are very common and especially likely to be customized per-instance or per-pod.
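
To make the "default" and "auto" idea concrete, here is a minimal sketch of how a dedicated kuma-dp concurrency value might be parsed. Everything here is hypothetical: the type and function names are made up for illustration, and the thread does not define what "default" and "auto" would actually do, so the sketch only records which keyword was given.

```go
package main

import (
	"fmt"
	"strconv"
)

// concurrencySetting is a hypothetical parsed value: either a keyword
// ("default" or "auto") or an explicit worker count.
type concurrencySetting struct {
	keyword string // "default", "auto", or "" when workers is set
	workers int
}

// parseConcurrency accepts the two reserved keywords or a positive integer.
func parseConcurrency(s string) (concurrencySetting, error) {
	switch s {
	case "default", "auto":
		return concurrencySetting{keyword: s}, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 1 {
		return concurrencySetting{}, fmt.Errorf(
			"invalid concurrency %q: want \"default\", \"auto\", or a positive integer", s)
	}
	return concurrencySetting{workers: n}, nil
}

func main() {
	for _, in := range []string{"auto", "4", "0", "many"} {
		setting, err := parseConcurrency(in)
		fmt.Println(in, setting, err)
	}
}
```

Reserving the keywords up front means a later release could give them real behaviour without changing what existing command lines are allowed to pass.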

For flags that should not be settable

Lots of flags related to hot restarting should never be settable unless Kuma supports hot restarts.
--config-yaml is a bit of a maybe. It could be useful but is also a footgun.
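
One way to enforce the split between settable and non-settable flags is a simple deny list checked before any overrides are applied. The sketch below is an assumption about how that could look, not existing Kuma code: managedFlags holds only a handful of the flags marked "N" in the table above, and validateOverrides is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// managedFlags is an illustrative, incomplete subset of the flags the table
// marks as not settable by operators.
var managedFlags = map[string]bool{
	"--config-path":     true,
	"--service-cluster": true,
	"--service-node":    true,
	"--service-zone":    true,
	"--restart-epoch":   true,
	"--base-id":         true,
	"--mode":            true,
}

// validateOverrides returns an error naming every override that targets a
// flag owned by Kuma, or nil when all overrides are allowed.
func validateOverrides(overrides map[string]string) error {
	var rejected []string
	for name := range overrides {
		if managedFlags[name] {
			rejected = append(rejected, name)
		}
	}
	if len(rejected) == 0 {
		return nil
	}
	sort.Strings(rejected) // stable error message regardless of map order
	return fmt.Errorf("flags managed by Kuma cannot be overridden: %s",
		strings.Join(rejected, ", "))
}

func main() {
	fmt.Println(validateOverrides(map[string]string{"--concurrency": "2"}))
	fmt.Println(validateOverrides(map[string]string{
		"--config-path":  "/tmp/other.yaml", // illustrative path
		"--service-node": "x",
	}))
}
```

An allow list of the "Y" flags would be the stricter alternative, since newly added Envoy flags would then be rejected until someone reviews them.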

jpeach commented 3 years ago

Bringing back some out of band discussion

Kuma owns and depends on some of these flags, so allowing them to be set arbitrarily is risky for users
The number of flags that users can reasonably want to set is quite low, and maybe not enough to justify the work involved in creating a new Kuma policy
Drain strategies are "gradual" and "immediate". For mesh sidecars, "immediate" seems like a better overall default (we don't have millions of connections to drain).
Kuma should probably set concurrency automatically, or allow it to be directly configured. In the former case, Kuma already sets the sidecar resources in the Kubernetes runtime, so it makes sense to make the concurrency match them. In the universal runtime, a way for users to directly set the concurrency seems very useful. Kubernetes users could also propagate the concurrency from a pod label.
Kuma should probably do connection draining automatically, but this would need some scoping and discussion to understand what is needed
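
As a rough illustration of setting concurrency automatically from the sidecar resources, the sketch below rounds a CPU limit up to a whole number of worker threads. The rounding rule, the fallback of one worker when no limit is set, and the function name workersFromCPULimit are all assumptions made for this example; the thread does not specify a formula.

```go
package main

import "fmt"

// workersFromCPULimit is a hypothetical helper that maps a CPU limit in
// millicores (1000 = one full CPU) to an Envoy worker thread count. It rounds
// up and never returns less than one worker.
func workersFromCPULimit(milliCPU int64) int {
	if milliCPU <= 0 {
		// No limit known: fall back to a single worker (an arbitrary choice
		// for this sketch).
		return 1
	}
	return int((milliCPU + 999) / 1000)
}

func main() {
	for _, limit := range []int64{0, 500, 1000, 1500, 4000} {
		fmt.Printf("%dm -> %d worker(s)\n", limit, workersFromCPULimit(limit))
	}
}
```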

subnetmarco commented 3 years ago

This could also be properties of the existing Mesh resource, which would then apply to every envoy belonging to a specific mesh.

github-actions[bot] commented 2 years ago

This issue was inactive for 30 days it will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant please comment on it promptly or attend the next triage meeting.

jakubdyszkiewicz commented 2 years ago

Triage: we should handle concurrency, draining and logging if we don't already. This requires a design for how to implement it.

github-actions[bot] commented 1 year ago