networkservicemesh / deployments-k8s

Change log level on the fly without restart #12296

Closed szvincze closed 2 hours ago

szvincze commented 1 month ago

Overview

Currently the log level is configured via the environment variable NSM_LOG_LEVEL, which means the process uses the configured value for its whole lifetime. That is fine in most cases, but when closer monitoring is needed, it should be possible to change the log level without restarting the monitored process. In the vast majority of cases a restart makes immediate monitoring impossible: after the configuration change we have to initiate a restart and then just wait for the issue to happen again.

For a temporary setting, it would be good if the log level could be changed by sending signals to the process. For example, to monitor forwarder-vpp we would send SIGUSR1 to the process, which changes its log level to TRACE. Once the needed logs are captured, the original log level can be restored by sending SIGUSR2.
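
A minimal sketch of the signalling idea in Go, using only the standard library. The names `Level`, `levelSwitch`, `raiseSignal` and `restoreSignal` are hypothetical and exist only for illustration; this is not the NSM sdk implementation, and it assumes a Unix-like platform where SIGUSR1 and SIGUSR2 are available.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// Level is a hypothetical log level type used only in this sketch.
type Level int32

const (
	Info Level = iota
	Trace
)

func (l Level) String() string {
	if l == Trace {
		return "TRACE"
	}
	return "INFO"
}

// Signals assumed by the proposal: one raises the level, one restores it.
const (
	raiseSignal   = syscall.SIGUSR1
	restoreSignal = syscall.SIGUSR2
)

// levelSwitch holds the active level and the level configured at startup.
type levelSwitch struct {
	current  atomic.Int32
	original Level
}

func newLevelSwitch(initial Level) *levelSwitch {
	s := &levelSwitch{original: initial}
	s.current.Store(int32(initial))
	return s
}

// Level returns the level that log calls should currently honor.
func (s *levelSwitch) Level() Level { return Level(s.current.Load()) }

// handle applies a single signal to the switch; other signals are ignored.
func (s *levelSwitch) handle(sig os.Signal) {
	switch sig {
	case raiseSignal:
		s.current.Store(int32(Trace))
	case restoreSignal:
		s.current.Store(int32(s.original))
	}
}

// watch subscribes to both signals and applies them until ctx is cancelled.
func (s *levelSwitch) watch(ctx context.Context) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, raiseSignal, restoreSignal)
	go func() {
		defer signal.Stop(ch)
		for {
			select {
			case sig := <-ch:
				s.handle(sig)
			case <-ctx.Done():
				return
			}
		}
	}()
}

func main() {
	s := newLevelSwitch(Info)
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	s.watch(ctx)

	// Apply the signals directly so the demo does not depend on delivery timing.
	s.handle(raiseSignal)
	fmt.Println("after SIGUSR1:", s.Level())
	s.handle(restoreSignal)
	fmt.Println("after SIGUSR2:", s.Level())
}
```

In a real deployment the signals would come from outside the process, for example `kill -USR1 <pid>` inside the container. The level is kept in an atomic value so that log calls on other goroutines can read it without locking.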

arp-est commented 1 month ago

Hi, we also have a reference implementation, if this signalling approach is okay with you guys:

https://github.com/networkservicemesh/cmd-forwarder-vpp/pull/1171

Ex4amp1e commented 1 month ago

Possible solution:

  1. Introduce the term "NSM config"
  2. Move LOG_LEVEL configuration from the ENV config to the NSM config
  3. [Optional] In the future, move the other ENVs

Algorithm for LOG_LEVEL:

  1. Every cmd repo will contain a structure that is populated with data read from a file
  2. On every request, check whether the file has been modified; if it has, read the new value and update LOG_LEVEL in the structure
  3. For k8s deployments, store the data in a ConfigMap and mount it into each pod as a file
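
The algorithm above could be sketched roughly as follows in Go. The type `fileLevel`, the file name `log_level` and the helper `demo` are hypothetical and only illustrate the "check modification time, re-read on change" step; how the file is mounted from a ConfigMap is outside this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"
)

// fileLevel caches a log level read from a file (hypothetical type for this sketch).
type fileLevel struct {
	mu      sync.Mutex
	path    string
	modTime time.Time
	level   string
}

func newFileLevel(path, fallback string) *fileLevel {
	return &fileLevel{path: path, level: fallback}
}

// Get returns the current level, re-reading the file only when its
// modification time differs from the one seen on the previous read.
func (f *fileLevel) Get() string {
	f.mu.Lock()
	defer f.mu.Unlock()

	info, err := os.Stat(f.path)
	if err != nil || info.ModTime().Equal(f.modTime) {
		return f.level // file missing or unchanged: keep the cached value
	}
	data, err := os.ReadFile(f.path)
	if err != nil {
		return f.level
	}
	f.modTime = info.ModTime()
	if v := strings.ToUpper(strings.TrimSpace(string(data))); v != "" {
		f.level = v
	}
	return f.level
}

// demo writes a level file, reads it, changes it, and reads it again.
func demo() (before, after string, err error) {
	dir, err := os.MkdirTemp("", "nsm-config")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "log_level")
	if err = os.WriteFile(path, []byte("info\n"), 0o600); err != nil {
		return "", "", err
	}
	lvl := newFileLevel(path, "INFO")
	before = lvl.Get()

	if err = os.WriteFile(path, []byte("trace\n"), 0o600); err != nil {
		return "", "", err
	}
	// Bump the mtime explicitly so the change is visible even on
	// filesystems with coarse timestamp granularity.
	future := time.Now().Add(2 * time.Second)
	if err = os.Chtimes(path, future, future); err != nil {
		return "", "", err
	}
	after = lvl.Get()
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println(before, "->", after)
}
```

Calling `Get` on every request costs one `os.Stat` per call; the file itself is read only when the modification time changes.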

denis-tingaikin commented 1 month ago

cc @edwarnicke

arp-est commented 3 weeks ago

Hi, I did the PRs for the cmds, here's a list:

https://github.com/networkservicemesh/cmd-cluster-info-k8s/pull/157
https://github.com/networkservicemesh/cmd-forwarder-ovs/pull/420
https://github.com/networkservicemesh/cmd-forwarder-sriov/pull/778
https://github.com/networkservicemesh/cmd-ipam-vl3/pull/221
https://github.com/networkservicemesh/cmd-lb-vl3-vpp/pull/107
https://github.com/networkservicemesh/cmd-map-ip-k8s/pull/246
https://github.com/networkservicemesh/cmd-nsc-init/pull/785
https://github.com/networkservicemesh/cmd-nsc/pull/655
https://github.com/networkservicemesh/cmd-nsc-simple-docker/pull/256
https://github.com/networkservicemesh/cmd-nsc-vpp/pull/723
https://github.com/networkservicemesh/cmd-nse-firewall-vpp/pull/541
https://github.com/networkservicemesh/cmd-nse-icmp-responder/pull/621
https://github.com/networkservicemesh/cmd-nse-icmp-responder-vpp/pull/705
https://github.com/networkservicemesh/cmd-nse-l7-proxy/pull/226
https://github.com/networkservicemesh/cmd-nse-remote-vlan/pull/249
https://github.com/networkservicemesh/cmd-nse-simple-vl3-docker/pull/271
https://github.com/networkservicemesh/cmd-nse-vfio/pull/538
https://github.com/networkservicemesh/cmd-nse-vl3-vpp/pull/368
https://github.com/networkservicemesh/cmd-nsmgr-proxy/pull/523
https://github.com/networkservicemesh/cmd-nsmgr/pull/710
https://github.com/networkservicemesh/cmd-registry-memory/pull/689
https://github.com/networkservicemesh/cmd-registry-proxy-dns/pull/670

I have some problems with the PRs below: the reference to the sdk is old there. How do you guys usually update the versions?

https://github.com/networkservicemesh/cmd-exclude-prefixes-k8s/pull/319
https://github.com/networkservicemesh/cmd-nse-supplier-k8s/pull/355
https://github.com/networkservicemesh/cmd-registry-k8s/pull/481

I think these are all the cmds where the log level change would be applicable.

denis-tingaikin commented 3 weeks ago

Hello @arp-est ,

Many thanks for doing this.

I have some problems with the prs below, the reference to the sdk is old there, how do you guys usually update the versions?

Yeah, we faced a problem in sdk-k8s, so those components were not updated. Now that the problem is resolved, could you please rebase these PRs?

arp-est commented 3 weeks ago

Right away

arp-est commented 3 weeks ago

Hi, I rebased them, but in those PRs it's not sdk-k8s but the regular sdk that is missing the SetupLevelChangeOnSignal function. I see that only the version of sdk-k8s was updated in the go.mod file.

denis-tingaikin commented 2 hours ago

It seems like it's done.

Many thanks, @arp-est.

Created a ticket to add an integration test for this feature: https://github.com/networkservicemesh/deployments-k8s/issues/12522.

Closing this one.