vmware / kube-fluentd-operator

Auto-configuration of Fluentd daemon-set based on Kubernetes metadata

new pod logstream not getting created with mounted-file option #289

Open vkadi opened 2 years ago

vkadi commented 2 years ago

I am seeing an issue where CloudWatch (CW) logstreams are not created when using the "mounted-file" option to ingest logs from a file inside the container. I picked the sample config from the examples folder and was able to reproduce it; below is the demo config I used.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:

Sometimes the logstreams do appear after we restart fluentd, but even that is not consistent. Is anyone using the "mounted-file" option heavily and seeing this issue? Any guidance in this area would be a great help. Note: using the latest KFO version (1.16.1).

vkadi commented 2 years ago

@Cryptophobia Can you please help with this issue? It seems like the new container information is not updated in "fluent.conf" until fluentd is restarted manually.

Cryptophobia commented 2 years ago

@vkadi , can you share the configMap or CRD fluentdconfig that you are using for the above testing container?

Are you using type @mounted-file? Are you following the README here: https://github.com/vmware/kube-fluentd-operator#i-have-a-legacy-container-that-logs-to-varloghttpdaccesslog

Also, it is possible that when your configuration changes the fluentd container needs to be restarted, as gracefulReload may not be enough. Let me know if you can provide the details above and I can test with a similar configmap or CRD.
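For reference, the reload discussed throughout this thread is fluentd's own RPC endpoint (the POST to /api/config.gracefulReload that shows up in the reloader logs later in this issue). A minimal, illustrative Go sketch of such a call, with the RPC address passed in as an assumed parameter, might look like this:

```go
package reloader

import (
	"fmt"
	"net/http"
)

// triggerGracefulReload asks a running fluentd to re-read its configuration
// without a full restart, via fluentd's HTTP RPC endpoint. The rpcAddr value
// (host:port) is an assumption here; KFO wires this address up internally.
func triggerGracefulReload(rpcAddr string) error {
	url := fmt.Sprintf("http://%s/api/config.gracefulReload", rpcAddr)
	resp, err := http.Post(url, "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("gracefulReload returned status %d", resp.StatusCode)
	}
	return nil
}
```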

vkadi commented 2 years ago

@Cryptophobia Yes, I am using "mounted-file", and this is how my configmap looks for the test pod:

<source>
   @type mounted-file
   path /var/log/welcome.log
   labels msg=nginx, _container=test-container
</source>
<match **>
   @type cloudwatch_logs
   auto_create_stream true
   log_stream_name_key stream_name
   remove_log_stream_name_key true
   retention_in_days 90
   region "#{ENV['AWS_REGION']}"
   <buffer>
     @type file
     path /var/log/fluentd
     flush_interval 5s
     flush_thread_count 8
     chunk_limit_size 2m
     queued_chunks_limit_size 32
     retry_forever true
   </buffer>
   log_group_name /aws/eks/<cluster_name>/cluster/namespace/default
</match>
sabdalla80 commented 2 years ago

@Cryptophobia I am having a similar problem with mounted-file as @vkadi. I believe the problem isn't in configMap changes; that part works as expected: the reloader kicks in, invokes the controller::RunOnce() loop, and re-generates the proper ConfigMaps as well as fluent.conf. The problem seems to be that when a pod in a deployment is replaced/restarted, the reloader is unaware of it, so it does not re-generate or update the fluent.conf and configmap files. As a result it never executes mountedfile::Process() and never creates the CloudWatch logstream with the new pod name.

Note that the fluentd logs do show the new pod, for example:

2021-11-17 16:51:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/nginx-deployment-f697bcfb-8b2jm_default_test-container-a13490af26e0e63a6eeafa884c66b6e7891835fc204fa2c259f5e3105a946496.log
2021-11-17 16:51:40 +0000 [info]: #0 detected rotation of /var/lib/kubelet/pods/601ac67a-0639-4f28-979b-cf21151257bd/volumes/kubernetes.io~empty-dir/logs/welcome.log; waiting 5 seconds
2021-11-17 16:51:41 +0000 [info]: #0 [in_tail_container_logs] detected rotation of /var/log/containers/nginx-deployment-f697bcfb-clqgw_default_test-container-664c91264cbc2b835b9a2ab1260b9c1c6d398f1ff419907cc5b243273b5efa97.log; waiting 5 seconds

Cryptophobia commented 2 years ago

Thank you @sabdalla80, that description helps a lot! I think in this case we may not be handling it correctly. Since the config file has not changed, we only force a fluentd reload when the config hashes change. We need to force gracefulReload on fluentd even when the configmap does not change but the mountedFile pod is restarted/replaced.

This sounds like the issue here. It also explains why KFO sometimes does not work with job-type pods.

Cryptophobia commented 2 years ago

I think this is related to the issues in #253

sabdalla80 commented 2 years ago

> Thank you @sabdalla80, that description helps a lot! I think in this case we may not be handling it correctly. Since the config file has not changed, we only force a fluentd reload when the config hashes change. We need to force gracefulReload on fluentd even when the configmap does not change but the mountedFile pod is restarted/replaced.
>
> This sounds like the issue here. It also explains why KFO sometimes does not work with job-type pods.

@Cryptophobia Thank you for the response and feedback. It sounds like others are having the same issue; it is currently impacting me and I could use a quick fix. Do you have an ETA for the fix? I would be glad to assist, as I already went down the path of learning the insides of the code and got hung up on how to detect pod changes (I am not 100% sure, but it sounds like it's done in KubeInformer). I can also help with testing the fix, since I already have an environment with this issue ready for testing.

Cryptophobia commented 2 years ago

@sabdalla80, yes, I think we need to find a place in kubeInformer.go (config-reloader/datasource/kube_informer.go) where we can do this check via the sharedInformers for the pods. If pods are getting restarted in a namespace, config-reloader needs to be smart enough to issue a new reload to fluentd. This is where we can get a list of pods from the namespace config struct that we build.

@sabdalla80 Before we start looking at a code fix, can you verify for me that this is not a regression? A couple of lines of code were deleted because they were thought to be unnecessary. Can you run the v1.15.0 docker image and verify that you see the same problems?

The deletion in question is in this commit: https://github.com/vmware/kube-fluentd-operator/pull/227/files

sabdalla80 commented 2 years ago

> @sabdalla80, yes, I think we need to find a place in kubeInformer.go (config-reloader/datasource/kube_informer.go) where we can do this check via the sharedInformers for the pods. If pods are getting restarted in a namespace, config-reloader needs to be smart enough to issue a new reload to fluentd. This is where we can get a list of pods from the namespace config struct that we build.
>
> @sabdalla80 Before we start looking at a code fix, can you verify for me that this is not a regression? A couple of lines of code were deleted because they were thought to be unnecessary. Can you run the v1.15.0 docker image and verify that you see the same problems?
>
> The deletion in question is in this commit: https://github.com/vmware/kube-fluentd-operator/pull/227/files

@Cryptophobia I tried it against 1.15.0 and 1.13.0; both have the same issue.

Cryptophobia commented 2 years ago

I just looked at the code and I have an idea. We can use a hasher to get a unique hash for the nsConfig struct each time the RunOnce() loop runs. This avoids having to reload every time pods restart, or using kubeInformer at all; we just build the namespace representation the way we currently do.

If the podlist slice representation in nsConfig.MiniContainers changes, then the hash value of the whole nsConfig struct will change and we can issue a reload.

We can use this library for the hasher: https://github.com/mitchellh/hashstructure

EDIT: The true problem stems from the fact that mounted-file names change based on the container names, but forcing more reloads will make KFO more resilient, and I think this is the best way to solve the problem.
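A minimal sketch of that idea, using the hashstructure library linked above and hypothetical names (previousHashes, containersChanged), might look like this:

```go
package datasource

import "github.com/mitchellh/hashstructure"

// previousHashes records the last seen hash of each namespace's container list.
var previousHashes = map[string]uint64{}

// containersChanged reports whether the hash of the container slice differs
// from the value recorded on the previous RunOnce pass. It is only an
// illustration of the approach described above, not the merged code.
func containersChanged(namespace string, containers interface{}) (bool, error) {
	newHash, err := hashstructure.Hash(containers, nil)
	if err != nil {
		return false, err
	}
	changed := previousHashes[namespace] != newHash
	previousHashes[namespace] = newHash
	return changed, nil
}
```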

Cryptophobia commented 2 years ago

@sabdalla80, thank you for testing. After I do some local testing, I will build a new test image today.

sabdalla80 commented 2 years ago

> @sabdalla80, thank you for testing. After I do some local testing, I will build a new test image today.

This would be spectacular. @Cryptophobia If you can share any code/branch you have started, I can contribute to it and help bring it to completion.

Cryptophobia commented 2 years ago

@sabdalla80, when I do a simple test with 3 pods in a deployment, I cannot reproduce this. You are saying mountedfile::Process() is not being called. Can you be more specific about which part of the code?

sabdalla80 commented 2 years ago

@Cryptophobia Are you able to see the main control loop re-run when you restart/replace a pod in the deployment? Does it re-generate the fluentd.conf file?

Cryptophobia commented 2 years ago

No, the main control loop does not re-run. But I see fluentd pick up the new pods and new container names when containers restart or scale up. This seems like a race condition with the CW logstream?

Cryptophobia commented 2 years ago

@sabdalla80, do you think this will be fixed if we force fluentd to reload? It seems like the CW logstream expects to find the new pod names, and the reload is not being forced because of a race condition?

sabdalla80 commented 2 years ago

@Cryptophobia In the initial deployment things work fine: the control loop generates fluentd.conf and the proper CM files for a service. On pod restart, even though fluentd indeed sees the new pod, the fluentd.conf and CM files are not re-generated; they become stale and no longer point to the new pods. To answer your question, based on the behavior I have seen so far, I do believe forcing a reload will fix the problem. In fact, I tested this theory by restarting the fluentd pod manually, and that updates things correctly. Is there a way we can code this and test it? I would have tried it myself if I had a good understanding of the code that initiates the detection of pod changes.

For example, this is what's generated in the fluent.conf file for the mounted-file pod's deployment. When the mounted-file pod restarts, this block isn't changed.

#################
# Namespace pre-processing
#################
<source>
  @type tail
  path /var/lib/kubelet/pods/05fa8d22-a910-4edc-8004-6a1ca18a252e/volumes/kubernetes.io~empty-dir/logs/welcome.log
  pos_file /var/log/kfotail-23b81aaedcbe0520c93c4396c956c89b8136408d.pos
  read_from_head true
  tag kube.default.nginx-deployment-f697bcfb-ct5k5.test-container-23b81aaedcbe0520c93c4396c956c89b8136408d

  <parse>
    @type none
  </parse>
</source>

<filter kube.default.nginx-deployment-f697bcfb-ct5k5.test-container-23b81aaedcbe0520c93c4396c956c89b8136408d>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/welcome.log'; record['kubernetes']={'container_image'=>'ubuntu','container_name'=>'test-container','host'=>'ip-10-0-29-130.ec2.internal','namespace_name'=>'default','pod_id'=>'05fa8d22-a910-4edc-8004-6a1ca18a252e','pod_name'=>'nginx-deployment-f697bcfb-ct5k5'}; record['docker']={'container_id'=>'docker://fd034ffb705c6be7a6d726139622955e4d999b2d532a3497d7c97233c62ccbea'}; record['container_info']='3039a456dcf33176ead2f0e1223137d0ac6a2e3e'; record['kubernetes']['labels']={'app'=>'nginx','pod-template-hash'=>'f697bcfb'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>
Cryptophobia commented 2 years ago

@sabdalla80, thank you for this. I think I understand the problem. The CW plugin requires that the container name be known ahead of time, so even though the reloader generates correct fluentd configs, if fluentd is not reloaded the CW plugin is still running with the old configuration. I am building a container for you to test this theory. The fix is basically to reload any time we have new pods. I think this will make KFO more robust anyway; we will see in practice whether this is desired.

sabdalla80 commented 2 years ago

> @sabdalla80, thank you for this. I think I understand the problem. The CW plugin requires that the container name be known ahead of time, so even though the reloader generates correct fluentd configs, if fluentd is not reloaded the CW plugin is still running with the old configuration. I am building a container for you to test this theory. The fix is basically to reload any time we have new pods. I think this will make KFO more robust anyway; we will see in practice whether this is desired.

It would be awesome. You are spot on in your description.

Cryptophobia commented 2 years ago

@sabdalla80, I still can't produce a valid test with a preprocessing block similar to your config. Can you post the fluentd config you are using for the namespace, and the deployment yaml that you are deploying for this mounted-file test?

Cryptophobia commented 2 years ago

Can you try testing with this image when you have some time?

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-1

EDIT: Image is still being pushed up, give it 2 mins.

Cryptophobia commented 2 years ago

@sabdalla80, I am able to reproduce the preprocessing block now. When I scale up from 3 to 6 pods, the preprocessing still stays at 3 blocks...

sabdalla80 commented 2 years ago

@Cryptophobia

with vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-1

I did see the reloader log this event this time on mounted-file pod restart, but I did not see the fluentd.conf file update. Are you not running the whole chain of events, such as generator::renderToDisk, in your new changes?

time="2021-11-19T22:49:05Z" level=debug msg="Config hashes returned in RunOnce loop: map[abdas1-pega-58:8c3d1db1c72be76ae2299c726880b349ee7ee82d calico-system:7b0343ff7ee57ab27aef853cbfab80c49bcc9643 default:468b29a540388037bedc0ba2c9324d8e0240695b ext-secrets:8fc4450b5f01d9eb4a007f82aa1013df9a56aa1a gloo-system:155cb45bff838838d4514b6680b27a1aae5bec32 istio-system:d345dfc9f351f907c562df9fd0a537d27605b57b k8s-infra-components:6b89ba488815961fc3446e61848d09d5dc5b9f1c k8s-infra-operator:95862c8f02da78a629e30e44a1e8f4093b00a6bd kube-public:c8e51186853ef96e15db4c46ebbe0b6b5e0c4a3f kube-system:8ecffb9f57c8a0c48eeadbc36c9f247d669fc5d2 logging:71cf57b46a837f2b5f0bd616cdf12b2f245d2f0a metacontroller:0d91bcef8e1189aa3d10d0656d471268c039ec03 srs:1bedbb48a6f3eddab391c51c906e6355b4307656 tigera-operator:30b3bcb04653fc02d66c5579a320448010c942c2]"

Cryptophobia commented 2 years ago

I think I solved it @sabdalla80 !

I made a function with a channel in the Run() loop (moving the check out of RunOnce()) so that it does the check there and does not rely on the KubeInformer update channels; the check happens outside of the informers. This was to avoid replicating the code across all the different types of informers. It's a hacky solution, but if it works, it will also solve any preprocessing race-condition problems we may have in the future. 😄

Please test when you can:

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2

Here was the solution:

    // checkIfNeedsReload will verify if new containers need to preprocess
    checkIfNeedsRunOnce := func() (needsRun <-chan bool) {
        needsForceRunOnce := make(chan bool, 1)
        allConfigNamespaces, _ := c.Datasource.GetNamespaces(ctx)
        for _, nsConfig := range allConfigNamespaces {
            // if there are new/restarted pods in the configured namespace, we should reload just in case:
            // mounted-file fix, mounted file path changes based on container names:
            // this may fix race condition in logstream plugins that depend on container names?
            if !util.AreStructureHashEqual(nsConfig.MiniContainers, nsConfig.MiniContainers) {
                needsForceRunOnce <- true
            } else {
                needsForceRunOnce <- false
            }
        }
        return needsForceRunOnce
    }
Cryptophobia commented 2 years ago

@sabdalla80 I will wait for your test with the container before I make a PR:

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2
sabdalla80 commented 2 years ago

> @sabdalla80 I will wait for your test with the container before I make a PR:
>
> vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2

@Cryptophobia I am still seeing the same behavior. It seems as if /fluentd/etc/fluent.conf and the .conf files for the other namespaces remain stale when mounted-file pods are replaced. Is it possible kubeInformer isn't detecting the pod replacement to begin with? The reason I say this is that I would have expected controller and generator events to show up in the reloader logs when I bounce a pod. Furthermore, I tried adding log messages in kubeInformer functions such as discoverNamespaces() and GetNamespaces(), and I did not see them run when the mounted-file pod was replaced.

In contrast to configMap::detectConfigMapName(), which seems to be listening all the time, kubeInformer::GetNamespaces() does not seem to be listening. It appears as if KubeInformer needs listening logic similar to Configmap.go; the latter has updateCh logic and the former doesn't. Could that be the issue?

Cryptophobia commented 2 years ago

@sabdalla80, have you tried running KFO in --datasource=crd mode? It seems to work for me in crd mode.

sabdalla80 commented 2 years ago

> @sabdalla80, have you tried running KFO in --datasource=crd mode? It seems to work for me in crd mode.

@Cryptophobia No, I have not tried it. I suppose I can give it a try; what changes do I need to make?

In your sample fix above, where you compare the hash based on containers, I believe that may work if you cause the controller to run on pod/container restarts. The problem currently (from what I have seen) is that whatever event should trigger the controller to run is not happening. It seems as if the code changes you made need to happen in kube_informer and not in the controller?

vkadi commented 2 years ago

> @sabdalla80, have you tried running KFO in --datasource=crd mode? It seems to work for me in crd mode.

@Cryptophobia I got a chance to test vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2 using CRD, and I am still seeing the same issue where the namespace pre-processing block in fluent.conf does not get updated for the new containers in the deployment until I bounce the fluentd pod.

vkadi commented 2 years ago

> I think I solved it @sabdalla80 !
>
> I made a function with a channel in the Run() loop (moving the check out of RunOnce()) so that it does the check there and does not rely on the KubeInformer update channels; the check happens outside of the informers. This was to avoid replicating the code across all the different types of informers. It's a hacky solution, but if it works, it will also solve any preprocessing race-condition problems we may have in the future. 😄
>
> Please test when you can:
>
> vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2
>
> Here was the solution:
>
>   // checkIfNeedsReload will verify if new containers need to preprocess
>   checkIfNeedsRunOnce := func() (needsRun <-chan bool) {
>       needsForceRunOnce := make(chan bool, 1)
>       allConfigNamespaces, _ := c.Datasource.GetNamespaces(ctx)
>       for _, nsConfig := range allConfigNamespaces {
>           // if there are new/restarted pods in the configured namespace, we should reload just in case:
>           // mounted-file fix, mounted file path changes based on container names:
>           // this may fix race condition in logstream plugins that depend on container names?
>           if !util.AreStructureHashEqual(nsConfig.MiniContainers, nsConfig.MiniContainers) {
>               needsForceRunOnce <- true
>           } else {
>               needsForceRunOnce <- false
>           }
>       }
>       return needsForceRunOnce
>   }

@Cryptophobia In this part:

```
if !util.AreStructureHashEqual(nsConfig.MiniContainers, nsConfig.MiniContainers) {
    needsForceRunOnce <- true
} else {
    needsForceRunOnce <- false
}
```

are you intentionally comparing the same object with itself? Seems like a typo...

Also, I tried scaling the pods up and I am seeing the same behaviour with that as well. Can you please describe the steps you are using to test this?

sabdalla80 commented 2 years ago

@Cryptophobia I believe I have a fix that seems to resolve the pod-restart issue. I will do some quick testing on my end first, then submit the changes in a branch. It would be awesome if you could spend a few minutes to review it and/or recommend a more elegant approach as you see fit. The gist of it is to create an informer for Pod resources, attach a handler to the informer, and update the channel. It looks something like this; obviously I will extract the handler to its own function and handle the Add and Delete functions in the same fashion...

 . . .
 // Insert change here at line #255:
 // begin sam change ....
 factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
     AddFunc: func(new interface{}) {
         logrus.Infof("pods AddFunc !!")
     },
     UpdateFunc: func(old, new interface{}) {
         logrus.Infof("pods UpdateFunc !!")
         select {
         case updateChan <- time.Now():
         default:
         }
     },
     DeleteFunc: func(new interface{}) {
         logrus.Infof("pods DeleteFunc")
     },
 })
 // end sam change ....

 factory.Start(nil)
 if !cache.WaitForCacheSync(nil,
     factory.Core().V1().Namespaces().Informer().HasSynced,
     factory.Core().V1().Pods().Informer().HasSynced,
     factory.Core().V1().ConfigMaps().Informer().HasSynced,
     kubeds.IsReady) {
     return nil, fmt.Errorf("Failed to sync local informer with upstream Kubernetes API")
 }
Cryptophobia commented 2 years ago

@sabdalla80, that may work, but I am trying to avoid this. The better way is to use the podlister and make a decision for each namespace individually. We don't want to track changes in all pods and namespaces; that would make KFO use even more memory. We only care whether pods changed in a namespace we track...
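For illustration, a per-namespace check built on the shared informer's pod lister could look roughly like the sketch below (the function name and the string slice it returns are assumptions; the point is that only namespaces KFO already tracks are queried, from the informer cache rather than the API server):

```go
package datasource

import (
	"k8s.io/apimachinery/pkg/labels"
	corelisters "k8s.io/client-go/listers/core/v1"
)

// podsForNamespace lists only the pods of a namespace we already track, using
// the shared informer's cache-backed lister (no extra API calls).
func podsForNamespace(podLister corelisters.PodLister, namespace string) ([]string, error) {
	pods, err := podLister.Pods(namespace).List(labels.Everything())
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(pods))
	for _, pod := range pods {
		names = append(names, pod.Name)
	}
	return names, nil
}
```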

sabdalla80 commented 2 years ago

> @sabdalla80, that may work, but I am trying to avoid this. The better way is to use the podlister and make a decision for each namespace individually. We don't want to track changes in all pods and namespaces; that would make KFO use even more memory. We only care whether pods changed in a namespace we track...

@Cryptophobia Can you elaborate? Is the memory concern with the Pods informer? If I understand this correctly, no matter where the comparison fix is placed, it needs to have updated information about Pods. In your previous fix you compared hashes in the controller's RunOnce function, but I believe you assumed it has an updated set of pods/containers; as I found out later, Run() never executes on pod changes. Can you point me to how we can ensure the PodLister gets updated? Would it work if we made PodLister a member of the ConfigMapDS struct? I apologize for my ignorance of Golang and of how some of these modules are connected together.

Cryptophobia commented 2 years ago

Okay @sabdalla80, thank you very much for the informer event code! Indeed, this was what was missing; I was trying to solve it without touching the informers, but that would not work. I have added your code to the kubeInformer, merged it with my code, and it works. However, I am not going to be satisfied until you test the new container.

    factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(new interface{}) {
            select {
            case updateChan <- time.Now():
            default:
            }
        },
        UpdateFunc: func(old, new interface{}) {
        },
        DeleteFunc: func(new interface{}) {
            select {
            case updateChan <- time.Now():
            default:
            }
        },
    })

I have modified your code so it only captures the AddFunc and DeleteFunc events rather than every single update to pods. I think this will work, since we only care about running the preProcessors when pod IDs change, and that only happens when pods are added or deleted. Once a pod event happens we verify the new config hashes, and then, if there are pod changes in namespaces we care about, we verify whether those pod-list structure hashes have changed... then we issue reloads, and the processors should have run earlier, when we rendered the new configs...

Can you give this container a test? vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-3

Cryptophobia commented 2 years ago

I have tested with the CRD type and it works. When pods scale up and down it verifies whether miniContainerHash has changed for the container structures; if it has changed from the previous value, it issues reload commands.

Here are some logs:

time="2021-11-24T03:51:54Z" level=info msg="Discovering only namespaces that have fluentdconfig crd defined."
time="2021-11-24T03:51:54Z" level=debug msg="Returned these namespaces for fluentdconfig crds: [default]"
time="2021-11-24T03:51:54Z" level=debug msg="loaded config data from fluentdconfig: default/fd-config"
time="2021-11-24T03:51:55Z" level=debug msg="Checked config for namespace default with fluentd and got: "
time="2021-11-24T03:51:55Z" level=debug msg="Saving status annotation to namespace default: <nil>"
time="2021-11-24T03:51:55Z" level=debug msg="Config hashes returned in RunOnce loop: map[default:653e508269f377a453cf793111433cd606696c7e]"
time="2021-11-24T03:51:55Z" level=debug msg="Previous MiniContainer hash for ns default is 1651905141176401475"
time="2021-11-24T03:51:55Z" level=debug msg="New MiniContainer hash for ns default is 9477722725180817073"
time="2021-11-24T03:51:55Z" level=debug msg="Reloading fluentd configuration gracefully via POST to /api/config.gracefulReload"
time="2021-11-24T04:06:21Z" level=info msg="Running main control loop"
time="2021-11-24T04:06:21Z" level=info msg="Discovering only namespaces that have fluentdconfig crd defined."
time="2021-11-24T04:06:21Z" level=debug msg="Returned these namespaces for fluentdconfig crds: [default]"
time="2021-11-24T04:06:21Z" level=debug msg="loaded config data from fluentdconfig: default/fd-config"
time="2021-11-24T04:06:25Z" level=debug msg="Checked config for namespace default with fluentd and got: "
time="2021-11-24T04:06:25Z" level=debug msg="Saving status annotation to namespace default: <nil>"
time="2021-11-24T04:06:25Z" level=debug msg="Config hashes returned in RunOnce loop: map[default:4d1aa5093ad0b5e58a238cd56bbf9d38acb2bd83]"
time="2021-11-24T04:06:25Z" level=debug msg="Previous MiniContainer hash for ns default is 9477722725180817073"
time="2021-11-24T04:06:25Z" level=debug msg="New MiniContainer hash for ns default is 17378916774148613530"
time="2021-11-24T04:06:25Z" level=debug msg="Reloading fluentd configuration gracefully via POST to /api/config.gracefulReload"
sabdalla80 commented 2 years ago

It sounds great!! I will test it and report back

sabdalla80 commented 2 years ago

@Cryptophobia Great news! I have verified that sabdalla80-test-3 image worked successfully for adding, updating, and deleting pods. Do you plan on merging to master today? I am looking forward to the new image. Thanks again for your help!

Cryptophobia commented 2 years ago

@sabdalla80, Great news! Tomorrow, I will make and publish a new minor release v1.16.2

Cryptophobia commented 2 years ago

@sabdalla80, apologies for bothering you today. I had a realization this morning that we do not need to compare hashes of all pods and store them at all. We can just rely on the config hashes changing per namespace: since the preProcessors are part of the renderToDisk function, the config for each namespace will include the preprocessing directives.

I removed a bunch of useless code, and now it is a much leaner solution (no more hashing structures of pods that may not be related to any configs).

Can you please just verify that it all works when you scale up and down with the configmaps type? I have verified that the crd type works again. I would appreciate a second pair of eyes on this before making the PR and release.

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-4
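A minimal sketch of the leaner approach described above (relying only on the per-namespace config hashes that the RunOnce loop already reports in its logs), with hypothetical names and wiring, could look like:

```go
package controller

// reloadIfConfigsChanged compares the per-namespace config hashes from the
// current RunOnce pass with the previous ones and calls reload only when
// something actually differs. Names and wiring are illustrative only.
func reloadIfConfigsChanged(prev, curr map[string]string, reload func() error) (map[string]string, error) {
	changed := len(prev) != len(curr)
	for ns, hash := range curr {
		if prev[ns] != hash {
			changed = true
			break
		}
	}
	if changed {
		if err := reload(); err != nil {
			return prev, err
		}
	}
	return curr, nil
}
```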
sabdalla80 commented 2 years ago

Certainly! I will do so shortly and report back.


sabdalla80 commented 2 years ago

@Cryptophobia I like the cleanup - looks good. I have also verified sabdalla80-test-4 works correctly with scaling up and down of mounted_file pods as well as replacing/updating the pods.

Cryptophobia commented 2 years ago

New v1.16.2 release is out. New image is now docker.io/vmware/kube-fluentd-operator:v1.16.2

Cryptophobia commented 2 years ago

Hey @sabdalla80,

Unfortunately, this needs to be reverted for now. We noticed that too many gracefulReload calls cause the fluentd plugins to break and fluentd gets stuck. Fluentd then sits in a bad state, so in very large clusters with lots of configs this is a problem.

[Screenshot: "Screen Shot 2021-12-03 at 5.33.21 PM" — the log-volume graph referenced below]

I think fluentd just gets into a bad state if too many gracefulReloads come in all at once; I have no other explanation. This means we need to be much smarter about when we decide to reload. We need to be even more deliberate: we can't just reload every time pods come and go. We will need to keep track of whether those pods are in the preprocessing directives and only reload if they are. I will most likely need to modify the preprocessor function to return the actual pod names and container IDs of the pods that are in the preprocessing config, so the controller can track them. Then, when a pod comes in, we only reload if it matches the previous config's container IDs.

I will go back to the drawing board and look at other ways of fixing this.
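A rough sketch of the tracking idea described above, with hypothetical names (trackedContainers, shouldReload), might look like the following; the real change would need the preprocessor to populate the set from the rendered preprocessing directives:

```go
package controller

// trackedContainers holds the container IDs that appeared in the preprocessing
// directives the last time configs were rendered.
type trackedContainers map[string]bool

// shouldReload reports whether any of a pod's container IDs are referenced by
// the current preprocessing config, i.e. whether a reload is actually needed.
func (t trackedContainers) shouldReload(podContainerIDs []string) bool {
	for _, id := range podContainerIDs {
		if t[id] {
			return true
		}
	}
	return false
}
```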

sabdalla80 commented 2 years ago

@Cryptophobia Sorry about the late response; this got lost in my email box somehow. I would like to collaborate with you on the fix as needed, since getting this right is critical for us. Besides the increase in reloads, how did you find out that fluentd was getting hung? Ideally, on the update front, we don't want to reload every time a pod changes; we only want to reload if a mounted_file pod has been updated.

Cryptophobia commented 2 years ago

> Besides the increase in reloads, how did you find out that fluentd was getting hung?

Log volumes got really low (see the picture above), and when inspecting the fluentd containers we saw that we no longer got the normal debug/info logs from fluentd; instead it was stuck after the gracefulReload API call. Upon further inspection of the logs in the fluentd container, it looks like fluentd takes a lot of processing power to release the buffers and load/unload plugins, and then becomes stuck in that data-pipeline flush code (this is my suspicion). Fluentd cannot keep up with all of the reload API calls when using too many plugins and too many configs. It could also be that it tries to flush too many file buffers from our per-namespace fluentd configuration files (another possibility).

> Ideally, on the update front, we don't want to reload every time a pod changes; we only want to reload if a mounted_file pod has been updated.

Yes, this is exactly the point. We need a better solution that only reloads when mounted_file pods are updated: only fire the update event in the pod reloader when a container that has a mounted-file directive (in the preprocessors) needs its configs regenerated. This will lower the number of reload calls to the absolute minimum required to make it all work without a DoS on the fluentd pod's API.

sabdalla80 commented 2 years ago

@Cryptophobia I am going to start working on it today. I am going to figure out how to reload only when a mounted_file pod is restarted. If you have already thought about what the code might look like, please share. I will also need your help once the fix is in, so you can run the same tests you ran before to ensure fluentd does not get hung with too many reload calls.

sabdalla80 commented 2 years ago

@Cryptophobia Can you provide more input on the behavior you saw? I was not able to reproduce it. I deployed a mounted-file pod and a standard pod (not mounted-file), and I observed that the standard pod does not actually trigger graceful reload calls (although it does go through the RunOnce() controller function). The mounted-file pod does trigger graceful reloads on pod changes, per the fix. Regarding the extra consumption/processing for the standard (non-mounted) pods, I am wondering if it's due to the unnecessary processing of the renderToDisk() function within the RunOnce() loop. Ideally, for non-mounted pods, we don't want to run the controller loop at all. Do you know if there is a way we can flag this right in the kubeInformer module at the time we detect pod changes? In other words, in kubeInformer, are we able to find the namespace config associated with the pod that just changed and, based on whether it uses the mounted-file type, decide whether to signal updateChan? (A sketch of this idea follows below.)

On a side note: on ConfigMap changes or pod changes (now), the RunOnce() control loop seems to scan all namespaces/configs every time. I think this should be an enhancement where it only runs against the namespace that changed.
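To illustrate the question above, a namespace-filtered signal in the informer event handlers could look roughly like this (namespacesWithMountedFile and maybeSignalUpdate are hypothetical names; the actual change is in the PR referenced in the next comment):

```go
package datasource

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// namespacesWithMountedFile would be populated from the rendered configs and
// answers "does this namespace's config contain a mounted-file source?".
var namespacesWithMountedFile = map[string]bool{}

// maybeSignalUpdate pushes to updateChan only when the changed pod belongs to
// a namespace whose config actually uses mounted-file, avoiding reloads for
// unrelated pods.
func maybeSignalUpdate(updateChan chan time.Time, pod *corev1.Pod) {
	if !namespacesWithMountedFile[pod.Namespace] {
		return
	}
	select {
	case updateChan <- time.Now():
	default:
	}
}
```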

sabdalla80 commented 2 years ago

@Cryptophobia I made some changes so that we only run the controller and reload when a pod change is associated with a namespace/config that has the mounted-file type. I am doing this in kube_informer and not in the controller, to avoid the RenderToDisk() call. The PR branch is here: https://github.com/vmware/kube-fluentd-operator/pull/306 (thank you @kadiv1). Could you please test the changes by running your metrics tests from last time, to see if log volumes are OK and there are no extra graceful reloads? If this works and does the trick, we can think of a more elegant solution if need be.

I appreciate your help unblocking me on this, as I am not able to move forward until I verify there are no issues with v1.16.2.

sabdalla80 commented 2 years ago

@Cryptophobia Can you help review the above changes?

javiercri commented 2 years ago

The current solution in 1.16.5 is not affordable for big clusters. It was reverted in 1.16.6; we will need to think about a better approach.