kubescape / node-agent

Kubescape eBPF agent 🥷🏻
https://kubescape.io/
Apache License 2.0

watch running containers #229

Closed: dwertent closed this 8 months ago

dwertent commented 8 months ago

User description

Overview


Type: enhancement, bug_fix


Description


Changes walkthrough

Relevant files

Enhancement (16 files)

main.go (+18/-8): Enhance Object and Rule Binding Cache Management and Shutdown Procedure
  • Introduced objectcachev1 and rulebindingcachev1 for enhanced cache management.
  • Added a ruleBindingNotify channel to manage rule binding notifications.
  • Integrated ruleBindingNotify with containerwatcher to observe container events based on rule bindings (an illustrative sketch follows this walkthrough).
  • Ensured dWatcher stops upon application shutdown.

pkg/applicationprofilemanager/v1/applicationprofile_manager.go (+9/-3): Improve Application Profile Manager with Better Error Handling and Container Tracking
  • Added tracking for removed containers to prevent race conditions during delete operations.
  • Improved error handling for syscall retrieval to ignore "no syscall found" errors.
  • Removed unnecessary logging for a nil syscall peek function.

pkg/containerwatcher/v1/container_watcher.go (+16/-1): Refactor Container Watcher to Support Rule-based Container Monitoring
  • Added ruleBindingPodNotify to manage container monitoring based on rule bindings.
  • Introduced timeBasedContainers and ruleManagedContainers sets for container tracking.
  • Adjusted container monitoring logic to differentiate between time-based and rule-managed containers.

pkg/containerwatcher/v1/container_watcher_private.go (+68/-29): Enhance Container Callback and Monitoring Logic for Rule-based Operations
  • Enhanced the container callback to support rule-based container monitoring.
  • Implemented addRunningContainers to monitor containers based on rule bindings.
  • Adjusted container unregister logic to consider rule-managed containers.

pkg/networkmanager/network_manager.go (+29/-8): Improve Network Manager with Enhanced Container Tracking
  • Introduced removedContainers and trackedContainers sets for better container lifecycle management.
  • Improved the waitForContainer method to handle removed containers.

pkg/objectcache/applicationprofilecache/applicationprofilecache.go (+16/-31): Optimize Application Profile Caching Logic
  • Optimized application profile caching logic for added and deleted pods.
  • Ensured application profiles are properly cached and cleaned up.

pkg/objectcache/k8scache/k8scache.go (+1/-0): Initialize PodSpec Map in Kubernetes Object Cache
  • Initialized the podSpec map that stores Kubernetes object specifications.

pkg/objectcache/networkneighborscache/networkneighborscache.go (+14/-25): Optimize Network Neighbors Caching Logic
  • Optimized network neighbors caching logic for added and deleted pods.
  • Ensured network neighbors are properly cached and cleaned up.

pkg/objectcache/objectcache_interface.go (+3/-0): Introduce NewObjectCacheMock Function
  • Introduced the NewObjectCacheMock function to create a mock object cache.

pkg/relevancymanager/v1/relevancy_manager.go (+13/-4): Improve Relevancy Manager with Better Container Lifecycle Management
  • Added tracking for removed containers to prevent race conditions.
  • Improved the container callback to handle container removal more gracefully.

pkg/rulebindingmanager/cache/cache.go (+32/-0): Enhance Rule Binding Cache with Notification Support
  • Added support for notifying about rule binding changes.
  • Enhanced rule binding and pod association logic.

pkg/rulebindingmanager/cache/helpers.go (+7/-0): Add Helper Function for Name Conversion in Rule Binding Cache
  • Added a helper function to convert unique names to namespace and name.

pkg/rulebindingmanager/notifier.go (+46/-0): Introduce Rule Binding Notification Mechanism
  • Introduced the RuleBindingNotify structure and related functions to manage rule binding notifications.

pkg/rulebindingmanager/rulebindingmanager_interface.go (+4/-1): Add Notifier Method to RuleBindingCache Interface
  • Added the AddNotifier method to the RuleBindingCache interface.

pkg/rulebindingmanager/rulebindingmanager_mock.go (+6/-1): Introduce NewRuleBindingCacheMock Function and Fix Mock Behavior
  • Introduced the NewRuleBindingCacheMock function.
  • Fixed the IsCached method to return true for better mock behavior.

pkg/rulemanager/v1/rule_manager.go (+5/-5): Adjust Rule Manager Caching Check for Efficiency
  • Adjusted the caching check to use K8sObjectCache directly.
  • Commented out the redundant rule binding cache check.

Tests (1 file)

pkg/containerwatcher/v1/open_test.go (+1/-1): Fix Container Watcher Test Initialization
  • Fixed the CreateIGContainerWatcher call in tests by adding the missing ruleBindingPodNotify parameter.

Bug fix (1 file)

pkg/watcher/dynamicwatcher/watch.go (+5/-3): Adjust Dynamic Watcher Start Method and Fix Storage Object Handling
  • Adjusted the Start method to not return an error.
  • Fixed handling of existing storage objects to prevent data races.
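To make the notification flow described in this walkthrough easier to follow, below is a minimal, illustrative Go sketch of how a rule-binding notification channel could drive a container watcher that keeps separate time-based and rule-managed container sets. The names RuleBindingNotify, timeBasedContainers, and ruleManagedContainers are borrowed from the entries above, but the fields, types, and methods shown here are assumptions made for illustration, not the node-agent's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RuleBindingNotify mirrors the idea of a rule-binding notification: a pod was
// bound to (or unbound from) a rule, so its containers should be watched or
// released. The fields are illustrative assumptions.
type RuleBindingNotify struct {
	ContainerID string
	Bound       bool // true = start watching, false = stop watching
}

// containerWatcher keeps two overlapping views of what it monitors: containers
// watched only for a limited sniffing window, and containers watched for as
// long as a rule binding selects them.
type containerWatcher struct {
	mu                    sync.Mutex
	timeBasedContainers   map[string]struct{}
	ruleManagedContainers map[string]struct{}
	maxSniffingTime       time.Duration
}

func newContainerWatcher(maxSniffingTime time.Duration) *containerWatcher {
	return &containerWatcher{
		timeBasedContainers:   map[string]struct{}{},
		ruleManagedContainers: map[string]struct{}{},
		maxSniffingTime:       maxSniffingTime,
	}
}

// onContainerStart registers a new container for the time-based window and
// schedules its removal once the sniffing time elapses, unless a rule still
// manages it at that point.
func (cw *containerWatcher) onContainerStart(containerID string) {
	cw.mu.Lock()
	cw.timeBasedContainers[containerID] = struct{}{}
	cw.mu.Unlock()

	time.AfterFunc(cw.maxSniffingTime, func() {
		cw.mu.Lock()
		defer cw.mu.Unlock()
		delete(cw.timeBasedContainers, containerID)
		if _, managed := cw.ruleManagedContainers[containerID]; !managed {
			fmt.Println("unregistering container", containerID)
		}
	})
}

// consumeNotifications promotes or demotes containers based on rule-binding
// events received on the channel, until the channel is closed.
func (cw *containerWatcher) consumeNotifications(notify <-chan RuleBindingNotify) {
	for n := range notify {
		cw.mu.Lock()
		if n.Bound {
			cw.ruleManagedContainers[n.ContainerID] = struct{}{}
		} else {
			delete(cw.ruleManagedContainers, n.ContainerID)
		}
		cw.mu.Unlock()
	}
}

func main() {
	notify := make(chan RuleBindingNotify, 8) // buffered so the publisher does not block
	cw := newContainerWatcher(100 * time.Millisecond)
	go cw.consumeNotifications(notify)

	cw.onContainerStart("container-a")
	notify <- RuleBindingNotify{ContainerID: "container-b", Bound: true}

	time.Sleep(300 * time.Millisecond) // let the sniffing window expire
	close(notify)
}
```

A buffered channel is used so that the component publishing rule-binding changes does not block on a slow consumer, which is the same concern the review below raises about full notifier channels and goroutine leaks.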

    PR-Agent usage: Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    codiumai-pr-agent-free[bot] commented 8 months ago

    PR Description updated to latest commit (https://github.com/kubescape/node-agent/commit/884c611a4648b03fd6b79afb8f38a137a816a4fd)

    codiumai-pr-agent-free[bot] commented 8 months ago

    PR Review

⏱️ Estimated effort to review [1-5]: 4, because the PR introduces significant changes across multiple components, including cache management, container lifecycle management, and rule binding notifications. The complexity of these changes, their interactions, and the potential for subtle bugs or performance issues require a thorough review.
🧪 Relevant tests: No
🔍 Possible issues:
  • Possible bug: the deleteResources method in ApplicationProfileManager and NetworkManager now adds the container ID to a removedContainers set, but other methods do not check this set before processing the container, so a deleted container could still be processed (a sketch of the missing guard follows this review).
  • Performance concern: the channels added in multiple components for rule binding notifications (ruleBindingNotify) could increase memory usage and leak goroutines if they are not properly managed and cleaned up.
  • Logical issue: in container_watcher.go, the logic that decides whether a container is new or pre-existing might not handle all cases correctly, especially in dynamic environments where containers are quickly created and destroyed.
  • Consistency issue: error handling and logging in the new code are inconsistent; some errors are logged at error level while others are logged at info level, with no clear rationale for the difference.
🔒 Security concerns: No
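To make the first concern above concrete, here is a minimal, hedged sketch of the guard the review is asking for: deleteResources marks the container as removed, and every other per-container operation checks that set before doing any work. The containerTracker type and its methods are illustrative assumptions, not the actual ApplicationProfileManager or NetworkManager code.

```go
package main

import (
	"fmt"
	"sync"
)

// containerTracker is an illustrative stand-in for the per-container
// bookkeeping the managers keep.
type containerTracker struct {
	mu                sync.Mutex
	trackedContainers map[string]struct{}
	removedContainers map[string]struct{}
}

func newContainerTracker() *containerTracker {
	return &containerTracker{
		trackedContainers: map[string]struct{}{},
		removedContainers: map[string]struct{}{},
	}
}

// deleteResources marks the container as removed before dropping its state,
// so that in-flight work can observe the removal.
func (t *containerTracker) deleteResources(containerID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.removedContainers[containerID] = struct{}{}
	delete(t.trackedContainers, containerID)
}

// process stands in for any per-container operation; it refuses to run for
// containers that were already removed, which is the guard the review asks for.
func (t *containerTracker) process(containerID string) error {
	t.mu.Lock()
	_, removed := t.removedContainers[containerID]
	t.mu.Unlock()
	if removed {
		return fmt.Errorf("container %s was removed, skipping", containerID)
	}
	// ... do the real work (save profiles, report network neighbors, ...)
	return nil
}

func main() {
	t := newContainerTracker()
	t.trackedContainers["c1"] = struct{}{}

	t.deleteResources("c1")
	if err := t.process("c1"); err != nil {
		fmt.Println(err) // container c1 was removed, skipping
	}
}
```

Whether the sets are plain maps behind a mutex, as in this sketch, or concurrent sets such as mapset, the essential point is that the removal marker and the processing path consult the same synchronized state.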

    ✨ Review tool usage guide:
**Overview:** The `review` tool scans the PR code changes and generates a PR review that includes several types of feedback, such as possible PR issues, security threats, and relevant tests in the PR. More feedback can be [added](https://pr-agent-docs.codium.ai/tools/review/#general-configurations) by configuring the tool. The tool can be triggered [automatically](https://pr-agent-docs.codium.ai/usage-guide/automations_and_usage/#github-app-automatic-tools-when-a-new-pr-is-opened) every time a new PR is opened, or invoked manually by commenting on any PR.

- When commenting, to edit [configurations](https://github.com/Codium-ai/pr-agent/blob/main/pr_agent/settings/configuration.toml#L23) related to the review tool (`pr_reviewer` section), use the following template:

  ```
  /review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
  ```

- With a [configuration file](https://pr-agent-docs.codium.ai/usage-guide/configuration_options/), use the following template:

  ```
  [pr_reviewer]
  some_config1=...
  some_config2=...
  ```

See the review [usage page](https://pr-agent-docs.codium.ai/tools/review/) for a comprehensive guide on using this tool.
    codiumai-pr-agent-free[bot] commented 8 months ago

    PR Code Suggestions

Suggestions by category:
    Enhancement
Improve error handling when retrieving running containers.
**Consider using more specific error handling instead of logging and continuing execution when k8sClient.GetRunningContainers(pod) fails. This could lead to unexpected behavior if there are issues retrieving running containers.**
[pkg/containerwatcher/v1/container_watcher_private.go [113]](https://github.com/kubescape/node-agent/pull/229/files#diff-6f95b4caa6090a17da5aed1923600fd049392d228e0fba99cc212f48111f3ffeR113-R113)
```diff
-containers := k8sClient.GetRunningContainers(pod)
+containers, err := k8sClient.GetRunningContainers(pod)
+if err != nil {
+    logger.L().Error("Failed to get running containers", helpers.Error(err))
+    return fmt.Errorf("failed to get running containers: %w", err)
+}
```

Handle syscall peek function errors more explicitly.
**To avoid silently ignoring errors, consider handling the error returned by syscallPeekFunc more explicitly, especially for cases other than "no syscall found".**
[pkg/applicationprofilemanager/v1/applicationprofile_manager.go [237]](https://github.com/kubescape/node-agent/pull/229/files#diff-fc815317651e17975c117749e7661127dbcde82fd9d4d36ebc76cb5b09b3c54eR237-R237)
```diff
-if err != nil && !strings.Contains(err.Error(), "no syscall found") {
+if err != nil {
+    if !strings.Contains(err.Error(), "no syscall found") {
+        logger.L().Ctx(ctx).Error("ApplicationProfileManager - failed to get syscalls", helpers.Error(err),
+            helpers.String("slug", slug),
+            helpers.Int("container index", watchedContainer.ContainerIndex))
+    }
+    return
+}
```

Add a timeout to the waitForContainer method to prevent infinite waiting.
**Ensure that the waitForContainer method handles the case where a container is not found within a reasonable timeout, to prevent infinite waiting.**
[pkg/networkmanager/network_manager.go [625-630]](https://github.com/kubescape/node-agent/pull/229/files#diff-cbcd69c2d2fb59b909f8b3cec3ff32348a1ddb4f1d465ece79ead0fa0ba7168bR625-R630)
```diff
+backOff := backoff.NewExponentialBackOff()
+backOff.MaxElapsedTime = 2 * time.Minute
 return backoff.Retry(func() error {
     if am.trackedContainers.Contains(k8sContainerID) {
         return nil
     }
     return fmt.Errorf("container %s not found", k8sContainerID)
-}, backoff.NewExponentialBackOff())
+}, backOff)
```

Add error handling for the unregisterContainer call within time.AfterFunc.
**Consider checking for errors when calling ch.unregisterContainer(notif.Container) inside the time.AfterFunc to ensure proper error handling and logging.**
[pkg/containerwatcher/v1/container_watcher_private.go [40-42]](https://github.com/kubescape/node-agent/pull/229/files#diff-6f95b4caa6090a17da5aed1923600fd049392d228e0fba99cc212f48111f3ffeR40-R42)
```diff
 time.AfterFunc(ch.cfg.MaxSniffingTime, func() {
     ch.timeBasedContainers.Remove(notif.Container.Runtime.ContainerID)
-    ch.unregisterContainer(notif.Container)
+    if err := ch.unregisterContainer(notif.Container); err != nil {
+        logger.L().Error("Failed to unregister container", helpers.Error(err))
+    }
 })
```

Check the result of adding a pod name to the set for success or failure.
**Consider checking the result of ap.slugToPods.Get(uniqueSlug).Add(podName) for success or failure. This will ensure that the pod name was successfully added to the set.**
[pkg/objectcache/applicationprofilecache/applicationprofilecache.go [163]](https://github.com/kubescape/node-agent/pull/229/files#diff-7033af70203ec0c7cabfa58ce543eee2aa32baea58ae8904c87db7c406b51f9bR163-R163)
```diff
-ap.slugToPods.Get(uniqueSlug).Add(podName)
+success := ap.slugToPods.Get(uniqueSlug).Add(podName)
+if !success {
+    logger.L().Error("failed to add pod to slug", helpers.String("podName", podName), helpers.String("uniqueSlug", uniqueSlug))
+}
```

Avoid appending duplicate channels to the notifiers slice.
**When appending to c.notifiers, consider checking whether the channel is already present to avoid duplicates. This can be done by iterating over c.notifiers and comparing the channels.**
[pkg/rulebindingmanager/cache/cache.go [104]](https://github.com/kubescape/node-agent/pull/229/files#diff-0674d450411ce55370a6341da8d3a34cadffe21ba15112d3f29955de58e51156R104-R104)
```diff
-c.notifiers = append(c.notifiers, n)
+alreadyExists := false
+for _, notifier := range c.notifiers {
+    if notifier == n {
+        alreadyExists = true
+        break
+    }
+}
+if !alreadyExists {
+    c.notifiers = append(c.notifiers, n)
+}
```

Refactor deletion logic to improve efficiency.
**Refactor the deletion logic to avoid redundant calls to the Delete and Remove methods by checking the cardinality before attempting to delete.**
[pkg/objectcache/networkneighborscache/networkneighborscache.go [179-185]](https://github.com/kubescape/node-agent/pull/229/files#diff-ec7eda3c7e1d7fe827159ee9e8067c2ce6278ecaf78d39aef6bc5b6210689c28R179-R185)
```diff
-np.slugToPods.Get(uniqueSlug).Remove(podName)
-if np.slugToPods.Get(uniqueSlug).Cardinality() == 0 {
-    np.slugToPods.Delete(uniqueSlug)
-    np.slugToNetworkNeighbor.Delete(uniqueSlug)
-    np.allNeighbors.Remove(uniqueSlug)
+if np.slugToPods.Has(uniqueSlug) {
+    np.slugToPods.Get(uniqueSlug).Remove(podName)
+    if np.slugToPods.Get(uniqueSlug).Cardinality() == 0 {
+        np.slugToPods.Delete(uniqueSlug)
+        np.slugToNetworkNeighbor.Delete(uniqueSlug)
+        np.allNeighbors.Remove(uniqueSlug)
+    }
 }
```

Modify the Start method to return an error for better error handling.
**Ensure that the Start method returns an error if it fails to start the watchers, to allow error handling by the caller.**
[pkg/watcher/dynamicwatcher/watch.go [64]](https://github.com/kubescape/node-agent/pull/229/files#diff-f29578a7352bf2f04d5b87fd277128222fa7ce282adc7d71a9d628c9880a5722R64-R64)
```diff
-func (wh *WatchHandler) Start(ctx context.Context) {
+func (wh *WatchHandler) Start(ctx context.Context) error {
```

Improve error messages for better debugging.
**Consider using more descriptive error handling for RuleBindingNotifierImplWithK8s when fetching pods fails, including the namespace and pod name in the error message.**
[pkg/rulebindingmanager/notifier.go [33]](https://github.com/kubescape/node-agent/pull/229/files#diff-c8fbb60780f2adc0d246684085c73ecfcfe548f4010a5c5bcb31eafa61edf115R33-R33)
```diff
-return RuleBindingNotify{}, err
+return RuleBindingNotify{}, fmt.Errorf("failed to get pod %s/%s: %w", namespace, name, err)
```
    Best practice
Use a mutex to ensure thread safety when modifying container sets.
**To ensure thread safety and consistency, consider using atomic operations or a mutex when adding and removing items from the removedContainers and trackedContainers sets.**
[pkg/applicationprofilemanager/v1/applicationprofile_manager.go [132]](https://github.com/kubescape/node-agent/pull/229/files#diff-fc815317651e17975c117749e7661127dbcde82fd9d4d36ebc76cb5b09b3c54eR132-R132)
```diff
+am.containerMutexes.Lock(watchedContainer.K8sContainerID)
 am.removedContainers.Add(watchedContainer.K8sContainerID)
+am.containerMutexes.Unlock(watchedContainer.K8sContainerID)
```

Use a getter function to encapsulate the notifier channel.
**Instead of directly passing the address of the ruleBindingNotify channel to ruleBindingCache.AddNotifier, consider using a getter function to encapsulate the channel. This improves encapsulation and allows for easier modifications in the future.**
[main.go [168]](https://github.com/kubescape/node-agent/pull/229/files#diff-2873f79a86c0d8b3335cd7731b0ecf7dd4301eb19a82ef7a1cba7589b5252261R168-R168)
```diff
-ruleBindingCache.AddNotifier(&ruleBindingNotify)
+ruleBindingCache.AddNotifier(ruleBindingCache.GetNotifierChannel())
```

Use a select statement to avoid blocking on full notifier channels.
**When sending notifications through the notifiers channels, consider using a select statement with a default case to avoid potential blocking if the channel is full.**
[pkg/rulebindingmanager/cache/cache.go [226]](https://github.com/kubescape/node-agent/pull/229/files#diff-0674d450411ce55370a6341da8d3a34cadffe21ba15112d3f29955de58e51156R226-R226)
```diff
-*c.notifiers[i] <- n
+select {
+case *c.notifiers[i] <- n:
+default:
+    logger.L().Error("notifier channel is full", helpers.String("namespace", pod.GetNamespace()), helpers.String("name", pod.GetName()))
+}
```
    Performance
Use a hash set for efficient container removal checks.
**To improve the efficiency of checking whether a container has been removed, consider using a more efficient data structure such as a hash set for rm.removedContainers. This will optimize the Contains check.**
[pkg/relevancymanager/v1/relevancy_manager.go [331-333]](https://github.com/kubescape/node-agent/pull/229/files#diff-f665b80e0e8d6552b56e677f5579b3b90cb7cd78999a4bec61d41522469393a3R331-R333)
```diff
-if rm.removedContainers.Contains(containerID) {
+// Assuming `removedContainers` is now a hash set
+if _, exists := rm.removedContainers[containerID]; exists {
     return
 }
```
    Possible issue
Initialize preRunningContainersIDs to avoid nil pointer dereferences.
**Consider initializing preRunningContainersIDs in CreateIGContainerWatcher to ensure it is not nil and to avoid potential nil pointer dereferences when using it.**
[pkg/containerwatcher/v1/container_watcher.go [119]](https://github.com/kubescape/node-agent/pull/229/files#diff-9fa1df18c7f441eb315fcce4daa355a4517ee2d692d1b6a572f93ee5edbd247dR119-R119)
```diff
 preRunningContainers mapset.Set[string]
+if preRunningContainers == nil {
+    preRunningContainers = mapset.NewSet[string]()
+}
```
    Maintainability
Remove unnecessary blank lines for consistency and readability.
**Remove the unnecessary blank lines to maintain code consistency and improve readability.**
[pkg/containerwatcher/v1/container_watcher.go [109-111]](https://github.com/kubescape/node-agent/pull/229/files#diff-9fa1df18c7f441eb315fcce4daa355a4517ee2d692d1b6a572f93ee5edbd247dR109-R111)
```diff
 ruleManagedContainers mapset.Set[string] // list of containers to track based on rules
-
 metrics metricsmanager.MetricsManager
```

    ✨ Improve tool usage guide:
**Overview:** The `improve` tool scans the PR code changes and automatically generates suggestions for improving the PR code. The tool can be triggered [automatically](https://pr-agent-docs.codium.ai/usage-guide/automations_and_usage/#github-app-automatic-tools-when-a-new-pr-is-opened) every time a new PR is opened, or invoked manually by commenting on a PR.

- When commenting, to edit [configurations](https://github.com/Codium-ai/pr-agent/blob/main/pr_agent/settings/configuration.toml#L78) related to the improve tool (`pr_code_suggestions` section), use the following template:

  ```
  /improve --pr_code_suggestions.some_config1=... --pr_code_suggestions.some_config2=...
  ```

- With a [configuration file](https://pr-agent-docs.codium.ai/usage-guide/configuration_options/), use the following template:

  ```
  [pr_code_suggestions]
  some_config1=...
  some_config2=...
  ```

See the improve [usage page](https://pr-agent-docs.codium.ai/tools/improve/) for a comprehensive guide on using this tool.
    github-actions[bot] commented 8 months ago

    Summary: