PR Description updated to latest commit (https://github.com/kubescape/node-agent/commit/884c611a4648b03fd6b79afb8f38a137a816a4fd)
- ⏱️ **Estimated effort to review [1-5]:** 4, because the PR introduces significant changes across multiple components, including cache management, container lifecycle management, and rule binding notifications. The complexity of these changes, their interactions, and the potential for subtle bugs or performance issues require a thorough review.
- 🧪 **Relevant tests:** No
- 🔍 **Possible issues:**
  - **Possible Bug:** The `deleteResources` method in `ApplicationProfileManager` and `NetworkManager` now adds the container ID to a `removedContainers` set but does not check this set before processing the container in other methods. This could lead to scenarios where a deleted container is still being processed (a sketch of the missing guard follows this list).
  - **Performance Concern:** The addition of channels for rule binding notifications (`ruleBindingNotify`) in multiple components could increase memory usage and leak goroutines if the channels are not properly managed and cleaned up.
  - **Logical Issue:** In `container_watcher.go`, the logic that determines whether a container is new or pre-existing might not handle all cases correctly, especially in dynamic environments where containers are quickly created and destroyed.
  - **Consistency Issue:** Error handling and logging in the new code are inconsistent. For example, some errors are logged at error level while others are logged at info level, without a clear rationale for the difference.
- 🔒 **Security concerns:** No
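For the possible bug above, a guard like the following would prevent already-deleted containers from being processed. This is a minimal sketch assuming the `mapset`-backed `removedContainers` set this PR introduces; the method name `handleContainerEvent` is illustrative, not the PR's actual code.

```go
// Minimal sketch: skip work for containers already marked as removed.
// Assumes removedContainers is a mapset.Set[string] as introduced in this PR;
// handleContainerEvent is an illustrative name, not the PR's actual method.
func (am *ApplicationProfileManager) handleContainerEvent(containerID string) {
	if am.removedContainers.Contains(containerID) {
		// The container was deleted; drop the event instead of processing it.
		return
	}
	// ... normal event processing continues here ...
}
```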
**Suggestions**

**Enhancement**

**Improve error handling when retrieving running containers.** Consider more specific error handling instead of logging and continuing execution when `k8sClient.GetRunningContainers(pod)` fails; continuing silently could lead to unexpected behavior if running containers cannot be retrieved.

[pkg/containerwatcher/v1/container_watcher_private.go [113]](https://github.com/kubescape/node-agent/pull/229/files#diff-6f95b4caa6090a17da5aed1923600fd049392d228e0fba99cc212f48111f3ffeR113-R113)

```diff
-containers := k8sClient.GetRunningContainers(pod)
+containers, err := k8sClient.GetRunningContainers(pod)
+if err != nil {
+	logger.L().Error("Failed to get running containers", helpers.Error(err))
+	return fmt.Errorf("failed to get running containers: %w", err)
+}
```
**Handle syscall peek function errors more explicitly.** To avoid silently ignoring errors, handle the error returned by `syscallPeekFunc` more explicitly, especially for cases other than "no syscall found".

[pkg/applicationprofilemanager/v1/applicationprofile_manager.go [237]](https://github.com/kubescape/node-agent/pull/229/files#diff-fc815317651e17975c117749e7661127dbcde82fd9d4d36ebc76cb5b09b3c54eR237-R237)

```diff
-if err != nil && !strings.Contains(err.Error(), "no syscall found") {
+if err != nil {
+	if !strings.Contains(err.Error(), "no syscall found") {
+		logger.L().Ctx(ctx).Error("ApplicationProfileManager - failed to get syscalls", helpers.Error(err),
+			helpers.String("slug", slug),
+			helpers.Int("container index", watchedContainer.ContainerIndex))
+	}
+	return
+}
```
**Add a timeout to the `waitForContainer` method to prevent infinite waiting.** Ensure that `waitForContainer` handles the case where a container is not found within a reasonable time, so the retry loop cannot wait forever.

[pkg/networkmanager/network_manager.go [625-630]](https://github.com/kubescape/node-agent/pull/229/files#diff-cbcd69c2d2fb59b909f8b3cec3ff32348a1ddb4f1d465ece79ead0fa0ba7168bR625-R630)

```diff
+backOff := backoff.NewExponentialBackOff()
+backOff.MaxElapsedTime = 2 * time.Minute
 return backoff.Retry(func() error {
 	if am.trackedContainers.Contains(k8sContainerID) {
 		return nil
 	}
 	return fmt.Errorf("container %s not found", k8sContainerID)
-}, backoff.NewExponentialBackOff())
+}, backOff)
```
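Note that with `MaxElapsedTime` set, `backoff.Retry` stops retrying once roughly two minutes have elapsed and returns the operation's last error, so callers of `waitForContainer` must be prepared to handle that failure rather than assuming the call eventually succeeds.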
**Add error handling for the `unregisterContainer` call within `time.AfterFunc`.** Consider checking for errors when calling `ch.unregisterContainer(notif.Container)` inside the `time.AfterFunc` callback, to ensure proper error handling and logging.
[pkg/containerwatcher/v1/container_watcher_private.go [40-42]](https://github.com/kubescape/node-agent/pull/229/files#diff-6f95b4caa6090a17da5aed1923600fd049392d228e0fba99cc212f48111f3ffeR40-R42)
```diff
time.AfterFunc(ch.cfg.MaxSniffingTime, func() {
ch.timeBasedContainers.Remove(notif.Container.Runtime.ContainerID)
- ch.unregisterContainer(notif.Container)
+ if err := ch.unregisterContainer(notif.Container); err != nil {
+ logger.L().Error("Failed to unregister container", helpers.Error(err))
+ }
})
```
**Check the result of adding a pod name to the set.** Consider checking the boolean returned by `ap.slugToPods.Get(uniqueSlug).Add(podName)` to confirm that the pod name was actually added to the set.

[pkg/objectcache/applicationprofilecache/applicationprofilecache.go [163]](https://github.com/kubescape/node-agent/pull/229/files#diff-7033af70203ec0c7cabfa58ce543eee2aa32baea58ae8904c87db7c406b51f9bR163-R163)

```diff
-ap.slugToPods.Get(uniqueSlug).Add(podName)
+success := ap.slugToPods.Get(uniqueSlug).Add(podName)
+if !success {
+	logger.L().Error("failed to add pod to slug", helpers.String("podName", podName), helpers.String("uniqueSlug", uniqueSlug))
+}
```
**Avoid appending duplicate channels to the notifiers slice.** When appending to `c.notifiers`, consider checking whether the channel is already present to avoid duplicates, for example by iterating over `c.notifiers` and comparing the channel pointers.

[pkg/rulebindingmanager/cache/cache.go [104]](https://github.com/kubescape/node-agent/pull/229/files#diff-0674d450411ce55370a6341da8d3a34cadffe21ba15112d3f29955de58e51156R104-R104)

```diff
-c.notifiers = append(c.notifiers, n)
+alreadyExists := false
+for _, notifier := range c.notifiers {
+	if notifier == n {
+		alreadyExists = true
+		break
+	}
+}
+if !alreadyExists {
+	c.notifiers = append(c.notifiers, n)
+}
```
**Refactor deletion logic to improve efficiency.** Refactor the deletion logic to avoid redundant calls to the `Delete` and `Remove` methods by checking that the slug is present before removing the pod and checking cardinality.

[pkg/objectcache/networkneighborscache/networkneighborscache.go [179-185]](https://github.com/kubescape/node-agent/pull/229/files#diff-ec7eda3c7e1d7fe827159ee9e8067c2ce6278ecaf78d39aef6bc5b6210689c28R179-R185)

```diff
-np.slugToPods.Get(uniqueSlug).Remove(podName)
-if np.slugToPods.Get(uniqueSlug).Cardinality() == 0 {
-	np.slugToPods.Delete(uniqueSlug)
-	np.slugToNetworkNeighbor.Delete(uniqueSlug)
-	np.allNeighbors.Remove(uniqueSlug)
+if np.slugToPods.Has(uniqueSlug) {
+	np.slugToPods.Get(uniqueSlug).Remove(podName)
+	if np.slugToPods.Get(uniqueSlug).Cardinality() == 0 {
+		np.slugToPods.Delete(uniqueSlug)
+		np.slugToNetworkNeighbor.Delete(uniqueSlug)
+		np.allNeighbors.Remove(uniqueSlug)
+	}
 }
```
**Modify …** Ensure that the …
**Improve error messages for better debugging.** Use more descriptive error handling in `RuleBindingNotifierImplWithK8s` when fetching pods fails, including the namespace and pod name in the error message.

[pkg/rulebindingmanager/notifier.go [33]](https://github.com/kubescape/node-agent/pull/229/files#diff-c8fbb60780f2adc0d246684085c73ecfcfe548f4010a5c5bcb31eafa61edf115R33-R33)

```diff
-return RuleBindingNotify{}, err
+return RuleBindingNotify{}, fmt.Errorf("failed to get pod %s/%s: %w", namespace, name, err)
```
**Best practice**
**Use a mutex to ensure thread safety when modifying container sets.** To ensure thread safety and consistency, consider using atomic operations or a mutex when adding and removing items from the `removedContainers` and `trackedContainers` sets.
[pkg/applicationprofilemanager/v1/applicationprofile_manager.go [132]](https://github.com/kubescape/node-agent/pull/229/files#diff-fc815317651e17975c117749e7661127dbcde82fd9d4d36ebc76cb5b09b3c54eR132-R132)
```diff
+am.containerMutexes.Lock(watchedContainer.K8sContainerID)
am.removedContainers.Add(watchedContainer.K8sContainerID)
+am.containerMutexes.Unlock(watchedContainer.K8sContainerID)
```
**Use a getter function to encapsulate the notifier channel.** Instead of directly passing the address of the `ruleBindingNotify` channel to `ruleBindingCache.AddNotifier`, consider a getter function that encapsulates the channel; this improves encapsulation and makes future changes easier.

[main.go [168]](https://github.com/kubescape/node-agent/pull/229/files#diff-2873f79a86c0d8b3335cd7731b0ecf7dd4301eb19a82ef7a1cba7589b5252261R168-R168)

```diff
-ruleBindingCache.AddNotifier(&ruleBindingNotify)
+ruleBindingCache.AddNotifier(ruleBindingCache.GetNotifierChannel())
```
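`GetNotifierChannel` does not exist in this PR; if the suggestion were adopted, the getter might look like the sketch below, assuming the cache owns the channel it hands out (the type and field names here are hypothetical).

```go
// Hypothetical getter: the cache owns and exposes its own notifier channel.
// RBCache and ownNotifier are assumed names, not the PR's actual code.
func (c *RBCache) GetNotifierChannel() *chan rulebindingmanager.RuleBindingNotify {
	return &c.ownNotifier
}
```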
**Use a select statement to avoid blocking on full notifier channels.** When sending notifications through the `notifiers` channels, use a `select` with a `default` case so a full channel does not block the sender.

[pkg/rulebindingmanager/cache/cache.go [226]](https://github.com/kubescape/node-agent/pull/229/files#diff-0674d450411ce55370a6341da8d3a34cadffe21ba15112d3f29955de58e51156R226-R226)

```diff
-*c.notifiers[i] <- n
+select {
+case *c.notifiers[i] <- n:
+default:
+	logger.L().Error("notifier channel is full", helpers.String("namespace", pod.GetNamespace()), helpers.String("name", pod.GetName()))
+}
```
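The `default` case trades blocking for dropped notifications: if a receiver falls behind, events are lost instead of stalling the cache. A buffered channel on the receiving side narrows that window but does not eliminate it, so receivers should treat the notification stream as best-effort.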
**Performance**
**Use a hash set for efficient container-removal checks.** To make checking whether a container has been removed more efficient, consider a hash-set-like data structure for `rm.removedContainers`, which optimizes the `Contains` check.
[pkg/relevancymanager/v1/relevancy_manager.go [331-333]](https://github.com/kubescape/node-agent/pull/229/files#diff-f665b80e0e8d6552b56e677f5579b3b90cb7cd78999a4bec61d41522469393a3R331-R333)
```diff
-if rm.removedContainers.Contains(containerID) {
+// Assuming `removedContainers` is now a hash set
+if _, exists := rm.removedContainers[containerID]; exists {
return
}
```
**Possible issue**
**Initialize …** Consider initializing …
**Maintainability**
**Remove unnecessary blank lines for consistency and readability.** Remove the unnecessary blank line to keep the struct fields consistent and readable.

[pkg/containerwatcher/v1/container_watcher.go [109-111]](https://github.com/kubescape/node-agent/pull/229/files#diff-9fa1df18c7f441eb315fcce4daa355a4517ee2d692d1b6a572f93ee5edbd247dR109-R111)

```diff
 ruleManagedContainers mapset.Set[string] // list of containers to track based on rules
-
 metrics metricsmanager.MetricsManager
```
Summary:

**User description**

Overview

**Type:** enhancement, bug_fix

**Description**

**Changes walkthrough**

**16 files**

- **main.go:** Enhance Object and Rule Binding Cache Management and Shutdown Procedure (`main.go`)
  - Uses `objectcachev1` and `rulebindingcachev1` for enhanced cache management.
  - Added a `ruleBindingNotify` channel to manage rule binding notifications.
  - Integrated `ruleBindingNotify` with `containerwatcher` to observe container events based on rule bindings.
  - Ensured `dWatcher` stops upon application shutdown.
- **applicationprofile_manager.go:** Improve Application Profile Manager with Better Error Handling and Container Tracking (`pkg/applicationprofilemanager/v1/applicationprofile_manager.go`)
  - Tracks removed containers during delete operations.
  - Improved handling of "no syscall found" errors.
- **container_watcher.go:** Refactor Container Watcher to Support Rule-based Container Monitoring (`pkg/containerwatcher/v1/container_watcher.go`)
  - Added `ruleBindingPodNotify` to manage container monitoring based on rule bindings.
  - Introduced `timeBasedContainers` and `ruleManagedContainers` sets for container tracking.
  - … time-based and rule-managed containers.
- **container_watcher_private.go:** Enhance Container Callback and Monitoring Logic for Rule-based Operations (`pkg/containerwatcher/v1/container_watcher_private.go`)
  - … monitoring.
  - Added `addRunningContainers` to monitor containers based on rule bindings.
  - … containers.
- **network_manager.go:** Improve Network Manager with Enhanced Container Tracking (`pkg/networkmanager/network_manager.go`)
  - Added `removedContainers` and `trackedContainers` sets for better container lifecycle management.
  - Updated the `waitForContainer` method to handle removed containers.
- **applicationprofilecache.go:** Optimize Application Profile Caching Logic (`pkg/objectcache/applicationprofilecache/applicationprofilecache.go`)
  - … pods.
- **k8scache.go:** Initialize PodSpec Map in Kubernetes Object Cache (`pkg/objectcache/k8scache/k8scache.go`)
  - Initialized the `podSpec` map to store Kubernetes object specifications.
- **networkneighborscache.go:** Optimize Network Neighbors Caching Logic (`pkg/objectcache/networkneighborscache/networkneighborscache.go`)
- **objectcache_interface.go:** Introduce NewObjectCacheMock Function (`pkg/objectcache/objectcache_interface.go`)
  - Added a `NewObjectCacheMock` function to create a mock object cache.
- **relevancy_manager.go:** Improve Relevancy Manager with Better Container Lifecycle Management (`pkg/relevancymanager/v1/relevancy_manager.go`)
  - … gracefully.
- **cache.go:** Enhance Rule Binding Cache with Notification Support (`pkg/rulebindingmanager/cache/cache.go`)
- **helpers.go:** Add Helper Function for Name Conversion in Rule Binding Cache (`pkg/rulebindingmanager/cache/helpers.go`)
- **notifier.go:** Introduce Rule Binding Notification Mechanism (`pkg/rulebindingmanager/notifier.go`)
  - Added the `RuleBindingNotify` structure and related functions to manage rule binding notifications (a wiring sketch follows this walkthrough).
- **rulebindingmanager_interface.go:** Add Notifier Method to RuleBindingCache Interface (`pkg/rulebindingmanager/rulebindingmanager_interface.go`)
  - Added the `AddNotifier` method to the `RuleBindingCache` interface.
- **rulebindingmanager_mock.go:** Introduce NewRuleBindingCacheMock Function and Fix Mock Behavior (`pkg/rulebindingmanager/rulebindingmanager_mock.go`)
  - Added the `NewRuleBindingCacheMock` function.
  - Fixed the `IsCached` method to return true for better mock behavior.
- **rule_manager.go:** Adjust Rule Manager Caching Check for Efficiency (`pkg/rulemanager/v1/rule_manager.go`)
  - Uses `K8sObjectCache` directly.

**1 file**

- **open_test.go:** Fix Container Watcher Test Initialization (`pkg/containerwatcher/v1/open_test.go`)
  - Fixed the `CreateIGContainerWatcher` call in tests by adding the missing `ruleBindingPodNotify` parameter.

**1 file**

- **watch.go:** Adjust Dynamic Watcher Start Method and Fix Storage Object Handling (`pkg/watcher/dynamicwatcher/watch.go`)
  - Changed the `Start` method to not return an error.
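Taken together, the notifier changes above suggest a wiring pattern like the following. This is a minimal sketch based on the `AddNotifier(*chan RuleBindingNotify)` API shown in the suggestions; the buffer size and the handling logic are assumptions.

```go
// Minimal sketch of wiring a rule binding notifier, assuming the
// AddNotifier(*chan RuleBindingNotify) API referenced in the suggestions above.
ruleBindingNotify := make(chan rulebindingmanager.RuleBindingNotify, 100)
ruleBindingCache.AddNotifier(&ruleBindingNotify)

// The container watcher consumes notifications and starts or stops
// monitoring the affected pods in response to each event.
go func() {
	for n := range ruleBindingNotify {
		_ = n // e.g., attach or detach container monitoring for the notified pod
	}
}()
```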