Closed amirmalka closed 7 months ago
PR Description updated to latest commit (https://github.com/kubescape/node-agent/commit/645cbf135799a682ab03beb7472f0955a052756d)
⏱️ Estimated effort to review [1-5] | 4, due to the extensive changes across multiple files including new features, tests, and configurations. The PR introduces a comprehensive component test framework, updates to the chart configurations, and additional resources for testing. Reviewing this PR requires a thorough understanding of the new test framework, the impact of the changes on existing functionalities, and ensuring that the new configurations and resources are correctly implemented. |
🧪 Relevant tests | Yes |
🔍 Possible issues | Possible Bug: The script `tests/scripts/setup-component-test-cluster.sh` assumes the presence of certain tools (kubectl, helm) and environments (kind, minikube) without checking or handling the absence of these prerequisites. This could lead to failures in environments where these tools are not installed or configured. |
Performance Concern: The resource limits and requests specified in `tests/chart/values.yaml` for various components (e.g., nodeAgent, storage) might not be optimal for all testing environments. It's important to validate these values to prevent potential resource exhaustion or performance bottlenecks. | |
🔒 Security concerns | No |
Category | Suggestions |
Enhancement |
Improve error handling by raising an exception instead of just printing an error message.___ **It's recommended to handle the case whereget_active_alerts returns None more gracefully in the get_alerts method. Instead of just printing an error message and returning 1 , consider raising a custom exception that can be caught and handled by the caller. This approach provides more control over error handling and makes the code more robust.** [tests/component-tests/run.py [69-72]](https://github.com/kubescape/node-agent/pull/245/files#diff-20b60ceaa69a2e0ff7634345b4c64f62017b6d45c4cb1737c7bda2ae34a2fe8bR69-R72) ```diff alerts = get_active_alerts(alert_manager_url) -if not alerts: - print("Could not get alerts") - return 1 +if alerts is None: + raise ValueError("Failed to retrieve active alerts") ``` |
Replace fixed sleep time with a dynamic wait mechanism for better test efficiency and reliability.___ **Instead of using a fixed sleep time to wait for garbage collection or other asynchronousoperations, consider implementing a more dynamic wait mechanism. For example, you can poll a specific condition at regular intervals until it's met or a timeout is reached. This approach can make tests run faster in cases where the condition is met sooner than the fixed wait time and more reliable in cases where more time is needed.** [tests/component-tests/load_10k_alerts_no_memory.py [46-47]](https://github.com/kubescape/node-agent/pull/245/files#diff-785423cbbbd3c23a592a9c54c34cfbd78a7c5f8df23b7c87263453b75c2e0f6cR46-R47) ```diff -print("Waiting 300 seconds to have a baseline memory usage") -time.sleep(300) +timeout = 300 +start_time = time.time() +while time.time() - start_time < timeout: + # Check condition here (e.g., memory usage stabilization) + if condition_met: + break + time.sleep(10) # Check every 10 seconds ``` | |
Use
___
**Instead of using | |
Best practice |
Close matplotlib figures after saving to prevent memory leaks.___ **When plotting with matplotlib, it's a good practice to close the figure after saving it tofree up resources. This is especially important in scripts that generate many figures, as it can prevent memory leaks and ensure that the backend resources are properly managed.** [tests/component-tests/promtopic.py [73-74]](https://github.com/kubescape/node-agent/pull/245/files#diff-b3f961c69e8cf2dbd847f70baf64e4bdb4cd0345e2e9cc5b12e5b4e53a31c606R73-R74) ```diff plt.savefig('%s.png'%filename) plt.clf() +plt.close() ``` |
Add a timeout to subprocess calls to avoid indefinite hanging.___ **The subprocess calls in the test functions use bothcheck_call and check_output methods without specifying a timeout. It's a good practice to specify a timeout for these calls to avoid hanging indefinitely if the command does not complete. This makes the tests more robust and prevents them from stalling due to unforeseen issues.** [tests/component-tests/kill_in_the_middle.py [18]](https://github.com/kubescape/node-agent/pull/245/files#diff-93d8650b6caf90839ed06aee168bbe3cae8a26f2ffce0aaaa1d0d2d475213c50R18-R18) ```diff -subprocess.check_call(["kubectl", "-n", namespace , "apply", "-f", "dev/nginx/nginx-deployment.yaml"]) +subprocess.check_call(["kubectl", "-n", namespace , "apply", "-f", "dev/nginx/nginx-deployment.yaml"], timeout=300) ``` | |
Use a context manager for subprocess calls to prevent resource leaks.___ **Consider using a context manager (Python'swith statement) for subprocess calls where capture_output=True is used. This ensures that subprocesses are properly cleaned up after execution, which can help prevent resource leaks.** [tests/component-tests/creation_app_profile_memory_leak.py [37-39]](https://github.com/kubescape/node-agent/pull/245/files#diff-ec3fec66c497ceba2fbf47e9d28011e163a043b2e23cbf28121bff66a0e32111R37-R39) ```diff -get_proc = subprocess.run(["kubectl", "-n", os.environ.get("STORE_NAMESPACE"), "get", "applicationprofiles", f"pod-{nginx_pod_name}-test-namespace", "-oyaml"], capture_output=True) +with subprocess.Popen(["kubectl", "-n", os.environ.get("STORE_NAMESPACE"), "get", "applicationprofiles", f"pod-{nginx_pod_name}-test-namespace", "-oyaml"], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc: + stdout, stderr = proc.communicate() + get_proc = subprocess.CompletedProcess(args=proc.args, returncode=proc.returncode, stdout=stdout, stderr=stderr) ``` | |
Replace hard-coded IP addresses with DNS names.___ **It is recommended to avoid using hard-coded IP addresses in Kubernetes resourcedefinitions. Instead, use DNS names that can be resolved within the cluster or externally, depending on the requirement. This makes the configuration more flexible and less prone to errors if the IP addresses change.** [tests/component-tests/resources/malicious-job-app-profile-namespaced.yaml [34-45]](https://github.com/kubescape/node-agent/pull/245/files#diff-6e13638f8573ff8ee5b6c254910c1a2a20ef8d53c006ccbdddc64d8067ba5213R34-R45) ```diff -- dstEndpoint: 10.96.0.10 -- dstEndpoint: 34.107.204.206 -- dstEndpoint: 10.96.0.1 -- dstEndpoint: 142.251.37.68 +- dstEndpoint: | |
Define resource limits and requests for containers.___ **Consider specifying resource limits and requests for containers to ensure that theapplication does not consume more resources than expected. This practice helps in avoiding resource contention and ensures better stability and performance of the application.** [tests/component-tests/resources/malicious-job-app-profile-namespaced.yaml [10-11]](https://github.com/kubescape/node-agent/pull/245/files#diff-6e13638f8573ff8ee5b6c254910c1a2a20ef8d53c006ccbdddc64d8067ba5213R10-R11) ```diff containers: -- dns: +- resources: + limits: + cpu: "1" + memory: "500Mi" + requests: + cpu: "0.5" + memory: "200Mi" + dns: ``` | |
Replace hardcoded IP addresses with DNS names.___ **It's recommended to avoid using hardcoded IP addresses in Kubernetes configurations.Instead, consider using DNS names that can be resolved within the cluster or externally, depending on the use case. This approach enhances maintainability and flexibility of the configuration.** [tests/component-tests/resources/malicious-job-app-profile.yaml [32-43]](https://github.com/kubescape/node-agent/pull/245/files#diff-61b7b40772aa82e0edee0726dfd2ca9ae923477492009ffb6e6738497a20a105R32-R43) ```diff -- dstEndpoint: 10.96.0.10 -- dstEndpoint: 34.107.204.206 -- dstEndpoint: 10.96.0.1 -- dstEndpoint: 142.251.37.68 +- dstEndpoint: | |
Use
___
**Avoid using | |
Maintainability |
Avoid modifying global variables based on command-line arguments to prevent unexpected behavior.___ **The global variablesalert_manager_url and prometheus_url are modified based on command-line arguments, which can lead to unexpected behavior when running tests multiple times in the same session. Consider passing these URLs as parameters to functions that need them or encapsulating them within a class to avoid potential issues with global state.** [tests/component-tests/run.py [85-90]](https://github.com/kubescape/node-agent/pull/245/files#diff-20b60ceaa69a2e0ff7634345b4c64f62017b6d45c4cb1737c7bda2ae34a2fe8bR85-R90) ```diff -global alert_manager_url -global prometheus_url -if len(sys.argv) > 1: - alert_manager_url = sys.argv[1] -if len(sys.argv) > 2: - prometheus_url = sys.argv[2] +def main(alert_manager_url="http://localhost:9093/", prometheus_url="http://localhost:9090/"): + if len(sys.argv) > 1: + alert_manager_url = sys.argv[1] + if len(sys.argv) > 2: + prometheus_url = sys.argv[2] + # Pass these URLs as parameters to functions that need them ``` |
Refactor repeated subprocess calls into a helper function for better readability.___ **To improve the readability and maintainability of the test cases, consider refactoring therepeated subprocess calls into a helper function. This function could take the file path as an argument and return the subprocess.run result.**
[tests/component-tests/rule_binding_apply_test.py [7-8]](https://github.com/kubescape/node-agent/pull/245/files#diff-42c5cb7eabc8e70d993b289c0e75ca19a1f611717bbda837c826d0bdc1c738fbR7-R8)
```diff
-proc_stat = subprocess.run(["kubectl", "apply", "-f", "tests/component-tests/rule_binding_crds_files/all-valid.yaml"], check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+def run_kubectl_apply(file_path):
+ return subprocess.run(["kubectl", "apply", "-f", file_path], check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+proc_stat = run_kubectl_apply("tests/component-tests/rule_binding_crds_files/all-valid.yaml")
+
```
| |
Simplify DNS management by grouping related DNS names.___ **For better maintainability and readability, consider grouping related DNS names under asingle domain to minimize the number of entries and simplify DNS management.** [tests/component-tests/resources/malicious-job-app-profile-namespaced.yaml [12-16]](https://github.com/kubescape/node-agent/pull/245/files#diff-6e13638f8573ff8ee5b6c254910c1a2a20ef8d53c006ccbdddc64d8067ba5213R12-R16) ```diff -- dnsName: dl.k8s.io.default.svc.cluster.local. -- dnsName: dl.k8s.io.svc.cluster.local. -- dnsName: dl.k8s.io.cluster.local. -- dnsName: dl.k8s.io.wework.com. -- dnsName: dl.k8s.io. +- dnsName: "*.dl.k8s.io." ``` | |
Reliability |
Add a retry mechanism with exponential backoff for checking expected alerts.___ **For better test reliability, consider adding a retry mechanism with exponential backoffwhen checking for expected alerts. This can help account for delays in alert generation and propagation.** [tests/component-tests/all_alerts_from_malicious_app.py [54-57]](https://github.com/kubescape/node-agent/pull/245/files#diff-21b7d56a0387f8825d5bbd2da9cce51e248e4c4f2bd143c75918d7c2b16d0cc6R54-R57) ```diff -for alert in alerts: - rule_name = alert['labels']['rule_name'] - if rule_name in expected_alerts: - expected_alerts.remove(rule_name) +import time +retries = 3 +delay = 2 # seconds +while retries > 0 and expected_alerts: + for alert in alerts: + rule_name = alert['labels']['rule_name'] + if rule_name in expected_alerts: + expected_alerts.remove(rule_name) + if expected_alerts: + time.sleep(delay) + delay *= 2 + retries -= 1 + alerts = test_framework.get_alerts(namespace=ns) # Refresh alerts ``` |
Add error handling for subprocess and requests calls in
___
**To ensure that the | |
Security |
Review and restrict file operation permissions to enhance security.___ **Ensure that the use of broad file permissions, especially withO_RDWR and O_CREAT , is intentional and secure. Overly permissive file operations can lead to security vulnerabilities, such as unauthorized file access or modification.** [tests/component-tests/resources/malicious-job-app-profile-namespaced.yaml [47-55]](https://github.com/kubescape/node-agent/pull/245/files#diff-6e13638f8573ff8ee5b6c254910c1a2a20ef8d53c006ccbdddc64d8067ba5213R47-R55) ```diff - flags: - - O_RDWR - - O_CREAT - - O_TRUNC + - O_RDONLY - O_CLOEXEC path: /kubectl ``` |
Review and minimize the set of allowed syscalls for security.___ **To enhance the security posture, consider reviewing and minimizing the set of allowedsyscalls. Limiting syscalls can help in reducing the attack surface of the container and protect against certain types of exploits.** [tests/component-tests/resources/malicious-job-app-profile-namespaced.yaml [377-379]](https://github.com/kubescape/node-agent/pull/245/files#diff-6e13638f8573ff8ee5b6c254910c1a2a20ef8d53c006ccbdddc64d8067ba5213R377-R379) ```diff - access -- arch_prctl -- bind +# Review and potentially remove unnecessary syscalls ``` | |
Review and secure
___
**Ensure that the use of | |
Ensure restrictive file permissions for sensitive operations.___ **For file operations, consider specifying more restrictive permissions when creating files,especially for sensitive locations such as /root/.kube/cache . Using O_CREAT with O_EXCL is good, but ensure the permissions are set to avoid unauthorized access.** [tests/component-tests/resources/malicious-job-app-profile.yaml [86-87]](https://github.com/kubescape/node-agent/pull/245/files#diff-61b7b40772aa82e0edee0726dfd2ca9ae923477492009ffb6e6738497a20a105R86-R87) ```diff - flags: - O_RDWR - O_CREAT - O_EXCL - O_CLOEXEC path: /root/.kube/cache/http/.diskv-temp/1551865526 +# Consider setting more restrictive permissions ``` | |
Review and secure network activity configurations.___ **Ensure that network activities, especially those involving potentially sensitive endpointsor non-standard ports, are necessary and secure. Review the network activity configuration to confirm that it aligns with the intended behavior and security policies of the application.** [tests/component-tests/resources/malicious-job-app-profile.yaml [30-43]](https://github.com/kubescape/node-agent/pull/245/files#diff-61b7b40772aa82e0edee0726dfd2ca9ae923477492009ffb6e6738497a20a105R30-R43) ```diff networkActivity: outgoing: - - dstEndpoint: 10.96.0.10 + - dstEndpoint: |
@dwertent The PR is not ready for review...
Summary:
Summary:
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Summary:
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
:sparkles: Artifacts are available here.
Type
enhancement, tests
Description
Changes walkthrough
component_test.go
Add Comprehensive Component Tests for Node-Agent
tests/component_test.go
malicious app, basic load activities, memory leak tests, and process
kill in the middle of execution.
workloads, and handling Prometheus metrics.
malicious.go
Implement Malicious App Simulation for Testing
tests/images/malicious-app/malicious.go
file access, system calls, and network behaviors.
k8s.go
Add Kubernetes Utilities for Component Tests
tests/testutils/k8s.go
and handling application profiles.
prometheus.go
Add Prometheus Interaction Utilities for Tests
tests/testutils/prometheus.go
locustfile.py
Add Locust File for Load Testing
tests/images/locustimage/locustfile.py
URL.
r1008_crypto_mining_domain.go
Update Crypto Mining Domains List
pkg/ruleengine/v1/r1008_crypto_mining_domain.go
ending with a period.
malware_manager.go
Clean up Debug Logs in Malware Manager
pkg/malwaremanager/v1/malware_manager.go
manager.
configmap.yaml
Add Node-Agent ConfigMap Template
tests/chart/templates/node-agent/configmap.yaml
based on values from Helm chart.