a-thaler closed this issue 4 years ago
Storage usage before metric re-labeling: the screenshot below shows the time series being scraped before re-labeling; currently Prometheus holds over 100K time series.
Storage usage after metric re-labeling: the screenshot below shows the time series being scraped after re-labeling; Prometheus now holds around 50K time series.
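For reference, the series count shown in those screenshots corresponds to Prometheus' own prometheus_tsdb_head_series metric, so the reduction can also be watched without screenshots. A minimal guard-rail sketch, assuming the prometheus-operator CRDs used by Kyma; rule name, alert name, and threshold below are illustrative only:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-cardinality-guard      # hypothetical rule name
spec:
  groups:
    - name: prometheus.cardinality
      rules:
        # Fire if the number of in-memory series climbs back towards the pre-optimization level
        - alert: PrometheusHighSeriesCount             # hypothetical alert name
          expr: prometheus_tsdb_head_series > 80000    # illustrative threshold between 50K and 100K
          for: 30m
          labels:
            severity: warning
```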
The number of time series will be reduced dramatically once the labels collected and attached to scraped metrics are trimmed down; the current result reflects metric re-labeling only (see the sketch below).
Application profiling is in progress; it will show us memory and CPU consumption before and after metric re-labeling.
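As an illustration of what "metric re-labeling" means here: with the prometheus-operator CRDs this is configured per endpoint via metricRelabelings on a ServiceMonitor. The component name, selector, and regexes below are placeholders, not the actual Kyma configuration:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring-my-component            # placeholder name
  labels:
    prometheus: monitoring
spec:
  selector:
    matchLabels:
      app: my-component                    # placeholder selector
  endpoints:
    - port: http-metrics
      metricRelabelings:
        # Drop high-cardinality labels that no dashboard or alert uses
        - action: labeldrop
          regex: (pod_template_hash|controller_revision_hash)
        # Drop whole metric families that nobody queries
        - action: drop
          sourceLabels: [__name__]
          regex: go_gc_duration_seconds.*
```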
The following memory profiling shows Prometheus memory usage before and after the optimization:
Before optimization: Prometheus uses around 900 MB of memory
After optimization: Prometheus uses around 240 MB of memory
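For anyone who wants to reproduce the comparison: Prometheus' resident memory can be read from its own process_resident_memory_bytes metric. A sketch of a recording rule for that, where the rule name and job label are assumptions, not the actual Kyma setup:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-memory                  # hypothetical rule name
spec:
  groups:
    - name: prometheus.memory
      rules:
        # Resident memory of the Prometheus server in MB, for before/after comparisons
        - record: job:prometheus_resident_memory_megabytes   # hypothetical recording rule name
          expr: process_resident_memory_bytes{job="monitoring-prometheus"} / 1024 / 1024
```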
A dedicated Kyma cluster with the new configuration has been deployed at https://grafana.mon-test.berlin.shoot.canary.k8s-hana.ondemand.com//?orgId=1 and is available for reviewing the changes.
Please check all dashboards and alert rules related to your component and ensure they still work as expected. In case you miss some metrics, please let us know the component/service-monitor name so we can add the missing metrics to the new configuration.
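If you want to check programmatically that a metric your dashboard or alert rule depends on is still being scraped, an absent() expression is one option; the metric, rule, and alert names below are only examples:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: missing-metric-check               # example rule name
spec:
  groups:
    - name: metric-presence
      rules:
        - alert: MyComponentMetricMissing              # example alert name
          expr: absent(my_component_requests_total)    # replace with the metric your dashboard queries
          for: 10m
          labels:
            severity: warning
          annotations:
            description: "my_component_requests_total is no longer scraped; report the service-monitor name so the metric can be re-added."
```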
Description
For Prometheus, label cardinality is key. As the default monitoring in Kyma just scrapes every endpoint blindly and attaches all labels to all metrics by default, the whole setup is not manageable from a memory perspective (it is very memory-hungry).
To successfully manage the memory footprint, endpoints must be scraped more consciously, and queries should be optimized, or cut if they behave badly. One possible shape of such a configuration is sketched below.
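A stricter way of scraping "more consciously" is an allowlist: keep only the metrics that dashboards and alert rules actually query and drop everything else before ingestion. A sketch at the plain Prometheus scrape-config level, with placeholder job and metric names:

```yaml
scrape_configs:
  - job_name: my-component                 # placeholder job name
    kubernetes_sd_configs:
      - role: endpoints
    metric_relabel_configs:
      # Allowlist: only metrics matching this regex are kept, all others are dropped
      - action: keep
        source_labels: [__name__]
        regex: (my_component_requests_total|my_component_request_duration_seconds_bucket)
```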
Actions:
Reasons
Reduce the memory footprint to make Kyma more lightweight and achieve more predictable memory consumption over time.
Attachments