Jooho opened this issue 10 months ago.
Setting the istio-prometheus-ignore="true" label on the ModelMesh pod stops the scraping of port 15020 that currently happens on that pod. See:
Name:         istio-proxies-monitor
Namespace:    kserve-demo
...
Spec:
  Namespace Selector:
  Pod Metrics Endpoints:
    Bearer Token Secret:
      Key:
    Interval:  30s
    Path:      /stats/prometheus
  Selector:
    Match Expressions:
      Key:       istio-prometheus-ignore
      Operator:  DoesNotExist
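The Match Expressions entry above is what makes the label work: with Operator DoesNotExist, a pod is selected only when it carries no istio-prometheus-ignore label at all. The following is a minimal Python sketch of Kubernetes label-selector matchExpressions semantics (In, NotIn, Exists, DoesNotExist, all ANDed), not the Prometheus Operator's code; the pod names and labels are hypothetical.

```python
from typing import Dict, List, Sequence

# One matchExpressions entry, e.g. {"key": "istio-prometheus-ignore", "operator": "DoesNotExist"}
Expression = Dict[str, object]


def expression_matches(labels: Dict[str, str], expr: Expression) -> bool:
    """Evaluate a single label-selector requirement against a pod's labels."""
    key = expr["key"]
    op = expr["operator"]
    values: Sequence[str] = expr.get("values", [])  # only used by In / NotIn
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A missing key satisfies NotIn in Kubernetes selectors
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def selector_matches(labels: Dict[str, str], expressions: List[Expression]) -> bool:
    """All requirements must hold (logical AND), as in a LabelSelector."""
    return all(expression_matches(labels, e) for e in expressions)


# Selector copied from the istio-proxies-monitor PodMonitor above
PODMONITOR_SELECTOR: List[Expression] = [
    {"key": "istio-prometheus-ignore", "operator": "DoesNotExist"},
]

# Hypothetical pods in the kserve-demo namespace
pods = {
    "kserve-predictor": {"app": "kserve-predictor"},
    "modelmesh-serving": {"app": "modelmesh", "istio-prometheus-ignore": "true"},
    "other-pod": {"app": "other", "istio-prometheus-ignore": "false"},
}

for name, labels in pods.items():
    scraped = selector_matches(labels, PODMONITOR_SELECTOR)
    print(f"{name}: {'scraped' if scraped else 'skipped'}")
```

Note the third pod: because DoesNotExist only checks whether the key is present, even istio-prometheus-ignore="false" would exclude a pod from this PodMonitor.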
Analysis
As part of setting up OMW (OpenShift Monitoring Workflow), a PodMonitor (istio-proxies-monitor) was created that scrapes metrics directly from every pod in the KServe runtime namespace on the /stats/prometheus endpoint over the HTTP port.
ModelMesh already has a ServiceMonitor resource for its pod that scrapes metrics through the secure port, so istio-proxies-monitor should not monitor the ModelMesh pod.
Solution
istio-proxies-monitor (PodMonitor) and istiod-monitor (ServiceMonitor) are supposed to monitor Istio components, not KServe. I verified that an equivalent Istio PodMonitor and ServiceMonitor already exist in the istio-system namespace, so I think we can safely remove istio-proxies-monitor (PodMonitor) and istiod-monitor (ServiceMonitor) from the KServe namespace.
Do you know who created these two objects?
Today I did some more research and found the following article. According to section 7.1, these objects are added intentionally, so we should not remove istio-proxies-monitor (PodMonitor) and istiod-monitor (ServiceMonitor) from the KServe namespace: https://docs.openshift.com/container-platform/4.14/service_mesh/v2x/ossm-observability.html#ossm-integrating-with-user-workload-monitoring_observability
After discussing with @skonto and @bartoszmajsak, we concluded that we need an extra label so that the istio-proxies-monitor PodMonitor skips the ModelMesh pod.
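One way to apply the label from the opening comment is a JSON merge patch on the pod template of the workload that owns the ModelMesh pod. The following is a minimal Python sketch that builds such a patch and renders an `oc patch` command. The Deployment name modelmesh-serving-example is hypothetical. If the real Deployment is owned by a controller, that controller may revert a manual patch, so the label might have to be set through the controller's own configuration instead.

```python
import json
import shlex
from typing import Dict


def pod_template_label_patch(labels: Dict[str, str]) -> Dict[str, object]:
    """Build a JSON merge patch that adds labels to a Deployment's pod template.

    Changing spec.template triggers a rollout, so the new pods carry the labels.
    """
    return {"spec": {"template": {"metadata": {"labels": dict(labels)}}}}


def oc_patch_command(deployment: str, namespace: str, patch: Dict[str, object]) -> str:
    """Render a shell-safe `oc patch ... --type merge -p <json>` invocation."""
    body = json.dumps(patch, separators=(",", ":"))
    args = ["oc", "patch", "deployment", deployment,
            "-n", namespace, "--type", "merge", "-p", body]
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical Deployment name; the namespace matches the PodMonitor above
patch = pod_template_label_patch({"istio-prometheus-ignore": "true"})
print(oc_patch_command("modelmesh-serving-example", "kserve-demo", patch))
```

Patching the pod template rather than a running pod is deliberate: a label added directly to a pod is lost as soon as that pod is replaced.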
Discussion thread: https://redhat-internal.slack.com/archives/C065ARTVA80/p1702293019814919?thread_ts=1701693652.733169&cid=C065ARTVA80
When KServe and ModelMesh run in the same namespace, the ModelMesh container shows these errors:
There are three NetworkPolicies in the namespace:
If the allow-from-openshift-monitoring-ns NetworkPolicy is deleted, the error messages no longer appear, so I think this NetworkPolicy is the cause of the issue. However, this is not 100% certain and needs more debugging.
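Deleting an *allow* policy can make errors disappear because NetworkPolicies are additive. Once any policy selects a pod for ingress, traffic to that pod is allowed only if at least one selecting policy admits the source. One possible explanation, which is not confirmed here, is that without allow-from-openshift-monitoring-ns, requests from the monitoring stack no longer reach the ModelMesh pod at all. The following is a minimal Python sketch of that rule, simplified to namespace selectors only. Only the allow-from-openshift-monitoring-ns name comes from this thread. The other two policy names, all pod labels, and the monitoring namespace label (modeled on the OpenShift documentation's allow-from-openshift-monitoring example) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    """Simplified ingress NetworkPolicy: selects pods by labels and admits
    traffic from namespaces whose labels match any listed selector."""
    name: str
    pod_selector: Dict[str, str]  # empty dict selects every pod in the namespace
    allowed_namespace_selectors: List[Dict[str, str]] = field(default_factory=list)


def labels_match(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value pair must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(pod_labels: Dict[str, str], source_ns_labels: Dict[str, str],
                    policies: List[Policy]) -> bool:
    selecting = [p for p in policies if labels_match(p.pod_selector, pod_labels)]
    if not selecting:
        return True  # pod is not isolated, so all ingress is allowed
    # Allowed if ANY selecting policy admits the source namespace
    return any(labels_match(sel, source_ns_labels)
               for p in selecting for sel in p.allowed_namespace_selectors)


# Assumed label on the namespace the monitoring Prometheus runs in
MONITORING_NS = {"network.openshift.io/policy-group": "monitoring"}
modelmesh_pod = {"app": "modelmesh"}  # hypothetical pod labels

all_policies = [
    Policy("allow-from-openshift-monitoring-ns", {}, [MONITORING_NS]),
    Policy("allow-same-namespace", {}, [{"kubernetes.io/metadata.name": "kserve-demo"}]),
    Policy("allow-from-ingress", {}, [{"network.openshift.io/policy-group": "ingress"}]),
]
without_monitoring = [p for p in all_policies
                      if p.name != "allow-from-openshift-monitoring-ns"]

print(ingress_allowed(modelmesh_pod, MONITORING_NS, all_policies))        # True
print(ingress_allowed(modelmesh_pod, MONITORING_NS, without_monitoring))  # False
```

If this explanation holds, deleting the policy only hides the symptom. Excluding the ModelMesh pod from the PodMonitor addresses the scrape itself.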
Reference: https://github.com/orgs/opendatahub-io/projects/42?pane=issue&itemId=40292089