solo-io / gloo

The Cloud-Native API Gateway and AI Gateway
https://docs.solo.io/
Apache License 2.0
4.09k stars 444 forks source link

Downstream/Listener Envoy Metrics per VirtualHost/VirtualService #5233

Open lgadban opened 3 years ago

lgadban commented 3 years ago

Is your feature request related to a problem? Please describe. Responses generated within the proxy (i.e. local replies) are not counted towards cluster/VirtualCluster stats, which means aggregate stats (e.g. response code 4xx) for these responses are only available on the listener/downstream stats, which are listener scoped and do not have an option for more granularity, for example per VirtualHost. This means there is no easy way to monitor the stats of response codes for a specific VirtualHost.

Describe the solution you'd like Aggregate downstream response code metrics/stats per VirtualHost/VirtualService

Describe alternatives you've considered These metrics can be inferred and exported via gRPC access logging; due to the nature of envoy not having per VirtualHost metrics internally this is probably the correct implementation.

Additional context

albsga commented 2 years ago

I agree with the request, in addition to that, it will be interesting to have a metric alike istio_requests_total. https://istio.io/latest/docs/reference/config/metrics/#labels

On top of the default labels that this metric is reporting I will suggest also adding request.host, request.path , request.method. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes#request-attributes

krisztianfekete commented 2 years ago

It's not accidental that Envoy doesn't have these, and Istio only has its somewhat controlled list of services (_workloads) as labels.

Using request.host as a label is dangerous as most of security scans/attacks are leveraging the Host header and use a dictionary to probe various values for this header. This can result in overloading a Prometheus in minutes, depending on the machine the attack is running from.

Adding request.path is very similar as attacks can iterate over them with high RPS, and the cardinality of paths is quite unbounded.

Having request.method as a label is fine most of the time as the possible values for that label are quite limited in number.

lgadban commented 2 years ago

Yes utilizing request attributes is likely not a great idea.

I will reiterate the intention of the original request is to provide "per-API" metrics when several APIs are hosted on the same envoy Listener. Managing cardinality is obviously important, but having a built-in way of observing traffic per APIs (or even per virtual host) is table-stakes for API gateways IMO

github-actions[bot] commented 5 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.