Open timmydo opened 1 year ago
here is a quick list I put together by searching for Log.
and removing some that didn't look relevant:
src\ReverseProxy\Configuration\ConfigProvider\ConfigurationConfigProvider.cs:
70: Log.LoadData(_logger);
88: Log.ConfigurationDataConversionFailed(_logger, ex);
110: Log.ErrorSignalingChange(_logger, ex);
src\ReverseProxy\Forwarder\ForwarderHttpClientFactory.cs:
40: Log.ClientReused(_logger, context.ClusterId);
60: Log.ClientCreated(_logger, context.ClusterId);
src\ReverseProxy\Forwarder\ForwarderMiddleware.cs:
50: Log.NoAvailableDestinations(_logger, cluster.ClusterId);
62: Log.MultipleDestinationsAvailable(_logger, cluster.ClusterId);
79: ForwarderTelemetry.Log.ForwarderInvoke(cluster.ClusterId, route.Config.RouteId, destination.DestinationId);
src\ReverseProxy\Forwarder\HttpForwarder.cs:
150: Log.NotProxying(_logger, context.Response.StatusCode);
154: Log.Proxying(_logger, destinationRequest, isStreamingRequest);
171: Log.RetryingWebSocketDowngradeNoConnect(_logger);
179: Log.RetryingWebSocketDowngradeNoHttp2(_logger);
214: Log.ResponseReceived(_logger, destinationResponse);
490: Log.InvalidSecWebSocketKeyHeader(_logger, key);
922: Log.ErrorProxying(_logger, error, ex);
src\ReverseProxy\Health\ActiveHealthCheckMonitor.cs:
61: Log.ExplicitActiveCheckOfAllClustersHealthFailed(_logger, ex);
116: Log.StartingActiveHealthProbingOnCluster(_logger, cluster.ClusterId);
142: Log.ActiveHealthProbingFailedOnCluster(_logger, cluster.ClusterId, ex);
156: Log.ErrorOccuredDuringActiveHealthProbingShutdownOnCluster(_logger, cluster.ClusterId, ex);
158
159: Log.StoppedActiveHealthProbingOnCluster(_logger, cluster.ClusterId);
176: Log.ActiveHealthProbeConstructionFailedOnCluster(_logger, destination.DestinationId, cluster.ClusterId, ex);
177
187: Log.SendingHealthProbeToEndpointOfDestination(_logger, request.RequestUri, destination.DestinationId, cluster.ClusterId);
189: Log.DestinationProbingCompleted(_logger, destination.DestinationId, cluster.ClusterId, (int)response.StatusCode);
197: Log.DestinationProbingFailed(_logger, destination.DestinationId, cluster.ClusterId, ex);
src\ReverseProxy\Health\DestinationHealthUpdater.cs:
43: Log.ActiveDestinationHealthStateIsSetToUnhealthy(_logger, destination.DestinationId, cluster.ClusterId);
47: Log.ActiveDestinationHealthStateIsSet(_logger, destination.DestinationId, cluster.ClusterId, newHealth);
85: Log.UnhealthyDestinationIsScheduledForReactivation(_logger, destination.DestinationId, reactivationPeriod);
100: Log.PassiveDestinationHealthResetToUnkownState(_logger, destination.DestinationId);
src\ReverseProxy\LoadBalancing\LoadBalancingMiddleware.cs:
59: Log.NoAvailableDestinations(_logger, proxyFeature.Cluster.Config.ClusterId);
src\ReverseProxy\Model\ProxyPipelineInitializerMiddleware.cs:
40: Log.NoClusterFound(_logger, route.Config.RouteId);
src\ReverseProxy\WebSocketsTelemetry\WebSocketsTelemetryMiddleware.cs:
56: WebSocketsTelemetry.Log.WebSocketClosed(
80: WebSocketsTelemetry.Log.WebSocketClosed(
@timmydo Thanks for reporting this. We believe there's value in making some of these metrics, but perhaps not all of them (e.g. error conditions that should really be addressed immediately).
If you could provide a list of the ones that are the highest priority to add metrics for, that would be great.
Triage: perhaps another interesting metric might be a catch-all "this request couldn't be proxied" metric.
Counters for these files above look to be higher priority, with higher priority being to logs that would be error or warning:
src\ReverseProxy\Forwarder\ForwarderMiddleware.cs
src\ReverseProxy\Health\DestinationHealthUpdater.cs
src\ReverseProxy\Health\ActiveHealthCheckMonitor.cs
src\ReverseProxy\LoadBalancing\LoadBalancingMiddleware.cs
having counters for successful requests is useful also to validate whether no errors is a data-missing scenario or no errors. Also, you can measure the reliability.
What should we add or change to make your life better?
Requesting metrics in some of the default classes such as: https://github.com/microsoft/reverse-proxy/blob/04b1fe8d8cd0f25b814805c2fb91c68fb26c7f29/src/ReverseProxy/Health/DestinationHealthUpdater.cs#L43
https://github.com/microsoft/reverse-proxy/blob/04b1fe8d8cd0f25b814805c2fb91c68fb26c7f29/src/ReverseProxy/Health/ActiveHealthCheckMonitor.cs#L197C31-L197C31
https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-collection
Specifically, today, I'm looking for a way to have alerts go off when our health checks fail without reimplementing YARP's default implementations. More generally it would be nice if YARP had metrics to complement the Log statements.
Why is this important to you?
We are able to set up alerts on metrics but not on individual log messages. Metrics can alert the on-call engineers to look at the logs for further information.