microsoft / MS-Lumos

Tools to compare metrics between datasets, accounting for population differences and invariant features.
MIT License

Feature Request: Auto Clustering #4

Open caltonji opened 3 years ago

caltonji commented 3 years ago

I am using the feature ranking in MCT for automated root cause analysis of incidents in our REST API (e.g. an increase in InternalServerError responses from our service). Our dataset is our IncomingRequests (uri, datacenter, responsecode, latency, requestId, etc.) merged with OutgoingRequests (target, responsecode, latency, requestId, etc.). For example, if we return InternalServerError because our first call to DocDB received a 429, then our metric column, ResponseCode, equals InternalServerError and a feature column, DocDB_GetThead_ResponseCode, equals 429. We expect our automated root cause analysis tool to tell us that the reason for the increase in InternalServerError is DocDB_GetThead_ResponseCode == 429.
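To make the dataset shape concrete, here is a minimal sketch of the merge described above, in plain Python. The `merge_requests` helper, the sample rows, and the `<target>_ResponseCode` / `<target>_Latency` column naming are assumptions for illustration only; they are not part of MS-Lumos.

```python
from collections import defaultdict

def merge_requests(incoming, outgoing):
    """Join incoming rows with their outgoing calls on requestId.

    Produces one wide row per incoming request. Outgoing calls that never
    happened simply leave their columns absent (i.e. empty).
    """
    calls_by_request = defaultdict(list)
    for call in outgoing:
        calls_by_request[call["requestId"]].append(call)

    merged = []
    for request in incoming:
        row = dict(request)
        for call in calls_by_request.get(request["requestId"], []):
            # Assumed naming scheme: "<target>_ResponseCode", "<target>_Latency".
            row[f"{call['target']}_ResponseCode"] = call["responsecode"]
            row[f"{call['target']}_Latency"] = call["latency"]
        merged.append(row)
    return merged

# Hypothetical sample data: r1 fails on its first DocDB call, r2 succeeds.
incoming = [
    {"requestId": "r1", "uri": "/threads/1", "datacenter": "dc1",
     "responsecode": "InternalServerError", "latency": 120},
    {"requestId": "r2", "uri": "/threads/2", "datacenter": "dc1",
     "responsecode": "OK", "latency": 80},
]
outgoing = [
    {"requestId": "r1", "target": "DocDB_GetThead", "responsecode": 429, "latency": 15},
    {"requestId": "r2", "target": "DocDB_GetThead", "responsecode": 200, "latency": 10},
    {"requestId": "r2", "target": "DocDB_UpdateThead", "responsecode": 200, "latency": 20},
]

rows = merge_requests(incoming, outgoing)
```

In this sample, the failed request r1 has DocDB_GetThead_ResponseCode set to 429 and no DocDB_UpdateThead_ResponseCode column at all, because the second call never happened.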

Our situation is one of multicollinearity. If the first call to DocDB fails, none of the subsequent calls happen, so for every failure another column, say DocDB_UpdateThead_ResponseCode, is empty. We would like auto clustering so that, instead of producing some 200 "Features Explaining Metric Difference" with the actual root cause buried in the noise, the tool produces a handful of feature combinations that are correlated with our metric.
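As a rough illustration of the kind of grouping I mean, here is a minimal sketch in plain Python. It is not MS-Lumos code. It assumes each (feature, value) pair is turned into a 0/1 indicator over the rows, that Jaccard similarity is an acceptable stand-in for "collinear", and that a simple threshold with union-find grouping is good enough. The function names, the threshold, and the sample rows are all hypothetical.

```python
from itertools import combinations

def indicator_columns(rows, features):
    """Map each (feature, value) pair to a 0/1 vector over the rows.

    A missing column is treated as the value None, so "call never happened"
    becomes its own indicator.
    """
    columns = {}
    for feature in features:
        for value in {row.get(feature) for row in rows}:
            columns[(feature, value)] = [
                1 if row.get(feature) == value else 0 for row in rows
            ]
    return columns

def jaccard(a, b):
    """Jaccard similarity of two 0/1 vectors (0.0 if both are all zeros)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def cluster_features(columns, threshold=0.9):
    """Group indicators whose pairwise similarity meets the threshold.

    Uses union-find, so groups are the connected components of the
    "similar enough" graph (single-linkage behaviour).
    """
    parent = {key: key for key in columns}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in combinations(columns, 2):
        if jaccard(columns[a], columns[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for key in columns:
        clusters.setdefault(find(key), []).append(key)
    return list(clusters.values())

# Hypothetical rows: failures have a 429 on the first call and no second call.
rows = [
    {"DocDB_GetThead_ResponseCode": 429, "datacenter": "dc1"},
    {"DocDB_GetThead_ResponseCode": 429, "datacenter": "dc2"},
    {"DocDB_GetThead_ResponseCode": 200, "DocDB_UpdateThead_ResponseCode": 200, "datacenter": "dc1"},
    {"DocDB_GetThead_ResponseCode": 200, "DocDB_UpdateThead_ResponseCode": 200, "datacenter": "dc2"},
]
features = ["DocDB_GetThead_ResponseCode", "DocDB_UpdateThead_ResponseCode", "datacenter"]

clusters = cluster_features(indicator_columns(rows, features))
```

With this sample, DocDB_GetThead_ResponseCode == 429 and the empty DocDB_UpdateThead_ResponseCode land in the same group, so the ranking could report them once instead of as separate entries. Single-linkage grouping can chain loosely related features together on real data, so a stricter linkage or a different similarity measure may work better.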

Thank you for your awesome work with this tool!

ashaazami commented 3 years ago

Thanks for your feedback, and thanks for using our tool. We are working on a multivariate version that considers combinations of metrics to better identify the root cause of an issue. We will add it as soon as the extension is ready.

Thanks, Ashkan