microsoft / ApplicationInsights-node.js

Microsoft Application Insights SDK for Node.js
MIT License
324 stars 141 forks source link

API Management <> Azure Functions missing links #816

Closed f0o closed 2 years ago

f0o commented 3 years ago

Hi,

I've set up the same App Insights instance for APIM and Azure Function but I cannot track requests end-to-end.

It shows 2 Clusters/Paths: 1) Example-APIM-Name -> Example-AzureFunction-Name 2) "Web" (Static Name) to MongoDB

There's no connection/Link between the AzureFunction Name and the static Web path making correlation extremely hard.

Any clue how to tell the AzureFunction to identify with it's actual name (Example-AzureFunction-Name) instead of "Web" and link itself to the correlation path behind the APIM?

Furthermore when clicking on the Application Map on "Web" and then on the Compute tab it yields:

We didn't detect any Azure resources connected to Web. Please ensure that application is using the latest Application Insights SDK version.

Seems like the default config does not seem to be able to properly figure out that's running inside Azure Function (Windows/Consumption SKU)

f0o commented 3 years ago

Seems to be solved by adding:

const appInsights = require("applicationinsights");
appInsights.setup();
appInsights.defaultClient.setAutoPopulateAzureProperties(true);
appInsights.start();

EDIT: Although it was working briefly (at least the name is no longer "Web", it's the Example-AzureFunction-Name now) it's still not linking up correctly making it very hard, to correlate failures from APIM->Backend to the actual cause as I got 2 clusters of dependencies now and I'm solely relying on timestamps again...

f0o commented 3 years ago
appInsights.setup()
    .setAutoDependencyCorrelation(true)
    .setAutoCollectRequests(true)
    .setAutoCollectPerformance(true, true)
    .setAutoCollectExceptions(true)
    .setAutoCollectDependencies(true)
    .setAutoCollectConsole(true)
    .setUseDiskRetryCaching(true)
    .setSendLiveMetrics(false)
    .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
appInsights.defaultClient.setAutoPopulateAzureProperties(true);
appInsights.start();

No Luck with this either.

They do link up if the runtime emits an issue (like a syntax error in the code) which leads me to believe that the code (so either setup or the correlation context) is wrong.

f0o commented 3 years ago

It looks like the ParentID is derived wrong using the wrapper from the README.md:

WebTest (START):
OpID: a35c9c0530ff4c93803001abdd73d146
ParentID: a35c9c0530ff4c93803001abdd73d146
ID: a35c9c0530ff4c93803001abdd73d146

APIM:
OpID: a35c9c0530ff4c93803001abdd73d146
ParentID: |a35c9c0530ff4c93803001abdd73d146.0
ID: |a35c9c0530ff4c93803001abdd73d146.0.24431d27_

APIM-Backend:
OpID: a35c9c0530ff4c93803001abdd73d146
ParentID: |a35c9c0530ff4c93803001abdd73d146.0.24431d27_
ID: |a35c9c0530ff4c93803001abdd73d146.0.24431d27_24431d28_

AzFunc:
OpID: a35c9c0530ff4c93803001abdd73d146
ParentID: 957f857c1d396c45
ID: 957f857c1d396c45

correlationContext.operation.parentId yields |a35c9c0530ff4c93803001abdd73d146.957f857c1d396c45. but when set as the Id in the object it's transformed to: 957f857c1d396c45 and also set as the ParentID... I suspect that's why the correlation isn't linking up

f0o commented 3 years ago

following https://docs.microsoft.com/en-us/azure/azure-monitor/app/correlation#incoming-request-correlation 's curl example to manually set the traceparent header also shows that APIM and APIM-Backend do honor that header but AzureFunction does NOT.

Either the APIM-Backend is not forwarding it or the Context derivation from the wrapper is faulty

//EDIT: Using: -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

Azure Function receives the traceparent header in the req variable: "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-8c4eaee27c1ec34c-01"

This header shows the Correct ParentId 8c4eaee27c1ec34c from APIM-Backend being passed on to the Azure Function but the Wrapper code in README.md discards it for some reason.

full code:

const appInsights = require("applicationinsights");
appInsights.setup()
    .setAutoDependencyCorrelation(true)
    .setAutoCollectRequests(true)
    .setAutoCollectPerformance(true, true)
    .setAutoCollectExceptions(true)
    .setAutoCollectDependencies(true)
    .setAutoCollectConsole(true)
    .setUseDiskRetryCaching(true)
    .setSendLiveMetrics(false)
    .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
appInsights.defaultClient.setAutoPopulateAzureProperties(true);
appInsights.start();
// Default export wrapped with Application Insights FaaS context propagation
module.exports = async function contextPropagatingHttpTrigger(context, req) {
    // Start an AI Correlation Context using the provided Function context
    const correlationContext = appInsights.startOperation(context, req);

    // Wrap the Function runtime with correlationContext
    return appInsights.wrapWithCorrelationContext(async () => {
            const startTime = Date.now(); // Start trackRequest timer

            // Run the Function
            context.res.body = {req: req, parentId: correlationContext.operation.parentId}

            // Track Request on completion
            appInsights.defaultClient.trackRequest({
                    name: context.req.method + " " + context.req.url,
                    resultCode: context.res.status,
                    success: true,
                    url: req.url,
                    duration: Date.now() - startTime,
                    id: correlationContext.operation.parentId,
            });
            appInsights.defaultClient.flush();
    }, correlationContext)();
};
f0o commented 3 years ago

I think I got it boiled down to an issue when request-id is supplied instead of traceparent.

Unfortunately App Insight Web-Tests do not send traceparent and only set request-id.

Furthermore I can only get it to link up correctly when using traceparent and initializing the context with const correlationContext = appInsights.startOperation(req); (note that I only use req here)

Otherwise I get misaligned parents again.

It seems that https://github.com/microsoft/ApplicationInsights-node.js/blob/master/Library/Traceparent.ts#L77 isn't being matched for requests with only request-id, I'll spam the files with exceptions and see if I get something logged

//EDIT:

Still not reliable.

At this point I'm left to believe that WebTest<>APIM is compatible but APIM<>AzureFunc is NOT.

Any pointers?

//EDIT:

Seems like https://github.com/microsoft/ApplicationInsights-node.js/blob/develop/AutoCollection/CorrelationContextManager.ts#L198-L217 is the culprit...

It disregards the traceparent set in the headers by APIM:

Variable traceContext is:

    "traceContext": {
      "traceparent": "00-285908310eefb546b5dfca05bf84c811-bf617606c6efbf4d-01",
      "tracestate": "",
      "attributes": {}
    },

While the header/s is/are:

      "traceparent": "00-285908310eefb546b5dfca05bf84c811-f0a77f1b87bd284a-01",
      "request-id": "|285908310eefb546b5dfca05bf84c811.f0a77f1b87bd284a.",
      "request-context": "appId=cid-v1:5612578a-5658-4795-ae18-4e8a5694fd54",

//EDIT:

Full output if using only const correlationContext = appInsights.startOperation(req); or const correlationContext = appInsights.startOperation(req,req);:

{
  "debug": {
    "traceContext": {
      "traceparent": "00-a9e5e9a038a69e4aaba07e9e039c8bdb-50959e10d8e3f043-01",
      "tracestate": "",
      "attributes": {}
    },
    "spanContext": null
  },
  "cc": {
    "operation": {
      "name": "GET /health",
      "id": "a9e5e9a038a69e4aaba07e9e039c8bdb",
      "parentId": "|a9e5e9a038a69e4aaba07e9e039c8bdb.ad4ca1701fe2d949.",
      "traceparent": {
        "traceFlag": "01",
        "version": "00",
        "traceId": "a9e5e9a038a69e4aaba07e9e039c8bdb",
        "spanId": "ad4ca1701fe2d949",
        "parentId": "|a9e5e9a038a69e4aaba07e9e039c8bdb.ad4ca1701fe2d949."
      },
      "tracestate": {
        "fieldmap": []
      }
    },
    "customProperties": {
      "props": []
    }
  },
  "req": {
    "traceparent": "00-a9e5e9a038a69e4aaba07e9e039c8bdb-ad4ca1701fe2d949-01",
    "request-id": "|a9e5e9a038a69e4aaba07e9e039c8bdb.ad4ca1701fe2d949.",
    "request-context": "appId=cid-v1:5612578a-5658-4795-ae18-4e8a5694fd54"
  },
  "parentId": "|a9e5e9a038a69e4aaba07e9e039c8bdb.ad4ca1701fe2d949."
}

If we use the context from the Azure Function it will break again: (const correlationContext = appInsights.startOperation(context,req);

{
  "debug": {
    "traceContext": {
      "traceparent": "00-2411bc12d95f04459e873f5472af3310-2a48b82766a4854c-01",
      "tracestate": "",
      "attributes": {}
    },
    "spanContext": null
  },
  "corrContext": {
    "operation": {
      "name": "GET /health",
      "id": "2411bc12d95f04459e873f5472af3310",
      "parentId": "|2411bc12d95f04459e873f5472af3310.2a48b82766a4854c.",
      "traceparent": {
        "traceFlag": "01",
        "version": "00",
        "traceId": "2411bc12d95f04459e873f5472af3310",
        "spanId": "2a48b82766a4854c",
        "parentId": "|2411bc12d95f04459e873f5472af3310.2a48b82766a4854c."
      },
      "tracestate": {
        "fieldmap": []
      }
    },
    "customProperties": {
      "props": []
    }
  },
  "headers": {
    "traceparent": "00-2411bc12d95f04459e873f5472af3310-446bd8f8eb2b274b-01",
    "request-id": "|2411bc12d95f04459e873f5472af3310.446bd8f8eb2b274b.",
    "request-context": "appId=cid-v1:5612578a-5658-4795-ae18-4e8a5694fd54"
  }
}

If we use WebTests (which only employs request-id) the response looks:

{
  "debug": {
    "traceContext": {
      "traceparent": "00-6ac7d0ea15ea40ea8bd265bf79b6051f-4a640190ef2a7b4d-00",
      "tracestate": "",
      "attributes": {}
    },
    "spanContext": null
  },
  "cc": {
    "operation": {
      "name": "GET /health",
      "id": "60c796488fdc49e8a67be8e1b11557e7",
      "parentId": "60c796488fdc49e8a67be8e1b11557e7",
      "traceparent": {
        "traceFlag": "01",
        "version": "00",
        "traceId": "60c796488fdc49e8a67be8e1b11557e7",
        "spanId": "9e69d997178847c4"
      },
      "tracestate": {
        "fieldmap": []
      }
    },
    "customProperties": {
      "props": []
    }
  },
  "req": {
    "request-id": "|6ac7d0ea15ea40ea8bd265bf79b6051f.0.b2f0e96e_b2f0e96f_",
    "request-context": "appId=cid-v1:5612578a-5658-4795-ae18-4e8a5694fd54"
  },
  "parentId": "60c796488fdc49e8a67be8e1b11557e7"
}

Here it's apparent that the Header Parser fails to parse the request-id (https://github.com/microsoft/ApplicationInsights-node.js/blob/develop/AutoCollection/HttpRequestParser.ts#L242-L294)...

Digging further....

f0o commented 3 years ago

Fixed by patching:

diff --git a/AutoCollection/CorrelationContextManager.ts b/AutoCollection/CorrelationContextManager.ts
index ffa8f41..c7b2821 100644
--- a/AutoCollection/CorrelationContextManager.ts
+++ b/AutoCollection/CorrelationContextManager.ts
@@ -197,8 +197,24 @@ export class CorrelationContextManager {

         // AzFunction TraceContext available
         if (traceContext) {
-            const traceparent = new Traceparent(traceContext.traceparent);
-            const tracestate = new Tracestate(traceContext.tracestate);
+            let traceparent = null;
+            let tracestate = null;
+            if ( context.req && context.req.headers ) {
+                if( context.req.headers.traceparent ) {
+                    traceparent = new Traceparent(context.req.headers.traceparent);
+                } else if ( context.req.headers["request-id"] ) {
+                    traceparent = new Traceparent(null, context.req.headers["request-id"]);
+                }
+                if( context.req.headers.tracestate ) {
+                    tracestate = new Tracestate(context.req.headers.tracestate);
+                }
+            }
+            if( !traceparent ) {
+                traceparent = new Traceparent(traceContext.traceparent);
+            }
+            if( !tracestate ) {
+                tracestate = new Tracestate(traceContext.tracestate);
+            }
             const parser = typeof request === "object"
                 ? new HttpRequestParser(request)
                 : null;

Pull Request will follow.

hectorhdzg commented 3 years ago

Can you add more details about what exactly is App Insight Web-Tests?, is this some kinda of web app using ApplicationInsights-JS?, I'm a little bit confused where the tracing is broken, is it in Azure Functions at the end of your pipeline or in the beginning in the Web client?, I can see APIM add request-id and traceparent headers so this must be good, please ensure W3C correlation protocol is selected in there. image

f0o commented 3 years ago

@hectorhdzg

Can you add more details about what exactly is App Insight Web-Tests?

https://docs.microsoft.com/en-us/azure/azure-monitor/app/availability-standard-tests https://docs.microsoft.com/en-us/azure/templates/microsoft.insights/webtests

APIM was set to W3C but if you send a request to APIM using the header request-id it will not add traceparent to the backend requests and the whole chain fails.

I think the core issue is when no traceparent header is set, the AZ Runtime will create a TraceContext or at least the SDK will without respecting the existing request-id passed by APIM.

I've only been able to use the README's wrapper with context, req after I applied my patch - prior to that it would only link up for manual requests if used req, req or just req, null and it would continue to fail for requests with request-id as emitted by WebTests.

I've also tried switching the entire chain to Legacy but that yield the same faulty outcome.

You can repro it simply by using WebTest -> APIM -> AZFunc; You will see them as 2 clusters:

  1. APIM -> Backend
  2. AZFunc

Without any glue/link between the two clusters.

hectorhdzg commented 3 years ago

Thanks for your patience @f0o I'm following with internal teams involved, will keep you updated.

f0o commented 3 years ago

@hectorhdzg I think it could get fixed if we would inherit the traceparent object extracted from the http-parser instead of just tossed into the custom-props then the whole if-else thing would go away.

my JS/TS is very rusty so I just went with the most obvious solution after I found out that startOperation(req) yields the correct behavior.

I'll run a test with checking for parser.getCorrelationContextHeader() and inherit those as traceparent/state values.

hectorhdzg commented 3 years ago

@f0o I'm still waiting for Azure Functions people to get more specifics regarding this, I can see Azure Functions being able to correctly handle both Legacy and W3C headers and populate context.traceContext correctly, I'm using following code to test Azure Function and sending headers through Postman both "request-id" and "traceparent", apparently "Code + Test" blade in Azure portal is not sending the headers as expected, at least for me, can you give it a try? I wonder if there was some issue that was recently fixed in their side.

import { AzureFunction, Context, HttpRequest } from "@azure/functions"
const httpTrigger: AzureFunction = async function (context: Context, req: HttpRequest): Promise<void> {
    context.log('HTTP trigger function processed a request.');
    context.res = {
        body: JSON.stringify(context)
    };
};
export default httpTrigger;
f0o commented 3 years ago

@hectorhdzg I can get a test running on the weekend hopefully; tied up in travels right now.

But from what I recall from my notes is that CorrelationContextManager.ts does not honor the HTTP header request-id (specifically when only request-id is sent) that's been passed/enriched by APIM.

And in the code it also looks accurately because the HTTP request is being parsed but it's traceparent information is not being used anywhere but as annotational custom properties (https://github.com/microsoft/ApplicationInsights-node.js/blob/develop/AutoCollection/CorrelationContextManager.ts#L227-L228 -> https://github.com/microsoft/ApplicationInsights-node.js/blob/develop/AutoCollection/CorrelationContextManager.ts#L89)

This leads to the correlation being able to success on drilldown/search but the clusters not being linking up as a direct graph.

Instead they're two independent graphs as outlined earlier:

  1. APIM -> Backend (literally the Backend Name inside APIM config)
  2. AZFunc -> any sub-dependencies like MongoDB, axios, whatever

With my patch it reuses the request-id header and thus the graph becomes linked together:

APIM -> AZFunc (Actually Backend -> AZFunc but the UI concats them, the breakdown shows them tho) -> any sub-dependencies like MongoDB, axios, whatever