microsoft / ApplicationInsights-node.js

Microsoft Application Insights SDK for Node.js
MIT License
325 stars 139 forks source link

[Beta] - Nested objects get serialised as "[object Object]" #1169

Closed mderriey closed 1 year ago

mderriey commented 1 year ago

Hi there 👋

I hope you're well.

Issue

We've upgraded from 3.0.0-beta.5 to 3.0.0-beta.6 recently and we noticed that nested objects get serialised as "[object Object]".

See the screenshot below. For reference, we deployed the package upgrade sometime during the 3rd of July. No traces before the 3rd of July show customDimensions.variables with a value of [object Object]

image

Setup

We don't interact with the SDK or the trace handler directly, all our logging is done through winston, and we have the appropriate log instrumentation enabled.

Here's the whole config, in case it helps:

import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express'
import { GraphQLInstrumentation } from '@opentelemetry/instrumentation-graphql'
import { WinstonInstrumentation } from '@opentelemetry/instrumentation-winston'
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http'
import { Resource } from '@opentelemetry/resources'
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions'
import { ApplicationInsightsClient, ApplicationInsightsConfig } from 'applicationinsights'
import { TediousInstrumentation } from '@opentelemetry/instrumentation-tedious'
import { DataloaderInstrumentation } from '@opentelemetry/instrumentation-dataloader'

// Default attributes set on the OpenTelemetry traces
const resource = Resource.EMPTY
resource.attributes[SemanticResourceAttributes.SERVICE_NAME] = '<redacted>'
resource.attributes[SemanticResourceAttributes.SERVICE_NAMESPACE] = '<redacted>'

const config = new ApplicationInsightsConfig()
config.resource = resource

/**
 * Disabling this for now as it shows 1000s of failed requests on the App Insights dashboard.
 * See https://github.com/microsoft/ApplicationInsights-node.js/issues/1123
 */
config.enableAutoCollectStandardMetrics = false

/**
 * - Disable the built-in HTTP instrumentation and use the official one (see below)
 * - Enable the Azure SDK instrumentation
 * - Enable winston instrumentation (which sends logs to App Insights)
 */
config.instrumentations.http = { enabled: false }
config.instrumentations.azureSdk = { enabled: true }
config.logInstrumentations.winston = { enabled: true }

const client = new ApplicationInsightsClient(config)

/**
 * Add various instrumentations which will enrich the end-to-end traces in App Insights.
 * We can add more as we integrate different systems.
 *
 * Note: the winston instrumentation defined below adds trace and span IDs to emitted log entries,
 * and will not conflict with the App Insights-specific instrumentation above
 */
client.getTraceHandler().addInstrumentation(new HttpInstrumentation())
client.getTraceHandler().addInstrumentation(new ExpressInstrumentation())
client.getTraceHandler().addInstrumentation(new WinstonInstrumentation())
client.getTraceHandler().addInstrumentation(new GraphQLInstrumentation())
client.getTraceHandler().addInstrumentation(new TediousInstrumentation())
client.getTraceHandler().addInstrumentation(new DataloaderInstrumentation())

The logging code is standard winston and looks like this:

import { isLocalDev, Logger } from '@makerx/node-common'
import { ExecutionArgs, GraphQLFormattedError} from 'graphql'
import omitBy from 'lodash.omitby'
import { isIntrospectionQuery, isNil } from './utils'
import { GraphQLContext } from './context'

interface GraphQLLogOperationInfo {
  started: number
  operationName?: string | null
  query?: string | null
  variables?: Record<string, unknown> | null
  result: {
    data?: Record<string, unknown> | null
    errors?: readonly GraphQLFormattedError[] | null
  }
  logger: Logger
}

// 👇 That's the function used to log 
export const logGraphQLOperation = ({ started, operationName, query, variables, result: { errors }, logger }: GraphQLLogOperationInfo) => {
  const isIntrospection = query && isIntrospectionQuery(query)
  if (isLocalDev && isIntrospection) return
  logger.info(
    'GraphQL operation',
    omitBy(
      {
        operationName,
        query: isIntrospection ? 'IntrospectionQuery' : query,
        variables: variables && Object.keys(variables).length > 0 ? variables : undefined,
        duration: Date.now() - started,
        errors,
      },
      isNil,
    ),
  )
}

Notes

Possibly related to #1168, although in that case it looks like they interact with the SDK directly and had to make some modifications to their code.

mderriey commented 1 year ago

I spent some time debugging this today, and believe I might have found something.

First, when debugging, I could consistently see those messages in the debug windows:

ApplicationInsights:Invalid attribute value set for key: parameters 
ApplicationInsights:Invalid attribute value set for key: request

Then, going all the way down to the @azure/core-client lib, I could see that the request body that is sent to the service endpoint does have the correct JSON representation of the properties that come up as "[object Object]" in the portal.

Here's a sample of a request body, somewhat redacted:

[
  {
    "ver": 1,
    "name": "Microsoft.ApplicationInsights.Message",
    "time": "2023-07-07T14:58:07.162Z",
    "sampleRate": 100,
    "iKey": "<redacted>",
    "tags": {
      "ai.device.osVersion": "Windows_NT 10.0.22621",
      "ai.internal.sdkVersion": "node18:otel1.14.0:dst3.0.0-beta.6",
      "ai.cloud.role": "<redacted>",
      "ai.cloud.roleInstance": "DESKTOP-DIT65QM",
      "ai.operation.id": "c7ff8b12cf39775c3af9586494c544c2",
      "ai.operation.parentId": "e87b299aa6b3e33b"
    },
    "data": {
      "baseType": "MessageData",
      "baseData": {
        "ver": 2,
        "message": "GraphQL operation",
        "severityLevel": "Information",
        "properties": {
          "request": { "requestId": "998d84d6-65fa-4f91-8e2b-03c3980a03e1" },
          "tenantAlias": "Local",
          "service": "<redacted>",
          "operationName": "Sessions",
          "query": "query Sessions($offset: NonNegativeInt!, $limit: PositiveInt!) {\r\n  findEvents(offset: $offset, limit: $limit) {\r\n    id\r\n    agenda {\r\n      items {\r\n        ... on Session {\r\n          id\r\n          name\r\n        }\r\n      }\r\n    }\r\n  }\r\n}",
          "variables": { "offset": 100, "limit": 3 },
          "duration": 253,
          "trace_id": "c7ff8b12cf39775c3af9586494c544c2",
          "span_id": "e87b299aa6b3e33b",
          "trace_flags": "01"
        },
        "measurements": {}
      }
    }
  }
]

and the HTTP request options:

{
  "hostname": "australiaeast-0.in.applicationinsights.azure.com",
  "path": "/v2.1/track",
  "port": "",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Accept-Encoding": "gzip,deflate",
    "User-Agent": "azsdk-js-monitor-opentelemetry-exporter/1.0.0-beta.14 core-rest-pipeline/1.11.0 Node/v18.16.1 OS/(x64-Windows_NT-10.0.22621)",
    "x-ms-client-request-id": "14a5cd65-2e07-4db7-ac99-68453532b250",
    "Content-Length": "2952",
    "traceparent": "00-c7ff8b12cf39775c3af9586494c544c2-ad9ef54e2e76a6bb-01"
  }
}

And the associated trace in the portal (notice the operation ID highlighted in green that shows it's the same telemetry item)

image

It's as if the service was choking on the nested objects in the data.baseData.properties object.

Let me know if that helps. Cheers, have a good WE.

mderriey commented 1 year ago

Hi @hectorhdzg, could you please have a look at this when you have some time and let me know if I can provide you with more information to help diagnose this?

Cheers 🙏

JacksonWeber commented 1 year ago

@mderriey I just wanted to let you know I was able to reproduce your issue, even with a very basic Winston logging event. I'll test the other console instrumentations as well and look into what might be causing this. Thank you for raising the issue!

cuzzlor commented 1 year ago

@JacksonWeber do you have any status update on this issue?

JacksonWeber commented 1 year ago

@cuzzlor I'be had my head down in some high priority work for the SDK, but I can dedicate some time back to this issue tonight. Thanks for bringing it up!

JacksonWeber commented 1 year ago

@cuzzlor @mderriey This issue should be fixed by the above merged PR. Once the next release of the exporter goes out we can update the SDK as well.

mderriey commented 1 year ago

Thanks heaps @JacksonWeber.

I suspect this was on your list, but will you be able to leave us a note here when a new beta that uses @azure/monitor-opentelemetry-exporter@1.0.0-beta.15 gets released so we can give it a go?

Cheers.

JacksonWeber commented 1 year ago

@mderriey Absolutely, I'll make sure to leave you a note here as soon as it's available!

JacksonWeber commented 1 year ago

@mderriey @cuzzlor The issue should be resolved as of 3.0.0 beta.8 https://github.com/microsoft/ApplicationInsights-node.js/releases/tag/3.0.0-beta.8

mderriey commented 1 year ago

Thanks @JacksonWeber

Unfortunately, it doesn't seem to be the case. This is the same app running on Node v18.17.0 using 3.0.0-beta.8.

Setup code now looks as follows:

import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express'
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http'
import { Resource } from '@opentelemetry/resources'
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions'
import { TediousInstrumentation } from '@opentelemetry/instrumentation-tedious'
import { DataloaderInstrumentation } from '@opentelemetry/instrumentation-dataloader'
import { RedisInstrumentation } from '@opentelemetry/instrumentation-redis-4'
import { SetTenantAliasAttributeSpanProcessor, tenantAliasAttributeName } from './tracing/set-tenant-alias-attribute-span-processor'
import { IncomingMessage } from 'node:http'
import { getTenantAlias } from './util/get-tenant-alias'
import { TruncateDbStatementAttributeSpanProcessor } from './tracing/truncate-db-statement-attribute-span-processor'
import { useAzureMonitor } from 'applicationinsights'
import type { ProxyTracerProvider } from '@opentelemetry/api'
import { trace } from '@opentelemetry/api'
import type { NodeTracerProvider } from '@opentelemetry/sdk-trace-node'
import { registerInstrumentations } from '@opentelemetry/instrumentation'

// Default attributes set on the OpenTelemetry traces
const resource = Resource.EMPTY
resource.attributes[SemanticResourceAttributes.SERVICE_NAME] = '<redacted>'
resource.attributes[SemanticResourceAttributes.SERVICE_NAMESPACE] = '<redacted>'

useAzureMonitor({
  resource,
  instrumentationOptions: {
    http: { enabled: false },
  },
  logInstrumentationOptions: {
    winston: { enabled: true },
  },
})

const tracerProvider = (trace.getTracerProvider() as ProxyTracerProvider).getDelegate() as NodeTracerProvider

tracerProvider.addSpanProcessor(new SetTenantAliasAttributeSpanProcessor())
tracerProvider.addSpanProcessor(new TruncateDbStatementAttributeSpanProcessor())

registerInstrumentations({
  tracerProvider,
  instrumentations: [
    new HttpInstrumentation({
      // We need to manually set the attribute for requests spans because the OTel context
      // with the tenant alias is not available when this hook is invoked.
      applyCustomAttributesOnSpan(span, request) {
        if (request instanceof IncomingMessage) {
          const tenantAlias = getTenantAlias(request)
          if (tenantAlias) {
            span.setAttribute(tenantAliasAttributeName, tenantAlias)
          }
        }
      },
    }),
    new ExpressInstrumentation(),
    new TediousInstrumentation(),
    new DataloaderInstrumentation(),
    new RedisInstrumentation(),
  ],
})

image

JacksonWeber commented 1 year ago

@mderriey Taking a look this morning

JacksonWeber commented 1 year ago

@mderriey Dug into this today and it looks like the issue is with how OpenTelemetry defines the LogAttributes interface https://open-telemetry.github.io/opentelemetry-js/modules/_opentelemetry_api.html#AttributeValue. While the message field can now accept and parse objects, those attributes (request and variables) are OpenTelemetry LogRecords that store their attributes as LogAttributes where each LogAttribute maps a key to a LogAttributeValue which cannot be an object.

mderriey commented 1 year ago

Hmm, OK, thanks for looking into this.

I managed to get it working by updating the _telemetryToLogRecord function in src/logs/api.ts, see the diff below:

    private _telemetryToLogRecord(
        telemetry: Contracts.Telemetry,
        baseType: string,
        baseData: MonitorDomain
    ): LogRecord {
        try {
-           const attributes: Attributes = {
-               ...telemetry.properties,
-           };
+           const attributes: Attributes = {};
+           if (telemetry.properties) {
+               for (const [key, value] of Object.entries(telemetry.properties)) {
+                   attributes[key] = typeof value === 'object'
+                       ? JSON.stringify(value)
+                       : value
+               }
+           }
            const record: LogRecord = { attributes: attributes, body: Util.getInstance().stringify(baseData) };
            record.attributes["_MS.baseType"] = baseType;
            return record;
        }
        catch (err) {
            Logger.getInstance().warn("Failed to convert telemetry event to Log Record.", err);
        }
    }

It then shows up properly in the portal

image

I did it there because it's one the place where we construct an OpenTelemetry LogRecord, and I suspect it'd look strange if we did it somewhere else that consumes a LogRecord as we'd need to assume the attributes do not comply with the type definitions.

Is this something that we could imagine to be taken in the repo, or does this change have side effects that could impact other areas in the library?

Happy to open a PR if that helps you.

mderriey commented 1 year ago

Apologies, @JacksonWeber, I didn't tag you on the comment above, could you please have a look and let us know how that sounds to you?

JacksonWeber commented 1 year ago

@mderriey I'll look through to make sure this won't have any adverse effects elsewhere in the SDK, but I'll take a look this morning, thanks again!

JacksonWeber commented 1 year ago

@mderriey Did some testing and it looks good. Feel free to create a PR if you'd like!