open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
2.64k stars 762 forks source link

No longer collecting metrics after package upgrade #4324

Closed djMax closed 4 months ago

djMax commented 9 months ago

What happened?

Steps to Reproduce

In my open source project, I am on api@1.6 and http instrumentation @0.43. When I upgrade to the latest, most things seem to be ok, but NOT the http instrumentation - the server histograms are no longer present. The main branch tests this with current packages and works, and the djmax/latestotlp branch upgrades the packages and fails.

https://github.com/openapi-typescript-infra/service/pull/12

Expected Result

HTTP metrics should be present

Actual Result

http_server_duration histogram is not present in Prometheus output.

Additional Details

This package has to do a lot of fancy footwork to get metrics to work because OTLP just isn't very stable (thus 0.x, no judgement here), so it's possible that some of those tricks are no longer necessary or are problematic, but getting any definitive information on that is nearly impossible. So feel free to point me at something I'm doing completely incorrectly or is no longer necessary, etc. But I did verify that the http instrumentation IS getting called, and even _getHttpInstrumentation is being called, but it doesn't ever call the function arguments for this code:

return new instrumentation_1.InstrumentationNodeModuleDefinition('http', ['*'], moduleExports => {

OpenTelemetry Setup Code

https://github.com/openapi-typescript-infra/service/blob/main/src/telemetry/index.ts#L49

package.json

{
  "name": "@openapi-typescript-infra/service",
  "version": "4.9.0",
  "description": "An opinionated framework for building configuration driven services - web, api, or  ob. Uses OpenAPI, pino logging, express, confit, Typescript and vitest.",
  "main": "build/index.js",
  "scripts": {
    "test": "vitest",
    "lint": "eslint .",
    "build": "tsc -p tsconfig.build.json && yarn dlx glob-chmod 755 build/bin/*",
    "watch": "tsc -p tsconfig.json -w --preserveWatchOutput",
    "clean": "npx rimraf ./build",
    "prepublishOnly": "yarn build",
    "postinstall": "coconfig"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/openapi-typescript-infra/service.git"
  },
  "bin": {
    "start-service": "./build/bin/start-service.js"
  },
  "config": {
    "coconfig": "@openapi-typescript-infra/coconfig"
  },
  "engines": {
    "node": ">=18"
  },
  "keywords": [
    "service",
    "openapi",
    "express",
    "confit",
    "babel",
    "typescript",
    "vitest"
  ],
  "author": "developers@pyralis.com>",
  "license": "MIT",
  "bugs": {
    "url": "https://github.com/openapi-typescript-infra/service/issues"
  },
  "release": {
    "branches": [
      "main"
    ],
    "plugins": [
      "@semantic-release/commit-analyzer",
      "@semantic-release/release-notes-generator",
      "@semantic-release/changelog",
      [
        "@semantic-release/exec",
        {
          "publishCmd": "yarn dlx pinst --disable"
        }
      ],
      "@semantic-release/npm",
      "@semantic-release/git"
    ]
  },
  "homepage": "https://github.com/openapi-typescript-infra/service#readme",
  "dependencies": {
    "@godaddy/terminus": "^4.12.1",
    "@opentelemetry/api": "^1.7.0",
    "@opentelemetry/exporter-prometheus": "^0.45.1",
    "@opentelemetry/exporter-trace-otlp-proto": "^0.45.1",
    "@opentelemetry/instrumentation": "^0.45.1",
    "@opentelemetry/instrumentation-dns": "^0.32.4",
    "@opentelemetry/instrumentation-express": "^0.33.3",
    "@opentelemetry/instrumentation-generic-pool": "^0.32.4",
    "@opentelemetry/instrumentation-graphql": "^0.36.0",
    "@opentelemetry/instrumentation-http": "^0.45.1",
    "@opentelemetry/instrumentation-ioredis": "^0.36.0",
    "@opentelemetry/instrumentation-net": "^0.32.3",
    "@opentelemetry/instrumentation-pg": "^0.37.1",
    "@opentelemetry/instrumentation-pino": "^0.34.3",
    "@opentelemetry/resource-detector-container": "^0.3.4",
    "@opentelemetry/resource-detector-gcp": "^0.29.4",
    "@opentelemetry/resources": "^1.18.1",
    "@opentelemetry/sdk-metrics": "^1.18.1",
    "@opentelemetry/sdk-node": "^0.45.1",
    "@opentelemetry/sdk-trace-base": "^1.18.1",
    "@opentelemetry/semantic-conventions": "^1.18.1",
    "@sesamecare-oss/confit": "^2.2.0",
    "@sesamecare-oss/opentelemetry-node-metrics": "^1.0.1",
    "ajv": "^8.12.0",
    "cookie-parser": "^1.4.6",
    "dotenv": "^16.3.1",
    "express": "next",
    "express-openapi-validator": "^5.1.1",
    "glob": "^8.1.0",
    "lodash": "^4.17.21",
    "minimist": "^1.2.8",
    "opentelemetry-instrumentation-fetch-node": "^1.1.2",
    "pino": "^8.16.2",
    "read-pkg-up": "^7.0.1",
    "request-ip": "^3.3.0"
  },
  "devDependencies": {
    "@commitlint/cli": "^18.4.3",
    "@commitlint/config-conventional": "^18.4.3",
    "@openapi-typescript-infra/coconfig": "^4.2.2",
    "@semantic-release/changelog": "^6.0.3",
    "@semantic-release/commit-analyzer": "^11.1.0",
    "@semantic-release/exec": "^6.0.3",
    "@semantic-release/git": "^10.0.1",
    "@semantic-release/release-notes-generator": "^12.1.0",
    "@types/cookie-parser": "^1.4.6",
    "@types/express": "^4.17.21",
    "@types/glob": "^8.1.0",
    "@types/lodash": "^4.14.202",
    "@types/minimist": "^1.2.5",
    "@types/node": "^20.10.0",
    "@types/request-ip": "^0.0.41",
    "@types/supertest": "^2.0.16",
    "@typescript-eslint/eslint-plugin": "^6.12.0",
    "@typescript-eslint/parser": "^6.12.0",
    "coconfig": "^1.0.0",
    "eslint": "^8.54.0",
    "eslint-config-prettier": "^9.0.0",
    "eslint-plugin-import": "^2.29.0",
    "pino-pretty": "^10.2.3",
    "pinst": "^3.0.0",
    "supertest": "^6.3.3",
    "ts-node": "^10.9.1",
    "tsconfig-paths": "^4.2.0",
    "typescript": "^5.3.2",
    "vitest": "^0.34.6"
  },
  "resolutions": {
    "qs": "^6.11.0"
  },
  "packageManager": "yarn@3.2.3"
}

Relevant log output

No response

djMax commented 9 months ago

I do notice these odd messages:

@opentelemetry/instrumentation-express Module express has been loaded before @opentelemetry/instrumentation-express so it might not work, please initialize it before requiring express
@opentelemetry/instrumentation-pino Module pino has been loaded before @opentelemetry/instrumentation-pino so it might not work, please initialize it before requiring pino

And what is odd about them is if I console.log in instrumentation-express, and in express, the former is loaded first.

djMax commented 9 months ago

Additional detail - even in the working version this is what shows up in Prometheus metrics:

# HELP http_server_duration Measures the duration of inbound HTTP requests.
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
# HELP http_client_duration Measures the duration of outbound HTTP requests.
# UNIT http_client_duration ms
# TYPE http_client_duration histogram

I would expect there to be values given that I am making http requests to the test server.

dyladan commented 9 months ago

And what is odd about them is if I console.log in instrumentation-express, and in express, the former is loaded first.

It's not about when you create objects from express but when the actual require statement is called. The instrumentation MUST be set up before the require('express') statement is evaluated

djMax commented 9 months ago

Right - and I console.log in the first line of each module and they are required in the correct order

dyladan commented 9 months ago

I'm sorry I might not have been clear enough. The require('express') must be called after the instrumentation is actually enabled, not just after it is required. It sounds like you have a typical setup where all the require statements are at the top of the file. The require for express must be moved to below the actual SDK setup, not just its require

dyladan commented 9 months ago

It is a bit odd to me that just a dependency upgrade would surface this issue. We haven't made any changes recently that I thinks should affect the load order. I looked through your project a bit and it seems like your telemetry/index.ts file imports your express app, which imports express. These imports are called long before the SDK is started in your startGlobalTelemetry function.

djMax commented 8 months ago

I do have a branch that dynamically imports express to try and delay it until as late as possible. The behavior is the same.

shinebayar-g commented 8 months ago

Running to the same issue. Still getting this messages even sdk.start() is called before import express.

djMax commented 6 months ago

Well, a month and change has elapsed and some new versions are out, but unfortunately the behavior is the same. I've updated https://github.com/openapi-typescript-infra/service/pull/12 and the test still fails in the same ways. I'm happy to poke at stuff, but it's such an inscrutable mix of complex interdependencies that it seems pointless to go it alone.

pichlermarc commented 6 months ago

Hi @djMax, I've tried to minimally reproduce this, but was unsuccessful: https://github.com/pichlermarc/repro-4324

As @dyladan said earlier, it's odd that a depdendency upgrade would surface this. In previous versions we have also exported metrics that were never written (this was not spec compliant, so we removed it), could this be what you're seeing on the older version (metrics with empty buckets, sum and count)? :thinking:

djMax commented 4 months ago

I've given this another go and succeeded this time. For anyone wandering by, a few things:

1) Resource detectors are sync now, but not all support sync. So I had to do this to resolve the resources manually. 2) In vitest, require timing gets tricky, so I needed to move to a per-test setup file that gets loaded BEFORE the test file that ends up requiring express. Global setup won't work because it runs in a separate process.. 3) Remove unnecessary dependencies as sdk-node already includes a bunch of them.

Here's the overall PR.