vercel / next.js

The React Framework
https://nextjs.org
MIT License

Instrumentation files not included in standalone output #68740

Open moshie opened 3 months ago

moshie commented 3 months ago

Link to the code that reproduces this issue

https://github.com/moshie/nextjs-issue

To Reproduce

  1. Pull down the repo: https://github.com/moshie/nextjs-issue
  2. Run npm install
  3. Run npm run build
  4. Look in the .next/standalone/node_modules folder: the @splunk/otel prebuilds folder is not included

Current vs. Expected behavior

When the @splunk/otel package is used in the instrumentation file, the standalone output doesn't include the prebuilds from: node_modules/@splunk/otel/prebuilds/linux-x64/@splunk+otel.abi115.node

That results in this error:

[Screenshot of the error, taken 2024-08-08 17:30:08; image not available]

outputFileTracingIncludes seems to be tied to the pages and app directories, so I am not sure how best to include this file in my build.

In my Dockerfile I've tried something similar to this:

COPY --chown=nonroot --from=builder /opt/application/node_modules/@splunk ./node_modules/@splunk
COPY --chown=nonroot --from=builder /opt/application/node_modules/.pnpm/@splunk+otel@2.10.0_@opentelemetry+api@1.9.0 ./node_modules/.pnpm/@splunk+otel@2.10.0_@opentelemetry+api@1.9.0

But this isn't a very elegant solution: the path names contain versions, so they would need to be updated every time we update the @splunk/otel package.

I suppose this might have solved my issue but it's been removed (I don't know if it copied whole folders?):

unstable_includeFiles: ['node_modules/@splunk/otel'],
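
One way to avoid the versioned paths in the COPY lines above is a small post-build step that resolves the package through its node_modules symlink and copies only the missing subdirectory into the standalone output. The following is a minimal sketch, not a confirmed fix: the helper name copyPackageSubdir is hypothetical, and it assumes that a package symlink inside the standalone folder, if present, points to a location inside that folder.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/**
 * Hypothetical helper: copy one subdirectory of an installed package
 * (for example "prebuilds") into the same package inside the standalone
 * output.
 *
 * Both sides are resolved with realpathSync, so pnpm's symlinked,
 * version-bearing .pnpm/... directories never have to be spelled out.
 */
export function copyPackageSubdir(
  projectRoot: string,
  standaloneDir: string,
  packageName: string,
  subdir: string,
): string {
  // Follow the node_modules symlink to the package's real install location.
  const sourcePkg = fs.realpathSync(
    path.join(projectRoot, "node_modules", packageName),
  );
  const linkedTarget = path.join(standaloneDir, "node_modules", packageName);
  // If file tracing already emitted the package, write into its real
  // location; otherwise create a plain directory at the expected path.
  const targetPkg = fs.existsSync(linkedTarget)
    ? fs.realpathSync(linkedTarget)
    : linkedTarget;
  const target = path.join(targetPkg, subdir);
  fs.mkdirSync(target, { recursive: true });
  // Copy the contents of the source subdirectory into the target directory.
  fs.cpSync(path.join(sourcePkg, subdir), target, { recursive: true });
  return target;
}
```

A script run after npm run build could then call copyPackageSubdir(process.cwd(), ".next/standalone", "@splunk/otel", "prebuilds"), so the Dockerfile only needs to copy the standalone folder itself.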

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000
  Available memory (MB): 32768
  Available CPU cores: 10
Binaries:
  Node: 20.16.0
  npm: 10.8.1
  Yarn: 1.22.22
  pnpm: 9.7.0
Relevant Packages:
  next: 14.2.5
  eslint-config-next: N/A
  react: 18.3.1
  react-dom: 18.3.1
  typescript: 5.5.4
Next.js Config:
  output: standalone

Which area(s) are affected? (Select all that apply)

Instrumentation, Output (export/standalone)

Which stage(s) are affected? (Select all that apply)

next build (local), next start (local)

Additional context

My app is dockerised and uses distroless. I thought about just running pnpm install @splunk/otel directly in the standalone folder, but that didn't seem like it would work with distroless, and it's not very elegant either.

admmasters commented 3 months ago

There do seem to be some pain points with Next.js and certain packages when using standalone output. I wonder if there is scope for more work on the mechanism that determines what to include in the bundle and how that works with binaries etc.

guinanlin commented 3 months ago

According to https://github.com/vercel/next.js/issues/49897, instrumentation does not work well with runtime=nodejs.

I tested this on Next.js 14.2.5.

It works with pnpm run dev, but with runtime=nodejs it does not run under pnpm start.

moshie commented 2 months ago

Any update on this?

Edit by maintainer bot: Comment was automatically minimized because it was considered unhelpful. (If you think this was by mistake, let us know). Please only comment if it adds context to the issue. If you want to express that you have the same problem, use the upvote 👍 on the issue description or subscribe to the issue for updates. Thanks!

zacharyblasczyk commented 3 weeks ago

Addressing the OP

Sorry if this is verbose, but I have had a very similar issue for weeks and can't seem to figure it out either.

My issue seems to be related, but in general it seems like external otel integrations aren't well supported in containerized builds using the standalone output.

Specifically on the point of including the dependencies in the final standalone build, I think you need to add a serverComponentsExternalPackages block to your experimental section of the next.config.js like this:

  experimental: {
    instrumentationHook: true,
    optimizePackageImports: ["bullmq", "googleapis"],
    /** @see https://github.com/open-telemetry/opentelemetry-js/issues/4297 */
    serverComponentsExternalPackages: [
      "@opentelemetry/sdk-node",
      "@opentelemetry/auto-instrumentations-node",
      "@opentelemetry/exporter-trace-otlp-http",
      "@appsignal/opentelemetry-instrumentation-bullmq",
      "@opentelemetry/instrumentation-pg",
      "@opentelemetry/resources",
      "@opentelemetry/semantic-conventions",
    ],
  },

In Next 15 I think this has been renamed to serverExternalPackages.
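
For reference, a sketch of how the same package list might look in a Next 15 config. This is untested against this issue and assumes Next.js 15, where serverExternalPackages is a top-level option and a TypeScript config file (next.config.ts) is accepted. Only the external-package list from the snippet above is carried over.

```typescript
// next.config.ts (sketch, assuming Next.js 15)
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: "standalone",
  // Top-level successor of experimental.serverComponentsExternalPackages.
  // Packages listed here are loaded with a native require at runtime
  // instead of being bundled.
  serverExternalPackages: [
    "@opentelemetry/sdk-node",
    "@opentelemetry/auto-instrumentations-node",
    "@opentelemetry/exporter-trace-otlp-http",
    "@appsignal/opentelemetry-instrumentation-bullmq",
    "@opentelemetry/instrumentation-pg",
    "@opentelemetry/resources",
    "@opentelemetry/semantic-conventions",
  ],
};

export default nextConfig;
```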

You also need to set the ENV NODE_PATH in your dockerfile so that NODE_OPTIONS can work and node knows where to look for the node_modules folder.

i.e.

ENV NODE_PATH=/app/apps/webservice/node_modules
ENV NODE_OPTIONS=--require=@opentelemetry/auto-instrumentations-node

zacharyblasczyk commented 3 weeks ago

It is also worth calling out that according to the docs:

The instrumentation file should be in the root of your project and not inside the app or pages directory. If you're using the src folder, then place the file inside src alongside pages and app.

zacharyblasczyk commented 3 weeks ago

Again, sorry for the long post. Please read. 🙏

On the point of a difference between local and containerized builds.

In my case—if anyone is willing to help me out—everything works fine locally, but as soon as I containerize the app (which is happening with standalone output using the above "solution") I lose a bunch of my traces.

The standalone folder and node server.js don't seem to have a way to initialize the register() function properly with all of its dependencies.

Specifically in my case, I lose the pg and pg.pool traces from: @opentelemetry/instrumentation-pg

Locally running pnpm dev:

@ctrlplane/webservice:dev: @vercel/otel: Configure propagator: tracecontext
@ctrlplane/webservice:dev: @vercel/otel: Configure propagator: baggage
@ctrlplane/webservice:dev: @vercel/otel: Configure sampler:  parentbased_always_on
@ctrlplane/webservice:dev: @vercel/otel: Configure trace exporter:  http/protobuf http://localhost:4318/v1/traces headers: <none>
@ctrlplane/webservice:dev: @vercel/otel/otlp: onInit
@ctrlplane/webservice:dev: @opentelemetry/api: Registered a global for trace v1.9.0.
@ctrlplane/webservice:dev: @opentelemetry/api: Registered a global for context v1.9.0.
@ctrlplane/webservice:dev: @opentelemetry/api: Registered a global for propagation v1.9.0.
@ctrlplane/webservice:dev: @vercel/otel: Configure instrumentations: fetch undefined
@ctrlplane/webservice:dev: @vercel/otel: started ctrlplane/webservice nodejs
@ctrlplane/webservice:dev: Fo found resource. r {
@ctrlplane/webservice:dev:   _attributes: {},
@ctrlplane/webservice:dev:   asyncAttributesPending: false,
@ctrlplane/webservice:dev:   _syncAttributes: {},
@ctrlplane/webservice:dev:   _asyncAttributesPromise: undefined
@ctrlplane/webservice:dev: }
...
# I see this further in the logs: 
@ctrlplane/webservice:dev:     instrumentationLibrary: {
@ctrlplane/webservice:dev:       name: '@opentelemetry/instrumentation-pg',
@ctrlplane/webservice:dev:       version: '0.47.1',
@ctrlplane/webservice:dev:       schemaUrl: undefined
@ctrlplane/webservice:dev:     },

As soon as I containerize the app and try running it, some http traces still work, but I lose all postgres tracing.

Here is part of my Dockerfile:

...
RUN turbo build --filter=...@ctrlplane/webservice

FROM base AS runner
WORKDIR /app

COPY --from=installer --chown=nodejs:nodejs /app/apps/webservice/.next/standalone ./
COPY --from=installer --chown=nodejs:nodejs /app/apps/webservice/.next/static ./apps/webservice/.next/static
COPY --from=installer --chown=nodejs:nodejs /app/apps/webservice/public ./apps/webservice/public

EXPOSE 3000

ENV PORT=3000
ENV AUTH_TRUST_HOST=true
ENV NODE_ENV=production
ENV NODE_PATH=/app/apps/webservice/node_modules
ENV HOSTNAME=0.0.0.0

CMD ["node", "apps/webservice/server.js"]

I have tried two approaches, each in several variations, and I want to share them in case they help with debugging this.

1. Using instrumentation.ts only with @vercel/otel.

import type { PgInstrumentationConfig } from "@opentelemetry/instrumentation-pg";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { PgInstrumentation } from "@opentelemetry/instrumentation-pg";
import { WinstonInstrumentation } from "@opentelemetry/instrumentation-winston";
import { registerOTel } from "@vercel/otel";

export function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const pgTraceConfig: PgInstrumentationConfig = {
      addSqlCommenterCommentToQueries: true,
      enhancedDatabaseReporting: true,
    };
    registerInstrumentations({
      instrumentations: [
        new PgInstrumentation(pgTraceConfig),
        new WinstonInstrumentation({
          logHook: (_, record) => {
            record["resource.service.name"] = "ctrlplane/webservice";
          },
        }),
      ],
    });
    registerOTel({
      serviceName: "ctrlplane/webservice",
      traceExporter: "auto",
    });
  }
}

2. Using a simple instrumentation.ts and instrumentation-node.ts with @opentelemetry/auto-instrumentations-node:

instrumentation.ts

export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs")
    await import("./instrumentation-node");
}

instrumentation-node.ts

import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPLogExporter } from "@opentelemetry/exporter-logs-otlp-http";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
import { BatchLogRecordProcessor } from "@opentelemetry/sdk-logs";
import { NodeSDK } from "@opentelemetry/sdk-node";
import {
  AlwaysOnSampler,
  SimpleSpanProcessor,
} from "@opentelemetry/sdk-trace-base";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: "ctrlplane/webservice",
  }),
  spanProcessors: [new SimpleSpanProcessor(new OTLPTraceExporter())],
  logRecordProcessors: [new BatchLogRecordProcessor(new OTLPLogExporter())],
  instrumentations: [
    getNodeAutoInstrumentations({
      "@opentelemetry/instrumentation-fs": {
        enabled: false,
      },
      "@opentelemetry/instrumentation-net": {
        enabled: false,
      },
      "@opentelemetry/instrumentation-dns": {
        enabled: false,
      },
      "@opentelemetry/instrumentation-http": {
        enabled: true,
      },
      "@opentelemetry/instrumentation-pg": {
        enabled: true,
        enhancedDatabaseReporting: true,
        addSqlCommenterCommentToQueries: true,
      },
      "@opentelemetry/instrumentation-ioredis": {
        enabled: true,
      },
      "@opentelemetry/instrumentation-winston": {
        enabled: false,
      },
    }),
  ],
  sampler: new AlwaysOnSampler(),
});

try {
  sdk.start();
  console.log("Tracing initialized test");
} catch (error) {
  console.error("Error initializing tracing", error);
}

Conclusion

Both approaches 1 and 2 work fine locally, but when containerized, pg traces don't show up, and there is nothing clear in the verbose debug logs to help determine why 😓

I have tried setting:

NEXT_OTEL_VERBOSE: 1
OTEL_LOG_LEVEL: debug
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: "http/protobuf"

to no avail.
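
One check that might help narrow this down, offered as a sketch rather than a known fix: list which of the expected packages Node cannot resolve from the directory that server.js runs in. The helper name findUnresolvable is hypothetical. It only tests module resolution; it does not detect a package that was bundled into the server output and therefore bypasses the instrumentation hooks.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

/**
 * Hypothetical helper: return the entries of `names` that Node cannot
 * resolve when requiring from `fromDir`.
 */
export function findUnresolvable(fromDir: string, names: string[]): string[] {
  // createRequire needs an absolute filename; the file need not exist.
  const requireFrom = createRequire(path.resolve(fromDir, "noop.js"));
  return names.filter((name) => {
    try {
      requireFrom.resolve(name);
      return false; // resolvable from fromDir
    } catch {
      return true; // not found in any node_modules on the lookup path
    }
  });
}
```

Inside the container this could be called as findUnresolvable("/app/apps/webservice", ["pg", "@opentelemetry/instrumentation-pg"]); an empty array means both resolve from the server directory.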

zacharyblasczyk commented 3 weeks ago

Very similar to these issues:

https://github.com/backstage/backstage/issues/22555
https://github.com/vercel/next.js/issues/49897#issuecomment-2022510320