siimon / prom-client

Prometheus client for node.js
Apache License 2.0
3.09k stars 372 forks source link

Simple counter fails on Next.js -- Avoid using Node.js-specific APIs #584

Open pablote opened 1 year ago

pablote commented 1 year ago

Hi, I know this library is supposed to be framework agnostic, but it seems to be generating a bad interaction with Next.js. I have a counter defined like this:

let httpRequestsTotal = new client.Counter({
      name: "http_requests_total",
      help: "Total number of HTTP requests",
      labelNames: ["method", "route", "statusCode"],
    })

This counter works well if used from within an API route for example. Thing is, I want it to catch every single request, so I use it where it makes most sense, in a middleware:

import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import { metrics } from "@/modules/common/metrics";

export function middleware(request: NextRequest) {
  NextResponse.next();
  console.log(`${request.method} -- ${request.url}`);
  metrics.httpRequestsTotal.inc({ method: request.method, route: request.url });
}

Doing this throws the following error:

Server Error
Error: A Node.js API is used (process.uptime) which is not supported in the Edge Runtime.

This might not make a lot of sense if you don't know Next.js but some code, like the middleware, runs on a different JS runtime where it cannot be assumed that Node.js global APIs are available. My main question is: why would a counter.inc() call process.uptime? Seems unnecessary.

Ideally, no Node.js specific APIs should be used, only standard Web APIs, unless specifically required (and being able to opt-out).

thanks for the help

Guid21 commented 1 year ago

I too am wondering how to solve this problem

SimenB commented 1 year ago

Next.js does some weird bundling stuff and have an odd global environment. There might be something off due to that? Probably better to ask in their support channels.

I don't think we've had requests to work in a browser context before, but that should be fine. Happy to take PRs allowing it.

pablote commented 1 year ago

I run the prom-client code locally on my computer and I was able to see what the problem was. Many of the default metrics run code outside of any function, so even if you don't use them, just by importing this code gets run. Some of that code uses Node.js libraries or grobals.

I changed all the cases that caused problems so that the code would be inside the function. See these couple examples:

Screenshot 2023-08-29 at 9 54 23 AM Screenshot 2023-08-29 at 9 54 38 AM

After doing this the error went away. That said, this is still not working well, in strange ways. I need to keep testing around, but apparently this is not gonna be easy.

I don't think we've had requests to work in a browser context before, but that should be fine. Happy to take PRs allowing it.

It's not a browser environment. It's a server-side JS runtime, only that Node.js specific APIs are not allowed. I'm not sure how Vercel runs code, maybe it's a Lambda@Edge thing but not sure. I know that Cloudfare Workers have the same restrictions.

pablote commented 1 year ago

I've managed to get this working on Next.js, but it required quite a few changes on prom-client which I'm not sure if the project would be willing to consider merging @SimenB.

You can check the changes in this fork here: https://github.com/siimon/prom-client/compare/master...pablote:prom-client:feature/next-friendly-changes?expand=1. To summarize:

SimenB commented 1 year ago

Making requires lazy and migrating to URL global should both be fine to land at least. Let's start with that? Then fetch is a natural one to target after that. Node 16 is EOL shortly, so we might get away with using global fetch as well, but I wanna evaluate that separately

harry-gocity commented 11 months ago

For simple default metrics, creating an api route works well in Next.js:

collectDefaultMetrics({});

const prometheus = async (_req: NextApiRequest, res: NextApiResponse) => {
    res.setHeader('Content-type', register.contentType);
    res.send(await register.metrics());
};

export default prometheus;

As for using prom-client anywhere else I've only been able to do so in getServerSideProps, as the middleware expects Vercel's edge runtime compatability. Quite a lot of (imo understandably) unhappy discussion about it here. It's very annoying for us as we are running dockerised node, we have nothing but the node runtime, and we get these edge runtime warnings.

In the end my getServerSideProps all have a wrapper that looks something like this:

import { GetServerSideProps, GetServerSidePropsContext } from 'next';
import { Counter, Histogram, register } from 'prom-client';

// Even though it may look like nothing has been registered yet, if next
// recompiles due to HMR prom-client will throw errors when calling
// new Histogram()/new Gauge()/new Counter() as metrics with the same name
// already exist in the registry.
// https://github.com/siimon/prom-client/issues/196
register.removeSingleMetric('nextjs_gSSP_duration_seconds');
register.removeSingleMetric('nextjs_page_requests_total');

export const histograms = {
    gSSPDurationSeconds: new Histogram({
        name: 'nextjs_gSSP_duration_seconds',
        help: 'Duration of gSSP in seconds',
        labelNames: baseLabelNames,
        buckets: [0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10],
    }),
};

export const counters = {
    pageRequestsTotal: new Counter({
        name: 'nextjs_page_requests_total',
        help: 'Count of page requests',
        labelNames: baseLabelNames,
    }),
};

// Register metrics on the default registry
register.registerMetric(histograms.gSSPDurationSeconds);
register.registerMetric(counters.pageRequestsTotal);

export const observedGetServerSideProps =
    (gSSP: GetServerSideProps, route: string) => async (context: GetServerSidePropsContext) => {
        // All incoming requests will have some derivable labels
        const baseLabels = { route, foo: 'bar' }

        // Run gSSP
        const endGSSPDurationTimer = histograms.gSSPDurationSeconds.startTimer();
        const response = await gSSP(context);
        const duration = endGSSPDurationTimer(baseLabels);

        // Capture metrics
        histograms.gSSPDurationSeconds.observe(duration);
        counters.pageRequestsTotal.labels(baseLabels).inc();

        // Return gSSP response result
        return response;
    };

That is used like this

const unwrappedGetServerSideProps: GetServerSideProps = () => {
    // your gSSP here
};

export const getServerSideProps: GetServerSideProps = observedGetServerSideProps(
    unwrappedGetServerSideProps, '/[destination]'
);

I did find that the observedGetServerSideProps had to be defined in another file, if I used something similar in the actual pages directory Next would complain at build, the same as your issue @pablote. I'm not entirely sure why a separate module gets around the warning, but Next doesn't mind 🤷🏼‍♂️

I would be very interested in improvements to allow us to use the middleware in Next rather than this wrapper, as currently we cannot use prom-client for any statically optimised pages (no getServerSideProps/getStaticProps). And the wrapping isn't ideal, it's very all-or-nothing for adding metrics to a page.

pablote commented 11 months ago

Seems like we've run into a similar picture. Using prom-client works well from getServerSideProps and the get static one as well, also from API routes. The thing is you have to add that individually everywhere, and it still doesn't catch some calls. Middleware is the natural place to do that, but like I said above, it only works if I heavily modify prom-client to not use anything Node.js specific.

I've seen some suggestions like using a custom server to host the Next.js app, but that's not an option for me.

That said, it looks like Next.js is integrating OpenTelemetry as a functionality, although it's still an experimental feature. So I'll probably not pursue this any further and see how that pans outs.

harry-gocity commented 11 months ago

it looks like Next.js is integrating OpenTelemetry as a functionality

Unfortunately this is where we're at too - I've only added prom-client in a couple of our next.js sites, but we have 10+ across various teams, and I need to recommend something for usage everywhere. That said I would like to avoid OpenTelemetry if I can, as we're already all-in on prom w/ push gateway for our java services.

pablote commented 11 months ago

It doesn't have to be one or the other I think. You can have an app using OT pushing metrics to a OT collector (this is very similar to pushing prom metrics to a push gateway). And then have Prometheus consume those from the collector, check this out: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/examples/demo

That said, last time I tried this in Next.js, it was only pushing trace/spans style metrics, and not counter/gauge/histogram prom style metrics. Not sure if it's a WIP and they plan to add that in the future.

IainMcHugh commented 2 months ago

For simple default metrics, creating an api route works well in Next.js:

collectDefaultMetrics({});

const prometheus = async (_req: NextApiRequest, res: NextApiResponse) => {
    res.setHeader('Content-type', register.contentType);
    res.send(await register.metrics());
};

export default prometheus;

@harry-gocity @pablote I've seen a few examples online with this approach, can you confirm that calling collectDefaultMetrics({}) outside of the API handler/inside the API handler file works as intended?

I am trying to migrate a Next app away from using a custom server and currently collectDefaultMetrics is being called when the custom server is initially invoked. My intuition is that this function should be called globally when the application is started, and so I'm wondering if calling it within the api/metrics.ts will work as expected. I have tried to move this invokation inside the instrumentation.ts file but with no luck as of yet (I get a 200 empty response)

harry-gocity commented 2 months ago

can you confirm that calling collectDefaultMetrics({}) outside of the API handler/inside the API handler file works as intended?

We've since upgraded to next 14 (still pages dir) and now have an instrumentation.ts that looks like this:

export async function register() {
    console.log('Configuring instrumentation');

    if (process.env.NEXT_RUNTIME === 'nodejs') {
        const prom = await import('prom-client');

        console.log('Configuring default prom metrics');
        prom.collectDefaultMetrics({});

        console.log('Creating custom prom metrics');
        new prom.Histogram({
            ...
        });
        new prom.Counter({
            ...
        });
        new prom.Counter({
            ...
        });
    }
}

We don't call collectDefaultMetrics anywhere else. And to access the metrics, we use register.getSingleMetric rather than a reference to the created metric directly.

IainMcHugh commented 2 months ago

Thanks @harry-gocity I was missing the instrumentationHook flag in the next config file. Two questions:

At the end of the register block I am able to call getSingleMetric() and it returns the correct value, however when I try to do the same thing in an API route I get undefined:

import { NextApiRequest, NextApiResponse } from 'next';
import { register } from 'prom-client';

const handler = async (_: NextApiRequest, res: NextApiResponse) => {
  const metric = register.getSingleMetric(
    'my_app_http_request_count',
  );
  console.log(metric); // returns undefined
  console.log(register.getMetricsAsArray()); // returns []
  res.send('Text');
};

export default handler;

Here is my exact instrumentation.ts file:

export async function register() {
  console.log('Configuring instrumentation');

  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const prom = await import('prom-client');

    console.log('Configuring default prom metrics');
    prom.collectDefaultMetrics({
       prefix: 'my_app',
     });
    console.log('Creating custom prom metrics');

    new prom.Counter({
      name: 'my_app_http_request_count',
      help: 'Count of HTTP requests made to the app',
      labelNames: ['method'],
    });
    await prom.register.metrics(); // is this needed?
    console.log(prom.register.getMetricsAsArray()); // returns correct data
    console.log(prom.register.getSingleMetric('my_app_http_request_count')); // returnes correct data
  }
}
harry-gocity commented 2 months ago

Do you call await prom.register.metrics(); at the end of this register function? Do you have an API route where you expose all of the metrics to be scraped from?

We don't call prom.register.metrics() in instrumentation - that's the function call to make to get the actual metrics string. You should call it in an api route like this:

import { NextApiRequest, NextApiResponse } from 'next';
import { register } from 'prom-client';

const prometheus = async (_req: NextApiRequest, res: NextApiResponse) => {
    res.setHeader('Content-type', register.contentType);
    res.send(await register.metrics());
};

export default prometheus;
IainMcHugh commented 2 months ago

After a lot of debugging I finally found what was missing. For anyone who has a similar issue, in my next.config.js I needed both the instrumentationHook and:

  ...
  experimental: {
    instrumentationHook: true,
    serverComponentsExternalPackages: ['prom-client'],
  },
  ...

and then calling await register.metrics() works within an API handler.

Also in relation to the original issue, while you can't increment a counter directly within the middleware.ts file, you can call to an exposed API from the middleware (example /pages/api/metrics/counter.ts) and increment this way.

justenau commented 3 weeks ago

I'm having issues making this work on production build. I setup instrumentation as per this example https://github.com/siimon/prom-client/issues/584#issuecomment-2202375969 , in the api route called register.metrics() and added experimental next config. If I run locally (yarn dev) everything works as expected and I'm able to increase counter from another api route or to get all the metrics from the metrics endpoint. But as soon as I run the build, this stops working. In the logs I do see that instrumentation code runs and inside the instrumentation I'm able to get counter value or any other metric but once I call metrics route, I receive empty response. What could be causing this to behave differently in production build? Note: Using NextJS app router.

harry-gocity commented 3 weeks ago

FWIW I've somewhat started to give up with this lib - we're now using @vercel/otel and @opentelemetry packages instead, and scraping metrics with a prometheus exporter instead. The integration with Next.js is much cleaner, we have better tracing on GSSP (segmented by component load, fetch calls, and render) and we don't have to jump through bundling hoops to get a build passing. The only thing we still use this library for is the default metrics, the custom ones are all being migrated.

justenau commented 3 weeks ago

Just FYI I was able to resolve my issue, left more in-depth explanation here: https://github.com/siimon/prom-client/issues/640#issuecomment-2316946778

santosguilherme commented 2 weeks ago

@harry-gocity can you share some examples using @vercel/otel and @opentelemetry?