vercel / next.js

The React Framework
https://nextjs.org
MIT License

High number of processes of /next/dist/compiled/jest-worker/processChild.js still alive after next build #45508

Closed zqjimlove closed 1 year ago

zqjimlove commented 1 year ago

Verify canary release

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 22.3.0: Thu Jan  5 20:48:54 PST 2023; root:xnu-8792.81.2~2/RELEASE_ARM64_T6000
Binaries:
  Node: 18.13.0
  npm: 8.19.3
  Yarn: 1.22.19
  pnpm: 7.26.2
Relevant packages:
  next: 12.0.9
  react: 17.0.2
  react-dom: 17.0.2

Which area(s) of Next.js are affected? (leave empty if unsure)

CLI (create-next-app)

Link to the code that reproduces this issue

https://github.com/vercel/next.js/files/10565355/reproduce.zip

To Reproduce

reproduce.zip

(screenshot)

This problem reproduces on next@12.0.9 and above, but 12.0.8 was fine.

Alternatively, on next@12.0.9 and above, removing getInitialProps from _app.tsx also makes it go away.

// GlobalApp.getInitialProps = async function getInitialProps(appContext) {
//   const appProps = await App.getInitialProps(appContext);

//   return {
//     ...appProps,
//   };
// };

Describe the Bug

High number of processes of /next/dist/compiled/jest-worker/processChild.js still alive after next build

Expected Behavior

Kill all child processes.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

NEXT-1348

sedlukha commented 1 year ago

13.4.12 - still a high number of processes and a bigger memory load

13.4.12: (screenshot)

13.2.4: (screenshot)

timneutkens commented 1 year ago

Just re-read every single post in this issue; the common thread is people sharing screenshots. For us to investigate what your application is running into, we need access to the code. Without it we can't verify what you're running into, as you can understand.

I've posted an update around production memory usage on this issue: https://github.com/vercel/next.js/issues/49929#issuecomment-1649637524. At this point we are fairly certain there is no memory leak in production. We're still working on bringing down the number of processes to 2 instead of 4. We haven't investigated development yet as we've focused on slowdowns first in #48748.

@w7br you're talking about 13.2.4 and 13.1.6. Those versions are from months ago, there have been many optimizations landed since then. Would recommend using the latest version first. Either way please provide access to the application on which you're seeing memory issues so that we can confirm what you're seeing and investigate.

As we've shown in #49929 and #48748 we've dedicated significant engineering time towards investigating and improving these, however, the only way we can do this for memory usage issues is by having access to your code and running it ourselves.

As you can see in my previous updates on #49929, we had to run the lowest-level tools, dumping v8 memory allocations, to investigate these. For slowdowns we luckily don't need access, because you can share a CPU profile; for investigating memory allocation, a heap dump is not enough.

Also please make it clear what you're running. I.e. @sedlukha is that development? I guess so?

hnsr commented 1 year ago

Hey @timneutkens, I appreciate all the work in debugging the memory issues.

As you indicated in https://github.com/vercel/next.js/issues/45508#issuecomment-1637831723 this particular issue is about processes being retained after a build exits, as opposed to memory usage issues.

We are currently running into this problem also when next dev crashes, or when running in production in standalone mode and the main server.js process crashes (which only happens during some rare issues during startup, like EADDRINUSE, so is probably less relevant).

Do you still need a reproduction case for this lingering worker process problem? I will be happy to see if I can provide one.

hanoii commented 1 year ago

@timneutkens I am in the same boat as @hnsr: any time next build fails for any reason, starting the server (even on the command line with npm start) fails with something like Error: Could not find a production build in the '/app/.next' directory. (In my case the build is currently failing because of #53086, but it has failed for many other reasons before, as this is part of a deploy process.)

To make this worse, I am using pm2, which tries to restart the app frequently, so it quickly ends up with a lot of worker processes, and they are doing something, using memory and everything. It's like the worker process loses its parent: even though npm start finishes with an error, the process sticks around.

timneutkens commented 1 year ago

Do you still need a reproduction case for this lingering worker process problem? I will be happy to see if I can provide one.

This would be helpful indeed if you're able to provide that, saves me significant amounts of time trying to figure out how this is being run into. I guess the "next start without a build" case @hanoii is talking about is a good start if that reproduces though 👍

timneutkens commented 1 year ago

I just checked with @ijjk and as it turns out he saw something similar and fixed it in a recent refactor: https://github.com/vercel/next.js/blob/46677ccda6a62203d7a7ae359c1020780aeccee5/packages/next/src/server/lib/router-server.ts#L247-L262. Could you try with next@canary?

hanoii commented 1 year ago

@timneutkens I tried it locally, as I was able to reproduce it as well, and yes: next@canary at least doesn't leave the process behind on a straight-out start failure.

I am getting a different error:

[Error: ENOENT: no such file or directory, open '/var/www/html/next/.next/BUILD_ID'] {

but I guess that's ok.

Maybe this fixes it.

sedlukha commented 1 year ago

@timneutkens

Also please make it clear what you're running. I.e. @sedlukha is that development? I guess so?

No, this is prod. I run it for 17 apps.

And I've tried canary; now memory usage is even worse: 4.9G (13.4.13-canary.6) vs 2.4G (v13.2.4) vs 3.16G (v13.4.12).

(screenshot)

sedlukha commented 1 year ago

@timneutkens it seems that experimental.appDir: false might disable the next-render-worker-app process and solve the problem for those who use only the pages router.

I would be happy to test it, but I can't do it on my real apps because of another Next.js issue: https://github.com/vercel/next.js/issues/52875
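For reference, the flag being discussed lives in next.config.js. A minimal sketch of what that experiment looks like; note the maintainer's reply in this thread that the option is unsupported and will be removed:

```javascript
// next.config.js -- illustrative only; `experimental.appDir` is unsupported
// and slated for removal, per the maintainer's reply in this thread.
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    appDir: false, // attempt to avoid spawning next-render-worker-app
  },
};

module.exports = nextConfig;
```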

timneutkens commented 1 year ago

@sedlukha What you're reporting seems to be exactly the same as #49929, in which I've already explained the memory usage: there is no leak, it's just using multiple processes, and we're working on optimizing that: https://github.com/vercel/next.js/issues/49929#issuecomment-1637185156

Setting appDir: false is not supported and that option will go away in a future version; we just haven't gotten around to removing the feature flag.


@hanoii thanks for checking 👍

Nirmal1992 commented 1 year ago

Same here: my MacBook crashed when I used the latest Next.js with Turborepo. Multiple child processes were still running in the background even after terminating the server.

S-YOU commented 1 year ago

FYI: experimental: {appDir: false} no longer works on 13.4.13 for me (the page renders, but URL changes fail to load JSON, triggering SSR), and it now spawns 3 processes apart from the main process.

space1worm commented 1 year ago

I have the same issue as well, on version 13.4.8.

timneutkens commented 1 year ago

@Nirmal1992 @S-YOU @space1worm I'm surprised you did not read my previous comment. I thought it was clear that these types of comments are not constructive? https://github.com/vercel/next.js/issues/45508#issuecomment-1653226340

@space1worm I'm even more surprised you're posting "same issue" without trying the latest version of Next.js...

space1worm commented 1 year ago

@timneutkens Hey, yeah sorry I missed it, here I made my test repo public.

You can check this commit tracer

I had a memory usage problem on version 13.4.8: after navigating to any page, my pod's memory would skyrocket for some reason, and after that the whole app was breaking and becoming unresponsive.

Not sure if this problem is related to my codebase or not; I would love to hear what the problem is!

One more thing: I tried to increase resources, but the application was still unresponsive after breaking.

Here as a reference

(screenshots)

timneutkens commented 1 year ago

The application is still not using the latest version of Next.js; same in the linked commit: https://gitlab.cern.ch/nzurashv/tracer/-/blob/master/package-lock.json#L4673

space1worm commented 1 year ago

@timneutkens I have updated to the latest version and created a new branch, tracer/test.

The issue still persists.

here you can check this link as well

tracer-test.web.cern.ch

(screenshots)

Additionally, I asked the support team about the cause of the failure, and they provided the following explanation.

(screenshot)

glaustino commented 1 year ago

FYI: experimental: {appDir: false} no longer works on 13.4.13 for me (the page renders, but URL changes fail to load JSON, triggering SSR), and it now spawns 3 processes apart from the main process:

  • next-router-worker
  • next-render-worker-app
  • next-render-worker-pages

I have a question about these child processes: currently they seem to open random ports, which broke my application behind a WAF in Azure, because we only open certain ports. Is there any way for me to force the ports these child processes use? I am on the latest Next release.

jrscholey commented 1 year ago

FYI: with 13.4.11 we were unable to start our app in Kubernetes. We received a spawn E2BIG error at jest-worker. This only happened when our rewrites (regex path matching) were above a certain length (although still below the max).

Downgrading back to 13.2.4 resolved the issue.
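For context on that error: E2BIG comes from the kernel's execve() when the combined size of argv plus the environment exceeds ARG_MAX (commonly on the order of 2 MB on Linux). That the serialized rewrites config travels to the worker via argv or env is an assumption about the failure path here, but it would explain why config length matters. A rough way to gauge a Node process's current environment footprint:

```javascript
// Approximate the bytes execve() would need for the environment alone:
// each entry contributes "KEY=VALUE\0".
const envBytes = Object.entries(process.env).reduce(
  (sum, [key, value]) => sum + key.length + String(value ?? '').length + 2,
  0,
);
console.log(`approximate environment size: ${envBytes} bytes`);
```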

S-YOU commented 1 year ago

FYI: the main process started with node server.js is now gone in Next.js 13.4.15, and next-router-worker's parent PID becomes 1 (init). This could use less memory, since it's one less process.

    PID    PPID     TIME COMMAND
1362416       1 00:00:02 next-router-worker
1362432 1362416 00:00:00 next-render-worker-app
1362433 1362416 00:00:05 next-render-worker-pages

S-YOU commented 1 year ago

@timneutkens, sorry, I probably misread it. I don't mean to claim anything; I am just sharing what I've observed in the version I am using (which is supposed to be the latest release).

timneutkens commented 1 year ago

In 13.4.15 (but really upgrade to 13.4.16 instead) this PR has landed to remove one of the processes indeed: https://github.com/vercel/next.js/pull/53523

timneutkens commented 1 year ago

@sladg is this a joke...?

sedlukha commented 1 year ago

@timneutkens I've tried v13.4.20-canary.2.

It was expected that https://github.com/vercel/next.js/pull/53523 and https://github.com/vercel/next.js/pull/54143 would reduce the number of processes, resulting in lower memory usage.

Yes, the number of processes has been reduced; after the update I see only two processes. However, memory usage is still higher than it was with v13.2.4.

node v.16.18.1 (if it matters)

v13.4.20-canary.2: (screenshot)

13.2.4: (screenshot)

timneutkens commented 1 year ago

It's entirely unclear what you're running / filtering by; e.g. you're filtering by next-, but 13.2.4 doesn't set process.title to anything specific.

Sharing screenshots is really not useful, I keep having to repeat that in every single comment around these memory issues.

Please share code, I can't do anything to help you otherwise.
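On the filtering point: ps/top output can only be matched on names like next- when the process sets process.title, which newer Next.js versions do for their workers (per the ps listing earlier in the thread) and 13.2.4 does not. A minimal illustration of the mechanism; the title string here just mirrors the worker names seen above:

```javascript
// Renaming the current process: after this line, `ps aux | grep next-router`
// would match this process on platforms where Node can rewrite the title.
process.title = 'next-router-worker';
console.log(`process.title is now: ${process.title}`);
```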

billnbell commented 1 year ago

(quoting @sedlukha's v13.4.20-canary.2 report above)

The memory looks stable, but it is really hard to see anything in the screenshots.

magalhas commented 1 year ago

I'm seeing this behaviour when running next dev, starting with 13.3 and newer versions (13.4 included). It isn't happening on 13.2. It looks like it happens whenever files are added to or removed from the FS while the dev script is running (not sure, due to my current use case).

Even after closing next dev, orphaned jest-worker processes are left over.

timneutkens commented 1 year ago

I'm amazed by how often my comments are flat out ignored the past few weeks on various issues. We won't be able to investigate/help based on comments saying the equivalent of "It's happening". Please share code, I can't do anything to help you otherwise.

I'll have to close this issue if there is one more comment without a reproduction, as I've checked multiple times now and the processes are cleaned up correctly in the latest version.

magalhas commented 1 year ago

By latest version do you mean the latest RC, @timneutkens? Sorry, I can't help with steps to reproduce; this is happening inside a spawn call in a very specific use case, so the best I can do is confirm that it happens.