Closed: zqjimlove closed this issue 1 year ago
Hosting on Platform.sh here, still using pages dir and downgrading from 13.4 to 13.2.4 seems to have solved the issue for now 👌🏽
@cannontrodder is correct, and the reduced worker count can explain any slowdown you noticed. Please upgrade to 13.4.6 and see if it alleviates the issues reported here!
I'm using the pages dir and I still have the issue with 13.4.6.
@daiyam what is the issue you are seeing persist in v13.4.6?
@ijjk Yes, if the /next/dist/compiled/jest-worker/processChild.js processes are only created by next build. (I've only tested on a new website in prod, so there is always a next start just after. I found out about the issue only yesterday, after the Docker container was using 3GB of memory...)
this thread seems to be a mixed bag of people having issues during build time and/or runtime -- I can add my experience with the runtime https://github.com/vercel/next.js/issues/49623 (tl;dr add RAM)
We had 1GB in our pods. We upped it to 2GB. This prevented the freak-out the Friday before last where our pods just kept rebooting and scaled to 10. They'd always peak on spin-up, as the cache was cold. We are looking into sharing the cache between pods to help with that.
I did exactly that in our helm chart (https://github.com/icoretech/helm/blob/main/charts/airbroke/values.yaml#L22) but it did not change a thing, and I'm also not sure this should be done.
@masterkain there's a thread here on this: https://github.com/vercel/next.js/discussions/23017#discussioncomment-5230940. There's a flag, isrMemoryCacheSize, that it looks like you'll need to set to zero for it to work.
Slightly worried about it possibly causing a race condition though - https://nextjs.org/docs/pages/building-your-application/data-fetching/incremental-static-regeneration#self-hosting-isr
interesting, thanks for that. I did think about the race condition but I was missing isrMemoryCacheSize, so time to experiment again. But I deviated a bit from the original topic, so back to you people 👍
Next.js 13 using the app router (which is on by default since 13.4) always uses workers to run the app, see https://github.com/vercel/next.js/blob/canary/packages/next/src/server/lib/start-server.ts#L182, while the main app (main thread) acts as a proxy for the workers. I created an issue about this at https://github.com/vercel/next.js/issues/50586
At runtime this is not related to any build process, as far as I can see. Next.js is just using jest-worker as a way to start child processes. My assumption is that this is done to speed up the new RSC rendering, as that is not optimal yet; see https://github.com/vercel/next.js/blob/canary/packages/next/src/server/app-render/use-flight-response.tsx
Like @lazarv, we likewise saw much better behavior when disabling appDir. Not definitive, so YMMV, but I put notes in https://github.com/vercel/next.js/issues/49929#issuecomment-1602592624 and https://github.com/vercel/next.js/issues/51560#issuecomment-1599458889 on how the extra processes impacted memory and the crashes/timeouts we saw in production on 13.4.4+
All of my servers have recently been logging TCP: out of memory -- consider tuning tcp_mem in the kernel logs, causing Nginx connection-reset issues when talking to the Next.js upstream. After digging into it, there are a bunch of connections from the Next.js process like so:
$ ip netns exec xxx ss -aemnpt
...
CLOSE-WAIT 142051 0 127.0.0.1:48390 127.0.0.1:33853 users:(("node",pid=14777,fd=85)) ino:88488113 sk:5a skmem:(r208866,rb2358342,t0,tb2626560,f30,w0,o0,bl0,d0)
CLOSE-WAIT 174810 0 127.0.0.1:45838 127.0.0.1:33853 users:(("node",pid=14777,fd=50)) ino:88446849 sk:5c skmem:(r183190,rb4978722,t0,tb2626560,f1130,w0,o0,bl0,d1)
...
All going out to port 33853, which turns up "jest-worker/processChild.js" and led me to this GitHub issue:
$ ip netns exec xxx netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
...
tcp6 0 0 :::33853 :::* LISTEN 14927/node
$ ps aux | grep 14927
root 14927 0.5 0.4 32811872 135924 ? Sl 14:13 0:06 /nix/store/m00hsyaqpin3awwjyx0v7lxwzix73ibd-next-13.4.6-c2942e4eb7.zip/node_modules/next/dist/compiled/jest-worker/processChild.js
After a few hours there are thousands of connections in the CLOSE-WAIT state per NextJS process. Seeing this on NextJS 13.4.7 as well. Downgrading to 13.2.4 fixes the issue.
Confirming this issue (at runtime). We deploy in standalone mode and run our app with PM2 instances, but PM2 no longer reports accurate memory usage (making the max memory restart feature broken), and leaves jest-worker processes running even after killing the parent PM2 instances, causing constant OOM situations.
Is there another solution? My app also deploys in standalone mode and runs with PM2 instances, but without using appDir, next/images, or middleware. Somehow, /dist/standalone/node_modules/.pnpm/next@13.4.5_@babel+core@7.21.8_react-dom@18.2.0_react@18.2.0/node_modules/next/dist/compiled/jest-worker/processChild.js leaks memory without triggering the system OOM killer, which leads to the system hanging.
According to the atop monitor, it leaks 1GB of memory in about 5 seconds.
@hnsr were you able to confirm that disabling appDir reduced your memory footprint on your project? I see that you referenced a PR here and I just gave it a shot on mine. I made sure to match your NextJS version (13.4.5) and still had a large bump in memory consumption upon first request 🤔
@uncvrd It seems that disabling appDir didn't work for us either to avoid the use of jest-worker when running the standalone server.js. I haven't had time to look into it further, but will give this another look and let you know if I find a workaround.
Our hosting provider also noted this: the whole server we run on went down, as opposed to the OOM killer being invoked to keep things under control. I wonder why that is 🤔
@uncvrd So it seems that the server.js generated in standalone mode is simply written to always use workers:
As you can see, the new version uses createServerHandler from https://github.com/vercel/next.js/blob/canary/packages/next/src/server/lib/render-server-standalone.ts, which always uses workers.
I am probably going to see if we can simply stick to an older version of Next.js for now.
Yes, the OOM killer is not working. This is from my server monitor; IO, CPU, and memory all increase suddenly.
@hnsr thanks for confirming on your end. That's really odd, I had some luck reverting to 13.3.2 today so I'll stick with that for now
The worker process seems to have been introduced in this commit: Fix standalone mode with appDir running in a single process.
This was released in 13.4.0; the last release without the workers is 13.3.4.
Why we are downgrading:
We had an issue where a process crashed but the worker wasn't cleaned up. PM2 restarted Next.js several times a second, causing it to eat up 60GB of memory in a few seconds and crashing the server.
We're currently using PM2 to run the application, but PM2 is unable to report the used memory (which we are trying to use to automatically restart when it runs out).
Creating a worker process for standalone mode seems somewhat odd in my opinion; according to the docs:
Additionally, a minimal server.js file is also output which can be used instead of next start.
Using separate worker processes isn't as minimal as it could be. Is there any way to get it 'flat' again @shuding?
Or maybe we could add a parameter to turn off workers in standalone mode?
What would be the impact of turning them off? And why were they added?
The separate processes are needed to ensure app and pages routes are rendered separately, as they require different versions of React. The workers also already monitor memory usage and restart when running out, so PM2 shouldn't be needed here to achieve that.
What was the crash where the workers weren't cleaned up? That sounds like more of the issue we should be addressing here.
@ijjk not sure if that's the crash you mean, but I can trigger a crash even on the stock template app with workers, here: https://github.com/vercel/next.js/issues/51560
It doesn't repro without workers (appDir false).
app and pages routes are rendered separately as they require different versions of react.
Didn't realize two versions of React are needed. Hopefully it will be just one in the future then.
OK, can we set the memory usage on the box? 90% might be too much or too little.
const MAXIMUM_HEAP_SIZE_ALLOWED =
(v8.getHeapStatistics().heap_size_limit / 1024 / 1024) * 0.9
Question - or, for those of us only using pages, can we turn off the worker mode?
Also, if our process has Node memory set with --max-old-space-size=8192, will v8.getHeapStatistics().heap_size_limit return the right value?
Yes.
app.js
const maxHeapSz = require('v8').getHeapStatistics().heap_size_limit;
const maxHeapSz_GB = (maxHeapSz / 1024 ** 3).toFixed(1);
console.log(`${maxHeapSz_GB}GB`);
$ node --max-old-space-size=2048 app.js
2.0GB
@ijjk Yes, it makes sense that this is the thing that needs to be investigated and hopefully fixed. The crashes in our case were mainly startup errors, i.e. EADDRINUSE because the standalone server.js failed to bind on port 3000 at one point. Another earlier cause was a SyntaxError (due to running the wrong Node version). During next dev this can happen as well: if I make a typo, hot reloading will fail and it will crash, leaving a jest-worker running. Since I am a bad programmer, this can lead to my MacBook going OOM 😅
For our production setup I would still like to have some control over memory usage though. The way PM2 allowed us to do this through --max-memory-restart was ideal for us; is there any documentation on how we can accomplish this with the workers that Next now uses?
@hnsr not sure if this can help, but this is how we make PM2 kill the processes when respawning:
kill_processChild.sh
#!/bin/bash
# Find the process IDs of all processes containing the string "processChild.js" in the command path
pids=$(pgrep -f "processChild.js")
# Iterate over each process ID and kill the corresponding process
for pid in $pids; do
echo "Killing process: $pid"
kill "$pid"
done
Pm2 ecosystem.config.js
module.exports = {
apps: [
{
name: 'main',
script: 'npm',
args: 'run app:start:force',
},
],
};
and in package.json I have this script:
"app:start:force": "./kill_processChild.sh && cd apps/cms && npm run start",
This way, PM2 will kill orphaned child processes before respawning the application.
Has anyone tried 13.4.8 yet?
No difference on performance so far for 13.4.7 -> 13.4.8 in standalone production mode for me.
Is it better, or still running out of memory?
I believe the memory or CPU issues are down to high traffic, and my sites don't have enough traffic to reproduce the issue; I also tune my applications periodically for general performance problems. I am only in this thread because I am the one who reported in the Discussion that 13.4.0 has two new jest-worker processes.
I can confirm the problem still happens on 13.4.8.
The weird part is that I cannot reproduce it reliably. It looks just random to me.
// next.config.js
// next 13.4.7
experimental: {
appDir: false
}
Fortunately, since this was before adopting the app router, I resolved the issue of excessive processChild.js processes by adding the option above to next.config.js:
https://github.com/vercel/next.js/issues/49929#issuecomment-1602592624.
I hope the issue is resolved in a future version so that excessive processes aren't wasted even without such experimental options.
@constmoon Does this fix the issue on the latest Next versions? Or were you just adding that we should turn it off?
@billnbell I am using Next.js 13.4.7 and the issue was resolved when I added that configuration on that version. I'm not sure if it applies to the latest version though.
I had this problem with a free account on a serv00.com server; reducing the number of processes helped next build run.
My next.config.js:
/** @type {import('next').NextConfig} */
const nextConfig = {
experimental: {
cpus: 1
}
}
module.exports = nextConfig
My version: next@13.4.9 Node v16.20.0
On 13.4.9 the problem still exists (the server is way more laggy; I had to go back to 13.2.4).
I have a question: do you start the Next server in pm2 with the "exec_mode": "cluster" configuration, or as a single process?
I did it as a single process.
I originally had the jest-worker issue and downgraded to 13.2.3 as suggested above; the jest-worker process is gone. However, I am now getting a different CPU spike from /.bin/next start, as described here: https://github.com/vercel/next.js/discussions/49203
Since these issues are being conflated: this particular issue is about processes being retained after the build exits. It does not cover running in production and processes being spawned in that case; for production memory usage refer to this issue: #49929. On that issue I wrote down exactly what the 4 processes are: https://github.com/vercel/next.js/issues/49929#issuecomment-1637185156. Killing the processes randomly in production will cause your application to go down.
It's frustrating not to have the confidence to start a project when every version that fixes one bug brings dozens of others. There should be a truly stable version.
still seeing the problem with 13.4.10, and still too many jest-worker processes
next v13.2.4
next v13.4.10
edit: after some experiments, it seems experimental.appDir: false can solve the issue, but it causes another bug: https://github.com/vercel/next.js/issues/52875
FYI, the latest version (13.4.12) does not spawn jest-worker anymore, but dedicated processes for each renderer instead (not sure if they are just renamed or not):
next-router-worker
next-render-worker-pages
OK lets try it.
The process rename happened in #52779; I wonder if the new releases fix the high number of these processes.
Is anyone else concerned that these workers listen on all TCP interfaces instead of just localhost? On a standard VM that means they are exposed to the internet.
My Next.js process is started using /usr/bin/npm run start -- --port=XYZ --hostname=127.0.0.1, which works for the central service, but the workers just ignore this.
Is this a security risk?
Verify canary release
Provide environment information
Which area(s) of Next.js are affected? (leave empty if unsure)
CLI (create-next-app)
Link to the code that reproduces this issue
https://github.com/vercel/next.js/files/10565355/reproduce.zip
To Reproduce
reproduce.zip
This problem reproduces on next@12.0.9 and above; 12.0.8 was fine.
Alternatively, removing getInitialProps in _app.tsx also resolves it on next@12.0.9 and above.
Describe the Bug
High number of processes of /next/dist/compiled/jest-worker/processChild.js still alive after next build
Expected Behavior
Kill all child processes.
Which browser are you using? (if relevant)
No response
How are you deploying your application? (if relevant)
No response
NEXT-1348