Closed: asterikx closed this issue 4 years ago.
@asterikx Could you share the code for the Next.js page where you're seeing the high cold start time? i.e. are you using getServerSideProps, getStaticProps, etc. that may be adding time during server-side rendering?
@danielcondemarin sure, I'm using next-i18next, which needs getInitialProps:
import * as React from 'react';
import { ChakraProvider, CSSReset, Global } from '@chakra-ui/core';
import { track } from 'react-tracking';
import type { AppProps, AppContext } from 'next/app';
import { RecoilRoot } from 'recoil';
import { theme } from '../theme';
import { appWithTranslation } from '../i18n';
import { styles } from '../styles/styles';
import { UserProvider } from '../contexts/user';
import { configureAuth } from '../utils/auth';
import { HeaderProvider } from '../contexts/header';
import { MDXProvider } from '../components/MDXProvider';
import { useTrackPages } from '../hooks/useTrackPages';
import App from 'next/app';
configureAuth();
function MyApp({ Component, pageProps }: AppProps) {
  useTrackPages();

  return (
    <ChakraProvider theme={theme}>
      <CSSReset />
      <Global styles={styles} />
      {/*<Suspense fallback={<SplashScreen />}>*/}
      <MDXProvider>
        <RecoilRoot>
          <UserProvider>
            <HeaderProvider>
              <Component {...pageProps} />
            </HeaderProvider>
          </UserProvider>
        </RecoilRoot>
      </MDXProvider>
      {/*</Suspense>*/}
    </ChakraProvider>
  );
}

// https://github.com/isaachinman/next-i18next/issues/615#issuecomment-575578375
MyApp.getInitialProps = async (appContext: AppContext) => {
  const appProps = await App.getInitialProps(appContext);
  return { ...appProps };
};
export default track({})(appWithTranslation(MyApp));
I'm seeing an error message in the browser console:
[Error] Failed to load resource: the server responded with a status of 404 () (common.json, line 0) https://staging.detelling.com/static/locales/de/common.json
But I don't think this affects server rendering. CloudWatch does not show any error logs. The index page makes no further calls to getInitialProps, getServerSideProps, or getStaticProps.
I wonder if the call to https://staging.detelling.com/static/locales/de/common.json is adding to your response times.
To be certain what's going on, it's best to measure the time @sls-next/lambda-at-edge takes in the different key steps:
Here is a rough guide of how to put some performance measurements in the default handler. This is something I'd like to introduce maybe behind a boolean flag. Feel free to PR if you can.
import { performance } from 'perf_hooks';

const overallStartTime = performance.now(); // line 115
...
const pageRequireCallStartTime = performance.now(); // line 156
const page = require(`./${pagePath}`);
const pageRequireCallEndTime = performance.now();
console.log("Next page require time (ms) - ", pageRequireCallEndTime - pageRequireCallStartTime);
...
return responsePromise.then(response => { // line 161
  const overallEndTime = performance.now();
  console.log("SSR + compat layer time (ms) - ", overallEndTime - pageRequireCallEndTime);
  console.log("Overall time (ms) - ", overallEndTime - overallStartTime);
  return response;
});
One more thing, how much memory do you have allocated to the functions? This has an impact on node require calls.
One more thing, how much memory do you have allocated to the functions? This has an impact on node require calls.
512 MB. CloudWatch reports 176 MB used.
I'm wondering whether serverless trace mode could cause the high cold start times (my node_modules has 29 MB and ~3500 files total)? Will the require call only add to the execution time on the first handler invocation or on each invocation?
Here is a rough guide of how to put some performance measurements in the default handler. This is something I'd like to introduce maybe behind a boolean flag.
Where would you like to put a logExecutionTimes flag? Lambda@Edge does not support environment variables, so we must decide at build time whether to perform time measurements or not. Do we need to introduce a factory for creating different handlers based on the flag passed to the Builder? Can you think of a simpler approach?
512 MB. CloudWatch reports 176 MB used.
That's not the issue then.
I'm wondering whether serverless trace mode could cause the high cold start times (my node_modules has 29 MB and ~3500 files total)? Will the require call only add to the execution time on the first handler invocation or on each invocation?
This could be the issue indeed, and we can prove it by adding the performance times. serverless trace mode doesn't bundle everything into one big file, which the previous serverless target did. There are claims that bundling everything into one file reduces Node.js bootup time by doing fewer requires, but tbh I've not looked into this thoroughly.
There are other optimisations that could be done in the serverless trace target, but let's confirm that's the issue first.
Are you seeing the problem if using the old serverless target?
Here is a rough guide of how to put some performance measurements in the default handler. This is something I'd like to introduce maybe behind a boolean flag.
Where would you like to put a logExecutionTimes flag? Lambda@Edge does not support environment variables, so we must decide at build time whether to perform time measurements or not. Do we need to introduce a factory for creating different handlers based on the flag passed to the Builder? Can you think of a simpler approach?
You're right, it will need to be done at build time. I think we can keep the same handler but create a nice abstraction for the logger. Something like:
const perfLogger = (enablePerfLogger) => {
  if (enablePerfLogger) {
    const { performance } = require('perf_hooks');
    return {
      now: () => performance.now(),
      log: (t2, t1) => console.log("Time (ms) ", t2 - t1)
    };
  }
  return { now: () => 0, log: (t2, t1) => {} };
};
Are you seeing the problem if using the old serverless target?
Can't use that unfortunately due to the hard 50 MB deployment package limit. I have ~20 routes and each JS file had ~12-15 MB using the serverless target. Using serverless trace, the JS files are down to ~200 kB each.
I think we can keep the same handler but create a nice abstraction for the logger. Something like
Currently, the default handler is a static file copied to the default-lambda dir. Would you add an additional .env or config.json file to the default-lambda dir, so that the handler can read the enablePerfLogger value from there at runtime?
Thanks for your help and feedback so far. I'll try to create a PR tomorrow.
Currently, the default handler is a static file copied to the default-lambda dir. Would you add an additional .env or config.json file to the default-lambda dir, so that the handler can read the enablePerfLogger value from there at runtime?
Instead of enablePerfLogger, let's call it what you suggested earlier, as it is more explicit: logLambdaExecutionTimes.
You can pass logLambdaExecutionTimes as another build option.
The boolean flag can be added to the default handler manifest and the apiBuildManifest, which are available at runtime since they are copied to the Lambda artefacts.
Once the above is done, we can have a new serverless.yml input, logLambdaExecutionTimes, that can be passed to the Builder.
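A rough sketch of that flow (the output directory, manifest file name, and field name below are assumptions for illustration, not the actual manifest format):
const fs = require("fs");
const path = require("path");

// Build time (sketch): merge the flag into the manifest that the Builder
// already copies into the default-lambda artefact.
const manifestPath = path.join(".serverless_nextjs", "default-lambda", "manifest.json");
const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf-8"));
manifest.logLambdaExecutionTimes = true;
fs.writeFileSync(manifestPath, JSON.stringify(manifest, null, 2));

// Runtime (in the handler): read the same manifest, no environment variables needed.
// const { logLambdaExecutionTimes } = require("./manifest.json");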
Thanks for your help and feedback so far. I'll try to create a PR tomorrow.
No worries and thank you for PR'ing 🙏
I'm also seeing the same issue. With the serverless-trace target (Lambda code size ~8 MB), cold starts seem to be ~5 seconds or more. With the regular serverless target (Lambda code size ~26 MB), cold starts seem to be ~2 seconds. I do have a bunch of dynamic routes (~15) which is adding to the code size. Even warm invocations seem slightly higher (I think P50 went up from ~100 ms to ~150 ms, though this includes network times).
I'll try to post a repro and more detailed perf metrics when I can.
Alright, I've created a repro here.
Repro link: https://github.com/dphang/nextjs-repros.
Some notes:
You will need to change the bucket in serverless.yml since I currently use that bucket.
To run the perf test: K6_BASE_URL=https://xxx.cloudfront.net k6 run perf/perf-test.js --vus 50 --duration 30s
(Note: this specific script runs 50 vus (virtual users) that hit the endpoint every 2-5 seconds for 30 seconds, for a total of ~300-400 iterations. Feel free to tune this, but it at least invokes a bunch of cold starts due to simultaneous requests. Also, I redacted the CloudFront URL since that's my own AWS account; feel free to use your own.)
With the serverless target (Lambda zip ~1.4 MB, the index.js page was ~2.9 MB uncompressed), I got these results:
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 448
checks.....................: 100.00% ✓ 898 ✗ 0
data_received..............: 1.3 MB 44 kB/s
data_sent..................: 64 kB 2.1 kB/s
http_req_blocked...........: avg=6.13ms min=0s med=0s max=224.2ms p(90)=47.02ms p(95)=50.78ms
http_req_connecting........: avg=1.6ms min=0s med=0s max=16.97ms p(90)=12.69ms p(95)=14.18ms
✓ http_req_duration..........: avg=55ms min=27.18ms med=35.9ms max=701.61ms p(90)=141.08ms p(95)=162.49ms
http_req_receiving.........: avg=92.86µs min=31µs med=72µs max=2.76ms p(90)=108.19µs p(95)=127µs
http_req_sending...........: avg=60.93µs min=20µs med=52µs max=1.15ms p(90)=86µs p(95)=100µs
http_req_tls_handshaking...: avg=4.42ms min=0s med=0s max=183.36ms p(90)=33.3ms p(95)=35.58ms
http_req_waiting...........: avg=54.85ms min=27.07ms med=35.76ms max=701.44ms p(90)=140.94ms p(95)=162.38ms
http_reqs..................: 449 14.966646/s
iteration_duration.........: avg=3.52s min=269.52ms med=3.46s max=5.17s p(90)=4.72s p(95)=4.92s
iterations.................: 399 13.299982/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
I looked at the Lambda logs and the Lambda execution time was generally around 350-400 ms, and 150 ms of that was initialization time (this was a manual process; I'm not sure of an easy way to extract percentile data from the CloudWatch Lambda logs. If someone knows a good way, please let me know :). So I guess the difference is network latency.
With the serverless-trace target (interestingly, the Lambda zip is larger here at 3.7 MB compressed, which is not what I would expect; node_modules takes ~22 MB uncompressed and the index.js page is ~100 kB), I got these results:
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 426
checks.....................: 100.00% ✓ 854 ✗ 0
data_received..............: 1.3 MB 42 kB/s
data_sent..................: 63 kB 2.1 kB/s
http_req_blocked...........: avg=6.87ms min=0s med=0s max=162.3ms p(90)=53.37ms p(95)=54.91ms
http_req_connecting........: avg=1.95ms min=0s med=0s max=19.62ms p(90)=15ms p(95)=16.57ms
✓ http_req_duration..........: avg=187.7ms min=27.32ms med=37.11ms max=1.5s p(90)=1.31s p(95)=1.38s
http_req_receiving.........: avg=105.9µs min=40µs med=74µs max=2.05ms p(90)=103.4µs p(95)=128.69µs
http_req_sending...........: avg=57.1µs min=21µs med=48µs max=1.17ms p(90)=82.4µs p(95)=109.69µs
http_req_tls_handshaking...: avg=4.86ms min=0s med=0s max=147.46ms p(90)=35.44ms p(95)=39.04ms
http_req_waiting...........: avg=187.54ms min=27.23ms med=36.99ms max=1.5s p(90)=1.31s p(95)=1.38s
http_reqs..................: 427 14.233232/s
iteration_duration.........: avg=3.71s min=1.66s med=3.8s max=6.02s p(90)=4.86s p(95)=5.03s
iterations.................: 377 12.566577/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
Similarly in Lambda logs I was getting 1000 ms durations with init durations of ~150 ms.
I ran each of the k6 tests a few times just to be sure, and it does seem that the serverless-trace target has higher times on cold start (1.3-1.5 s vs. 700 ms request time, 1 s vs. 350 ms Lambda execution time). But Lambda init was about the same at 150 ms, so I don't think the Lambda zip size is affecting it much.
I'm not a JS expert (I normally work in Python/Java) but I suspect:
node_modules seems too large and could be reduced further; also, could it be bundled into one file?
The index.js file is minimal (100 kB), but it needs to do a bunch of requires, which could add to JS execution time.
It's indeed the require call that makes up most of the high response time on the first invocation.
1st invocation:
handler execution time: 7203.740994999884 (ms)
(code)
require JS execution time: 6337.293548000045 (ms)
(code)
page render execution time: 0.07000900013372302 (ms)
(code)
2nd invocation:
handler execution time: 348.02547800005414 (ms)
require JS execution time: 0.011753000086173415 (ms)
page render execution time: 0.008907000068575144 (ms)
3rd invocation:
handler execution time: 545.747255000053 (ms)
require JS execution time: 0.009914000052958727 (ms)
page render execution time: 0.009519999846816063 (ms)
Also, there is a significant overhead of 350-850 ms that is not caused by requiring JS files or rendering... I hope we can get these numbers down as well. I was really hoping for response times well below 100 ms. Otherwise, I might have to spin up a Fargate container :/
EDIT: I have added some more logging (here). ~The remaining overhead of 350-850 ms comes from the lambda compat call (it amounts to 99.95% of the execution time for subsequent invocations):~
EDIT 2: The call to page.render makes up ~99% of the total time. The compat layer takes ~1 ms.
@danielcondemarin do you see any chance of bringing these execution times down?
I have come across the ncc package from Vercel for compiling a Node.js module into a single JS file, but I'm not sure if it can be used for this issue.
I had a quick look at the next-aws-cloudfront.js file, but I don't see where the high execution times might come from.
It's indeed the require call that makes up most of the high response time on the first invocation.
require JS execution time: 6337.293548000045 (ms)
(code)
6.3 seconds is very high indeed and not something I've seen before during require time using the serverless trace target. Also seems way off from @dphang's benchmarks.
Could you try one more thing: spin up the Next.js local server in production mode,
$ next build && next start
And test how long the first request takes (ideally this would be tested constraining memory resources, e.g. using Docker to match the Lambda runtime as closely as possible).
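For example, something along these lines could approximate a 512 MB Lambda locally (assuming your node_modules works inside a Linux container; image tag, port, and CPU limit are rough choices):
$ docker run --rm -it -m 512m --cpus 0.5 -p 3000:3000 -v "$PWD":/app -w /app node:12 \
    sh -c "npx next build && npx next start"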
I suspect one or more of your dependencies is responsible for this and not necessarily the amount of require calls.
I would start by removing global HoCs in your _app and testing how that affects performance, that is, getting rid of react-tracking and appWithTranslation so your App export ends up being export default MyApp.
page render execution time: 0.07000900013372302 (ms)
(code)
Your page.render logs are not accurate. You need to wait for the getInitialProps / SSR promise to fulfil,
const tBeforeSsrRender = now();
page.render(req, res);
return responsePromise.then(response => {
const tAfterSsrRender = now();
log("page render execution time", tBeforeSsrRender, tAfterSsrRender);
return response;
})
@danielcondemarin do you see any chance of bringing these execution times down?
I can't see an obvious way right now tbh. The serverless-trace target is the same one that Vercel uses, so I'd be interested in seeing how it compares. One optimisation Vercel does is compressing each file in the dependencies, but I don't see how that makes much difference at runtime initialisation.
I have come across the ncc package from Vercel for compiling a Node.js module into a single JS file, but I'm not sure if it can be used for this issue.
Using ncc is the equivalent of the serverless target and would lead to having one bundle per page with loads of duplicated dependency code. That would mean going back to square one in terms of the Lambda artefact size limit.
I had a quick look at the next-aws-cloudfront.js file, but I don't see where the high execution times might come from.
I've deliberately tried to keep the compat layer as lightweight as possible. One possible optimisation is to make the gzipping async here, using its async version.
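For reference, a minimal sketch of what that could look like (assuming the body to compress is already available as a Buffer or string; the surrounding compat-layer code is omitted):
const zlib = require("zlib");
const { promisify } = require("util");
const gzipAsync = promisify(zlib.gzip);

async function compressBody(body) {
  // zlib.gzipSync(body) blocks the event loop; the callback-based zlib.gzip
  // runs the compression on libuv's thread pool instead.
  return gzipAsync(body);
}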
@danielcondemarin I did more research and came across https://fab.dev/ which apparently bundles the server JS into a single file for all pages. It supports NextJs. Not sure about the perf, I can try it out today. Maybe this or something similar can be used?
@danielcondemarin I did more research and came across https://fab.dev/ which apparently bundles the server JS into a single file for all pages. It supports NextJs. Not sure about the perf, I can try it out today. Maybe this or something similar can be used?
AFAIK FAB uses the Next.js serverless target behind the scenes, which would have the same issues around hitting the Lambda artefact size limit.
I think it would be more accurate to compare against Vercel, which uses the same serverless-trace target.
@danielcondemarin I did more research and came across https://fab.dev/ which apparently bundles the server JS into a single file for all pages. It supports NextJs. Not sure about the perf, I can try it out today. Maybe this or something similar can be used?
AFAIK FAB uses the Next.js serverless target behind the scenes, which would have the same issues around hitting the Lambda artefact size limit.
Ah, let me try it out on both my test project and my real project. I think I read somewhere that the package does some other optimizations on top of the serverless target, since they target Cloudflare as well (which has a very small 1 MB compressed script limit). Though I'm not sure what the performance is like (code size certainly doesn't noticeably affect cold starts). Will report back with my findings.
I have tested FAB on Lambda@Edge with the nextjs-repros repository I posted. Here are the results. Lambda memory is 1024 MB and the runtime is Node.js 12.x.
FAB (1 SSR index page, code size = 1.4 MB):
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 451
checks.....................: 100.00% ✓ 904 ✗ 0
data_received..............: 1.3 MB 44 kB/s
data_sent..................: 65 kB 2.2 kB/s
http_req_blocked...........: avg=6.89ms min=0s med=0s max=177.27ms p(90)=56.93ms p(95)=59.44ms
http_req_connecting........: avg=1.75ms min=0s med=0s max=17.78ms p(90)=13.51ms p(95)=16.38ms
✓ http_req_duration..........: avg=46.02ms min=26.23ms med=34.97ms max=624.91ms p(90)=72.93ms p(95)=87.68ms
http_req_receiving.........: avg=121.23µs min=37µs med=78µs max=4.85ms p(90)=118.9µs p(95)=136.45µs
http_req_sending...........: avg=64.39µs min=23µs med=57.5µs max=1.18ms p(90)=82.9µs p(95)=92µs
http_req_tls_handshaking...: avg=5.1ms min=0s med=0s max=160.83ms p(90)=40.46ms p(95)=43.35ms
http_req_waiting...........: avg=45.83ms min=26.13ms med=34.8ms max=624.79ms p(90)=72.78ms p(95)=87.54ms
http_reqs..................: 452 15.066646/s
iteration_duration.........: avg=3.45s min=218.67ms med=3.42s max=5.06s p(90)=4.68s p(95)=4.82s
iterations.................: 402 13.399982/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
It doesn't seem much different; perf seems in line with serverless-next.js with the serverless target as I posted earlier, maybe slightly better at P90/P95. But I noticed that most cold starts have a higher init duration (~450 ms) but a lower invocation duration (~75 ms). I think it's probably optimized that way since init is not billed in Lambda, and apparently init gets a higher CPU allocation?
But this is a simple case; when I added a few more duplicate pages to my nextjs-repros repo, here are the comparisons for 3 SSR pages:
serverless-next.js serverless target (3 SSR pages, code size = 3.2 MB):
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 439
checks.....................: 100.00% ✓ 880 ✗ 0
data_received..............: 1.4 MB 46 kB/s
data_sent..................: 64 kB 2.1 kB/s
http_req_blocked...........: avg=6ms min=0s med=0s max=149.3ms p(90)=44.1ms p(95)=48.35ms
http_req_connecting........: avg=1.71ms min=0s med=0s max=16.79ms p(90)=13.28ms p(95)=15.2ms
✓ http_req_duration..........: avg=108.36ms min=27.86ms med=35.05ms max=792.11ms p(90)=657.13ms p(95)=702.3ms
http_req_receiving.........: avg=152.17µs min=34µs med=62µs max=5.02ms p(90)=166.2µs p(95)=638.04µs
http_req_sending...........: avg=49.1µs min=18µs med=44µs max=1.19ms p(90)=66µs p(95)=91µs
http_req_tls_handshaking...: avg=4.23ms min=0s med=0s max=132.47ms p(90)=29.67ms p(95)=32.85ms
http_req_waiting...........: avg=108.16ms min=26.4ms med=34.91ms max=791.96ms p(90)=656.95ms p(95)=702.13ms
http_reqs..................: 440 14.666662/s
iteration_duration.........: avg=3.56s min=832.32ms med=3.51s max=5.73s p(90)=4.82s p(95)=4.9s
iterations.................: 389 12.966663/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
serverless-next.js serverless-trace target (3 SSR pages, code size = 3.7 MB)
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 433
checks.....................: 100.00% ✓ 868 ✗ 0
data_received..............: 1.4 MB 46 kB/s
data_sent..................: 63 kB 2.1 kB/s
http_req_blocked...........: avg=7.69ms min=0s med=0s max=248.7ms p(90)=61.02ms p(95)=61.21ms
http_req_connecting........: avg=2.42ms min=0s med=0s max=23.92ms p(90)=17.52ms p(95)=20.65ms
✓ http_req_duration..........: avg=180.23ms min=26.66ms med=34.22ms max=1.48s p(90)=1.27s p(95)=1.38s
http_req_receiving.........: avg=125.17µs min=32µs med=62µs max=2.13ms p(90)=116.7µs p(95)=624.69µs
http_req_sending...........: avg=50.51µs min=17µs med=46µs max=790µs p(90)=69µs p(95)=87.34µs
http_req_tls_handshaking...: avg=5.02ms min=0s med=0s max=143.33ms p(90)=37.26ms p(95)=41.21ms
http_req_waiting...........: avg=180.06ms min=25.97ms med=34.07ms max=1.48s p(90)=1.27s p(95)=1.38s
http_reqs..................: 434 14.466619/s
iteration_duration.........: avg=3.71s min=1.72s med=3.69s max=6.36s p(90)=4.89s p(95)=5.15s
iterations.................: 383 12.766624/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
FAB (3 SSR pages, code size = 1.4 MB)
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 443
checks.....................: 100.00% ✓ 888 ✗ 0
data_received..............: 1.4 MB 47 kB/s
data_sent..................: 64 kB 2.1 kB/s
http_req_blocked...........: avg=6.62ms min=0s med=0s max=147.02ms p(90)=52.79ms p(95)=57.11ms
http_req_connecting........: avg=1.94ms min=0s med=0s max=29.99ms p(90)=15.67ms p(95)=17.92ms
✓ http_req_duration..........: avg=108.43ms min=25.46ms med=34.83ms max=768.4ms p(90)=649.89ms p(95)=690.86ms
http_req_receiving.........: avg=119.45µs min=34µs med=67µs max=1.93ms p(90)=126.69µs p(95)=530.44µs
http_req_sending...........: avg=65.27µs min=19µs med=48µs max=3.24ms p(90)=86.39µs p(95)=118.09µs
http_req_tls_handshaking...: avg=4.64ms min=0s med=0s max=130.37ms p(90)=36.88ms p(95)=38.95ms
http_req_waiting...........: avg=108.25ms min=25.38ms med=34.62ms max=768.21ms p(90)=649.76ms p(95)=690.71ms
http_reqs..................: 444 14.799903/s
iteration_duration.........: avg=3.56s min=911.67ms med=3.47s max=5.62s p(90)=4.8s p(95)=4.93s
iterations.................: 393 13.099914/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
In fact, on my real app, FAB seems slightly better in cold starts but the median time is actually higher. I ran all benchmarks again. Comparisons below (50 vus / 30 sec duration perf benchmark, 1 GB Lambda, Node 12.x runtime):
serverless target (12 SSG pages, 6 SSR pages, code size = ~31.8 MB):
http_req_duration..........: avg=352.31ms min=63.44ms med=96.84ms max=3.39s p(90)=2.04s p(95)=2.19s
http_reqs..................: 417 13.899881/s
serverless-trace target (12 SSG pages, 6 SSR pages, code size = ~9.5 MB):
http_req_duration..........: avg=870.97ms min=58.88ms med=135.72ms max=5.61s p(90)=5.39s p(95)=5.51s
http_reqs..................: 367 12.233313/s
FAB (12 SSG pages, 6 SSR pages, code size = ~9.5 MB):
http_req_duration..........: avg=375.88ms min=100.89ms med=135.63ms max=2.16s p(90)=1.94s p(95)=2.02s
http_reqs..................: 406 13.533244/s
The most surprising is that FAB reduced my Lambda code size to ~2.7 MB.
Based on this, I conclude that:
FAB produces a much smaller server.js file from the Next.js build outputs of the serverless target. I don't believe code size impacts cold starts much (maybe a few milliseconds), but it's also good to do to avoid hitting Lambda limits.
Cold starts with FAB weren't much different from the serverless target in nextjs-repros. But for my real app, it seems to have improved slightly from ~3 seconds (serverless target) to ~2 seconds, at the cost of increasing the average and median response times slightly. Although my app is probably not the most optimized at this time. We would also have to look into the FAB code to better understand it (I haven't looked into it much).
Edit: saw your updated comment. I'll benchmark against Vercel later (need to set up a new account due to an issue with my existing one).
Your page.render logs are not accurate. You need to wait for the getInitialProps / SSR promise to fulfill,
Argh, the render call is async of course. My page.render call is the one that takes 350-900 ms. The compat layer call takes 1-2 ms. I have to investigate the render call on my end; that's definitely not acceptable.
I suspect one or more of your dependencies is responsible for this and not necessarily the amount of require calls. I would start by removing global HoCs in your _app and testing how that affects performance, that is, getting rid of react-tracking and appWithTranslation so your App export ends up being export default MyApp.
I tried that. It doesn't make a noticeable difference, neither to the require nor to the render call.
EDIT: I built and ran my app (with the server target) locally with Docker to further debug the performance issues.
I added reportWebVitals from Next.js to my _app, which prints a few metrics in the browser console.
Next.js-render is ~100 ms for the index page (compared to 350-900 ms on Lambda).
After starting the server
$ next start
ready - started server on http://localhost:3000
and entering http://localhost:3000 in the browser, it takes ~3-4 s until the page is shown. Might be the same performance issue I'm experiencing on Lambda ...
I don't know how to debug the initial load problem. Can you recommend something @danielcondemarin?
I have deployed nextjs-repros on Vercel, and I'm also getting cold starts as slow as the serverless-trace target with serverless-next.js. Here are the results of the k6 run:
Test app link: https://nextjs-repros.vercel.app/. Feel free to run some reasonable amount of k6 tests on here (on free tier):
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 437
checks.....................: 100.00% ✓ 876 ✗ 0
data_received..............: 1.3 MB 42 kB/s
data_sent..................: 54 kB 1.8 kB/s
http_req_blocked...........: avg=7.47ms min=0s med=1µs max=171.57ms p(90)=38.5ms p(95)=60.6ms
http_req_connecting........: avg=1.88ms min=0s med=0s max=18.47ms p(90)=13.32ms p(95)=18.13ms
✓ http_req_duration..........: avg=173.21ms min=44.45ms med=52.09ms max=1.75s p(90)=205.43ms p(95)=1.35s
http_req_receiving.........: avg=86.8µs min=40µs med=85µs max=492µs p(90)=111.3µs p(95)=117µs
http_req_sending...........: avg=59.02µs min=22µs med=54µs max=1.52ms p(90)=69.3µs p(95)=80.14µs
http_req_tls_handshaking...: avg=5.52ms min=0s med=0s max=156.9ms p(90)=21.41ms p(95)=42.62ms
http_req_waiting...........: avg=173.06ms min=44.33ms med=51.96ms max=1.75s p(90)=205.3ms p(95)=1.35s
http_reqs..................: 438 14.599946/s
iteration_duration.........: avg=3.63s min=252.86ms med=3.58s max=6.74s p(90)=4.91s p(95)=5.08s
iterations.................: 389 12.966619/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
@asterikx that's interesting, in my tests the local server was always fast for my SSR pages (like double digit millisecond response times). I think for me, the main problem is there is probably some large dependency that's taking a long time to load into memory on cold start - I will need to instrument my code to figure out which one it is. I think cold starts can't be avoided as it's inherent to Lambda (it is spinning up new containers on demand, after all), but here is a summary of things based on the whole thread so far:
(1) What has to be required per page: fab is optimized on this, though FAB is based on the serverless target and eliminates duplicate code by further bundling into a single file. For the serverless-trace target, as per my previous observation, node_modules for my nextjs-repros app was way larger than needed (22 MB vs. 2.9 MB if bundled in the page code). It doesn't seem like this plugin is optimizing the node_modules directory properly for experimental-serverless-trace? (Next.js doesn't add it by default.)
(2) Reducing the code size (in particular the node_modules code). Although in practice, I found just reducing code size doesn't make a noticeable difference; rather, (1) has to improve as well. I think this is because AWS does some caching that's pretty efficient for anything directly uploaded to Lambda.
For (1) I'll also open an issue with the Next.js team as it appears that the serverless-trace target could be further optimized.
I appreciate all the help so far.
EDIT: it seems like on the Next.js end, the serverless-trace target is probably already optimized (it does not automatically output node_modules to the .next/serverless directory). So maybe it's not an issue with the target, but Vercel is still slow, so it might be a similar root cause; maybe I'll raise an issue with them.
EDIT 2: I saw FAB optimizes by merging all the outputs of .next/serverless under the serverless target. Example code: https://github.com/fab-spec/fab/blob/master/packages/input-nextjs/src/generateRenderer.ts#L13-L24 (look at the mergeWebpacks function in particular, here: https://github.com/fab-spec/fab/blob/master/packages/input-nextjs/src/mergeWebpacks.ts), which I believe removes duplicate code so the code size is much smaller. Maybe we could take a similar approach to combine/optimize serverless-trace pages + node_modules? FAB does other stuff (Node.js shims, replacing node-fetch, etc.) to make it compatible with Cloudflare Workers, but we don't really need that.
@dphang Thanks for your detailed analysis!
I think cold starts can't be avoided as it's inherent to Lambda (it is spinning up new containers on demand, after all)
Cold starts aren't the real issue. The Initialization is very fast. It is the first execution of the handler code itself that is very slow, specifically this line:
I think for me, the main problem is there is probably some large dependency that's taking a long time to load into memory on cold start - I will need to instrument my code to figure out which one it is
I think this is likely causing high response times for me too.
For the serverless-trace target, as per my previous observation, node_modules for my nextjs-repros app was way larger than needed (22 MB vs. 2.9 MB if bundled in the page code)
I think for the serverless-trace target, node_modules is larger by design. There is only one node_modules for all pages. Say page 1 has dependency A and page 2 has dependency B; then node_modules includes both dependencies A and B. For the serverless target, the JS file for page 1 includes dependency A (but not B), and the JS file for page 2 includes B (but not A).
However, I'm not sure if the larger node_modules in the serverless-trace target has (or should have) an impact on the first execution, since, ideally, the required JS file should only load the needed dependencies.
I think for me, the main problem is there is probably some large dependency that's taking a long time to load into memory on cold start - I will need to instrument my code to figure out which one it is
@dphang Do you know of any tools/packages that do such instrumentation? I'm still stuck with those very high response times :/
Cold starts aren't the real issue. The Initialization is very fast. It is the first execution of the handler code itself that is very slow, specifically this line:
Actually I would still count that as due to a cold start. A cold start is when Lambda has to spin up a new execution context because the old one is gone (due to inactivity for some time, usually ~10-15 minutes in my experience?). In this case, initialization is fast (not much is done before the handler if you check the root index.js for serverless-next.js; loading dependencies for the page is done in the handler). But if dependencies were previously loaded in the same execution context, and that execution context is now gone, they need to be loaded again.
Maybe it's just semantics, but either way, yes I do agree having to hard reload dependencies is the cause.
Ref: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html
I think for the serverless-trace target, node_modules is larger by design. There is only one node_modules for all pages. Say page 1 has dependency A and page 2 has dependency B, then node_modules includes both dependencies A and B. For the serverless target, the JS file for page 1 includes dependency A (but not B), and the JS file for page 2 includes B (but not A).
Yes, I agree, but in my single-page example where there is exactly 1 page, using serverless-trace, it was still larger by almost an order of magnitude (uncompressed 22 MB vs. 2.9 MB). So I don't believe it is being optimized correctly, i.e. you are loading more code than required to run your page. The serverless target from Next.js seems to be optimizing, but per page (via webpack), so the overall package size is much larger. But when invoking a page, you are still only loading an optimized set of dependencies for a single page JS file, so it was faster for me.
FAB was also optimizing, although it started from the serverless target, and I believe it does so by merging the generated webpack outputs into one file and having custom logic to remove duplicate dependency code. In fact, I think FAB is the most optimized since it requires its single server.js before the handler code (vs. in the handler code with this plugin), i.e. it gets the benefits of better init caching, burst CPU, and it is also not billed. I've attached an example FAB Lambda package; you can see how it works yourself:
I also use the serverless-wsgi (Python) package and it similarly does a lot of initialization outside of the handler to try to reduce the impact of cold starts. Ex: https://github.com/logandk/serverless-wsgi/blob/master/wsgi_handler.py#L118-L120
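For illustration, the difference between requiring inside versus outside the handler looks roughly like this (a sketch with placeholder paths, not the plugin's actual handler):
// Module scope: the require cost is paid once per container, during Lambda
// init, instead of inside the first invocation.
const compat = require("./next-aws-cloudfront"); // compat layer path is an assumption
const page = require("./pages/index"); // placeholder page path

exports.handler = async (event) => {
  const { req, res, responsePromise } = compat(event.Records[0].cf);
  page.render(req, res);
  return responsePromise;
};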
@dphang Do you know any tools/packages do such instrumentalization? I'm still stuck with those very high response times :/
I haven't done so myself; I am still fairly inexperienced in JS so I need to do more research on that. I know there are webpack bundle analyzers that could probably be used with the outputs of serverless-trace (but that's already optimized, and it is also only a static way to see which dependencies are large).
I did some more analysis with njstrace in the serverless-trace target. Tracing/logging of course adds to these numbers:
6074ms _app.tsx
3901ms header.tsx
136ms react-icons
343ms i18n.ts
360ms react-tracking
1555ms @chakra-ui/core
45ms _document.ts
41ms _document.ts (from next)
220ms next/dist/next-server/server/render
376ms log.js
379ms load-env-config.js
346ms WEBPACK VAR INJECTION <-- startup() is called several times, each execution ~350ms
79ms next/dist/next-server/server/node-polyfill-fetch
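(For reference, njstrace is enabled by injecting it at the very top of the entry module; a minimal sketch based on its README:)
// Must run before any other require so that subsequently loaded modules get instrumented.
const njstrace = require("njstrace").inject();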
I wonder if we could remove the dynamic require call and, instead, generate the default-handler with regular top-level imports. The generated handler would need to be compiled with webpack. This way we can get rid of the require call and its penalty (completely?).
I appreciate any feedback or suggestions.
I wonder if we could remove the dynamic require call and, instead, generate the default-handler with regular top-level imports. The generated handler would need to be compiled with webpack. This way we can get rid of the require call and its penalty (completely?).
I'm afraid that is a tough problem. The issue is that the page that needs to be require'd depends on the event object received, which is only available when the handler is invoked, so that makes it impossible to use a top-level import.
The only way something like that could work would be to have a dedicated CloudFront cache behaviour for a specific SSR page. That way you can predict which page needs to be required and write it as a top-level import. That's actually how the serverless-plugin works.
Wouldn't something like this work?
import indexPage from "./pages/index";
import helloPage from "./pages/hello";
import errorPage from "./pages/_error";

let page: NextPage;

switch (pagePath) {
  case "/":
    page = indexPage;
    break;
  case "/index":
    page = indexPage;
    break;
  case "/hello":
    page = helloPage;
    break;
  default:
    page = errorPage;
}

const { req, res, responsePromise } = lambdaAtEdgeCompat(event.Records[0].cf);
await page.render(req, res);
Wouldn't something like this work?
If doing that, you'd be penalising every request to any page with a lot of imports that are not needed. Whilst they may perform slightly better than a dynamic require, I think there is still overhead in having top-level imports; imagine how much latency that would introduce for an app with 20+ SSR pages.
If doing that, you'd be penalising every request to any page with a lot of imports that are not needed. Whilst they may perform slightly better than a dynamic require, I think there is still overhead in having top-level imports; imagine how much latency that would introduce for an app with 20+ SSR pages.
I think FAB is doing this; it requires its server.js file before the handler. Maybe they do it since the overhead with many pages is offset by the benefits of putting it before the handler code (e.g. apparently you get burst CPU/memory for init code, and it might be persisted better across Lambda invocations)?
@asterikx also make sure you are not importing unnecessary code. I just looked back at my nextjs-repros code; for example, in my index.js page, I have this import from Material-UI:
import {Typography} from "@material-ui/core";
With serverless, this ends up just importing Typography within the same page file, as I guess Next.js bundles and tree-shakes correctly:
// EXTERNAL MODULE: ./node_modules/@material-ui/core/esm/Typography/Typography.js
var Typography = __webpack_require__("ofer");
"ofer" above looks to just contain the code for Typography.
With serverless-trace, this causes node_modules to contain all the components in @material-ui/core if you do this in any file. And then if you use this type of import, it ends up importing the entire @material-ui/core, which I saw in the default handler's node_modules has 129 components, most of which are unused:
/* harmony import */ var _material_ui_core__WEBPACK_IMPORTED_MODULE_3__ = __webpack_require__("KKbo");
/* harmony import */ var _material_ui_core__WEBPACK_IMPORTED_MODULE_3___default = /*#__PURE__*/__webpack_require__.n(_material_ui_core__WEBPACK_IMPORTED_MODULE_3__);
...
/***/ "KKbo":
/***/ (function(module, exports) {
module.exports = require("@material-ui/core");
...
return __jsx(react__WEBPACK_IMPORTED_MODULE_0___default.a.Fragment, null, __jsx(_material_ui_core_Button__WEBPACK_IMPORTED_MODULE_1___default.a, {
variant: "contained",
color: "primary",
"data-cy": "button",
onClick: handleClickButton
}, "Test Button"), __jsx(_material_ui_core__WEBPACK_IMPORTED_MODULE_3__["Typography"], null, response));
/***/ }),
Here's an article explaining cold starts due to this (AWS SDK as example): https://theburningmonk.com/2019/03/just-how-expensive-is-the-full-aws-sdk/
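A common workaround (a general Material-UI import tip, not something this plugin does automatically) is to import from the component's own path so only that component ends up being traced:
// Top-level import: pulls the entire @material-ui/core package into the trace.
// import { Typography } from "@material-ui/core";

// Path import: only Typography (and what it needs) ends up in node_modules.
import Typography from "@material-ui/core/Typography";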
As I suspected, node_modules is not optimized and neither are the requires in the pages (the developer can fix this manually, but it would be nice to optimize it programmatically).
Thanks for sharing @dphang.
I honestly do not know where to go from here. Chakra seems to have a big impact in my case. But even without Chakra, the response times are far too high. After all, one main benefit of SSR should be performance ...
Thanks for sharing @dphang.
My linting catches unused imports. I honestly do not know where to go from here. Chakra seems to have a big impact in my case. But even without Chakra, the response times are far too high. After all, one main benefit of SSR should be performance ...
I think linting can't catch this type of issue (you are still using the import, just from a higher-level directory); this one needs tree-shaking to remove the unused code.
@asterikx I'm not familiar with how your app works, but I can say that SSR is not always the best option. Have you considered pre-rendering your pages at build time using getStaticPaths?
@dphang You're right ;)
@danielcondemarin my app is API heavy, so pre-rendering is mostly not an option. It's a multi-tier application with user/role management and includes project and file organization (similar to DropBox Web) and a marketplace with a review system (similar to Amazon). I have proactively moved from CRA to Next.js for SEO (which will be important for the marketplace). But right now I'm wondering if I should go back to CRA and rely on Google crawlers to execute JS ...
@asterikx I'm not familiar with how your app works, but I can say that SSR is not always the best option. Have you considered pre-rendering your pages at build time using getStaticPaths?
+1 on this. For me, I now only have two SSR pages which are not very common pages for most users to access via SSR (~15 other SSG pages).
@asterikx also as a quick test you can try to deploy using FAB, if the cold start is still too high with all their optimizations, then I don't think optimizing it here will help much either.
For me I get around 2 seconds on cold start or so which is acceptable for now, I can also accept stale data for some amount of time on these pages, so I'm planning to cache on CloudFront for ~60 seconds as well.
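For reference, one way to do that from an SSR page (a sketch; getData is a hypothetical fetch, and whether CloudFront honours the header depends on your cache behaviour settings):
export async function getServerSideProps({ res }) {
  // Ask the CDN to cache the rendered response for 60 s and serve stale
  // content while it revalidates in the background.
  res.setHeader("Cache-Control", "s-maxage=60, stale-while-revalidate");

  const data = await getData(); // hypothetical data fetch
  return { props: { data } };
}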
For some reason fab:build fails.
Build failed.
null
./.fab/.cache/generated-nextjs-renderers.c24b62b.js
Module not found: Error: Can't resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache'
resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache'
Parsed request is a module
using description file: /Users/erikmuller/Documents/Detelling/web/package.json (relative path: ./.fab/.cache)
Field 'browser' doesn't contain a valid alias configuration
But, since FAB is using the serverless target, bundle size will likely be an issue again in my case:
Can't use that unfortunately due to the hard 50 MB deployment package limit. I have ~20 routes and each JS file had ~12-15 MB using the serverless target.
I fear Lambda@Edge is simply not able to deliver the performance I expected. I'm considering giving Cloudflare Workers a try, or just going back to CRA for now.
Thank you very much @danielcondemarin @dphang for your help :)
I fear Lambda@Edge is simply not able to deliver the performance I expected. I'm considering giving Cloudflare Workers a try, or just going back to CRA for now.
I wouldn't pin it on Lambda@Edge though; it doesn't matter how good the runtime is if the dependencies in play aren't efficient enough. In terms of cold starts, I forgot to mention that AWS announced https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/ which is a game changer. It's still not supported by Lambda@Edge but may be at some point 🤞 I'm going to close this for now, and good luck with everything. Feel free to open a new issue if you stick with Next and this project and need further help 👍🏻
For some reason fab:build fails.
Build failed. null ./.fab/.cache/generated-nextjs-renderers.c24b62b.js Module not found: Error: Can't resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache' resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache' Parsed request is a module using description file: /Users/erikmuller/Documents/Detelling/web/package.json (relative path: ./.fab/.cache) Field 'browser' doesn't contain a valid alias configuration
But, since FAB is using the serverless target, bundle size will likely be an issue again in my case:
Can't use that unfortunately due to the hard 50 MB deployment package limit. I have ~20 routes and each JS file had ~12-15 MB using the serverless target.
I fear Lambda@Edge is simply not able to deliver the performance I expected. I'm considering giving Cloudflare Workers a try, or just going back to CRA for now.
@asterikx hmm, it might be an issue with your code? Maybe this will help: https://stackoverflow.com/questions/43037590/field-browser-doesnt-contain-a-valid-alias-configuration?
Actually FAB does merge the webpack outputs (i.e. each page) from the serverless output. For me it reduced the bundle size to ~2.7 MB compressed (9.5 MB using experimental-serverless-trace, and 31 MB using serverless with this plugin) and it was on par with the serverless target perf (the best perf). Again, this is because it merges the webpack outputs and removes duplicate code (it's already tree-shaken per page by Next's serverless target; I think this just ensures there is no duplicate dependency code) into one file. I think you would end up at ~16-20 MB uncompressed total size using FAB, which is totally fine for Lambda@Edge.
Also Cloudflare workers has a limit of 1 MB compressed on scripts and they don't use Node.js runtime (they use V8 Isolates), so you may have issues even deploying (and other issues with dependencies that expect Node.js runtime), so it won't work for you (until they raise that limit).
FWIW, I'm not affiliated with FAB at all. I would rather use this plugin when possible as I love the Serverless Framework and am heavily on AWS, but that can serve as a quick test for your needs. Though since both this and FAB are open source, it's nice to be able to learn and improve from others.
@danielcondemarin I don't think this issue is solved on the serverless-next.js end, as this plugin is not optimizing/tree-shaking the Lambda code (pages + node_modules) using the serverless-trace target. Though I suppose we can open a separate feature issue for this. Happy to help with any design discussion or even some coding (though I'm not a JS expert).
@danielcondemarin I don't think this issue is solved on the serverless-next.js end, as this plugin is not optimizing the Lambda code (pages + node_modules) using the serverless-trace target. Though I suppose we can open a separate feature issue for this. Happy to help with any design discussion or even some coding (though I'm not a JS expert).
App build concerns are outside the scope of this project. I think it is best to submit this issue on the Next.js repo, maybe starting as a discussion? The core of the serverless-trace target lives in https://github.com/vercel/nft and is the same one Vercel uses.
Basically, by fixing it there everyone benefits.
@danielcondemarin I don't think this issue is solved on the serverless-next.js end, as this plugin is not optimizing the Lambda code (pages + node_modules) using the serverless-trace target. Though I suppose we can open a separate feature issue for this. Happy to help with any design discussion or even some coding (though I'm not a JS expert).
App build concerns are outside the scope of this project. I think it is best to submit this issue on the Next.js repo, maybe starting as a discussion? The core of the serverless-trace target lives in https://github.com/vercel/nft and is the same one Vercel uses. Basically, by fixing it there everyone benefits.
I partly disagree, as this plugin is targeting AWS infra (right now) with this Serverless component, so I feel we should at least be somewhat concerned with perf optimizations within the context of Lambda, especially since you mention this target in the README. Perhaps add a disclaimer there? But let's agree to disagree.
Anyway, I don't mean to sound contentious; sorry if I come across like that. I'll first create a discussion in Vercel's repo. It looks like it could be a problem on Vercel deployments too; hopefully that's enough motivation for them to consider it.
Thanks for your help!
I partly disagree, as this plugin is targeting AWS infra (right now) with this Serverless component, so I feel we should at least be somewhat concerned with perf optimizations within the context of Lambda, especially since you mention this target in the README
I do agree with this point and if you can think of any perf. optimisation in the context of Lambda@Edge I'd love to hear! What I'd like to avoid is diverging too much from Next.js build output / Vercel deployments.
As an aside there is an optimisation that Vercel does by compressing the node_module files. I don't see how that would make much difference in terms of execution times but may be something to try out if you're up for it!
I partly disagree, as this plugin is targeting AWS infra (right now) with this Serverless component, so I feel we should at least be somewhat concerned with perf optimizations within the context of Lambda, especially since you mention this target in the README
I do agree with this point and if you can think of any perf. optimisation in the context of Lambda@Edge I'd love to hear! What I'd like to avoid is diverging too much from Next.js build output / Vercel deployments.
As an aside there is an optimisation that Vercel does by compressing the node_module files. I don't see how that would make much difference in terms of execution times but may be something to try out if you're up for it!
Yup, totally agree there, we should not have to mess with build outputs too much. I've created a discussion here: https://github.com/vercel/next.js/discussions/16276
Hi @asterikx, I'm experiencing the same issue. Sometimes my page takes 7 seconds to load, sometimes 2 seconds. I cannot figure out why.
Based on this conversation, can you give me your feedback on how you did the deep dive? Which tools did you use to get analytics and find the bottleneck? Are you still using Next.js/Lambda?
Thanks!
For testing purposes, I also deployed on an EC2 instance and compared both websites: light blue: EC2, dark blue: Lambda.
Source: https://www.dareboost.com/
Source: https://www.site24x7.com/
@alan345 if you have bad performance, it can be caused by cold starts, which are mostly due to requiring Node.js modules. For AWS Lambda@Edge, there seems to be a minimum of 150-200 ms in my tests (for just initializing the container, even for a hello-world style function; it doesn't depend on the memory of the Lambda function). Then the rest is requiring your dependencies. Code size may affect it slightly as well, but it's mostly negligible in my experience (it may add a couple of ms, but that's not the bulk of it).
We have tried to optimize it as much as possible for the handler code itself through lazy-loading and using very light modules (e.g the new AWS SDK v3 for S3 client). And minified, the handler code generally adds around 30 ms more to the cold start time.
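For example, the modular v3 client keeps the require cost small compared to pulling in the whole aws-sdk package (a sketch; bucket, key, and region are example values):
// Only the S3 client (and its shared core) is loaded, instead of every AWS service.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "us-east-1" });

async function getObject(bucket, key) {
  return s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
}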
The rest of the time should be due to your dependencies. You can do the following:
Add performance.now() statements in your server JS files to analyze which dependency is taking longest to load.
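A minimal way to do that (a sketch you could temporarily drop into the generated server code or your own modules; @chakra-ui/core is just an example dependency):
const { performance } = require("perf_hooks");

const t0 = performance.now();
const chakra = require("@chakra-ui/core"); // time one suspect dependency
console.log("@chakra-ui/core require time (ms):", performance.now() - t0);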
Describe the bug
I'm experiencing Invocation times of more than 7 seconds for cold lambdas. Subsequent invocations take ~250 ms.
I'm using the @sls-next/lambda-at-edge@1.6.0-alpha.0 package with serverless trace mode and deploy the lambdas/public resources using CDK. My (zipped) bundle size is 6.9 MB.
First invocation:
Second invocation (within a few minutes after the first invocation):
Expected behavior
Invocation duration for cold lambdas should be similar to hot lambdas. It shouldn't be 30 times more.
Question
How to measure/debug the performance of the default-handler? The performance problem seems related to the handler code rather than resource initialization during cold starts.