Closed: asterikx closed this issue 4 years ago.
@asterikx Could you share the code for the Next.js page where you're seeing the high cold start time? i.e. are you using getServerSideProps, getStaticProps, etc. that may be adding time during server-side rendering?
@danielcondemarin sure, I'm using next-i18next, which needs getInitialProps:
import * as React from 'react';
import { ChakraProvider, CSSReset, Global } from '@chakra-ui/core';
import { track } from 'react-tracking';
import type { AppProps, AppContext } from 'next/app';
import { RecoilRoot } from 'recoil';
import { theme } from '../theme';
import { appWithTranslation } from '../i18n';
import { styles } from '../styles/styles';
import { UserProvider } from '../contexts/user';
import { configureAuth } from '../utils/auth';
import { HeaderProvider } from '../contexts/header';
import { MDXProvider } from '../components/MDXProvider';
import { useTrackPages } from '../hooks/useTrackPages';
import App from 'next/app';
configureAuth();
function MyApp({ Component, pageProps }: AppProps) {
  useTrackPages();

  return (
    <ChakraProvider theme={theme}>
      <CSSReset />
      <Global styles={styles} />
      {/*<Suspense fallback={<SplashScreen />}>*/}
      <MDXProvider>
        <RecoilRoot>
          <UserProvider>
            <HeaderProvider>
              <Component {...pageProps} />
            </HeaderProvider>
          </UserProvider>
        </RecoilRoot>
      </MDXProvider>
      {/*</Suspense>*/}
    </ChakraProvider>
  );
}

// https://github.com/isaachinman/next-i18next/issues/615#issuecomment-575578375
MyApp.getInitialProps = async (appContext: AppContext) => {
  const appProps = await App.getInitialProps(appContext);
  return { ...appProps };
};
export default track({})(appWithTranslation(MyApp));
I'm seeing an error message in the browser console:
[Error] Failed to load resource: the server responded with a status of 404 () (common.json, line 0) https://staging.detelling.com/static/locales/de/common.json
But I don't think this affects server rendering. CloudWatch does not show any error logs. The index page makes no further calls to getInitialProps, getServerSideProps, or getStaticProps.
I wonder if the call to https://staging.detelling.com/static/locales/de/common.json is adding to your response times.
To be certain what's going on, it's best to measure the time @sls-next/lambda-at-edge takes in the different key steps:
Here is a rough guide of how to put some performance measurements in the default handler. This is something I'd like to introduce maybe behind a boolean flag. Feel free to PR if you can.
import { performance } from 'perf_hooks';

const overallStartTime = performance.now(); // line 115
...
const pageRequireCallStartTime = performance.now(); // line 156
const page = require(`./${pagePath}`);
const pageRequireCallEndTime = performance.now();
console.log("Next page require time (ms) - ", pageRequireCallEndTime - pageRequireCallStartTime);
...
return responsePromise.then(response => { // line 161
  const overallEndTime = performance.now();
  console.log("SSR + compat layer time (ms) - ", overallEndTime - pageRequireCallEndTime);
  console.log("Overall time (ms) - ", overallEndTime - overallStartTime);
  return response;
});
One more thing, how much memory do you have allocated to the functions? This has an impact on node require calls.
One more thing, how much memory do you have allocated to the functions? This has an impact on node require calls.
512 MB. CloudWatch reports 176 MB used.
I'm wondering whether serverless trace mode could cause the high cold start times (my node_modules has 29 MB and ~3500 files total)? Will the require call only add to the execution time on the first handler invocation or on each invocation?
Here is a rough guide of how to put some performance measurements in the default handler. This is something I'd like to introduce maybe behind a boolean flag.
Where would you like to put a logExecutionTimes flag? Lambda@Edge does not support environment variables, so we must decide at build time whether to perform time measurements or not. Do we need to introduce a factory for creating different handlers based on the flag passed to the Builder? Can you think of a simpler approach?
512 MB. CloudWatch reports 176 MB used.
That's not the issue then.
I'm wondering whether serverless trace mode could cause the high cold start times (my node_modules has 29 MB and ~3500 files total)? Will the require call only add to the execution time on the first handler invocation or on each invocation?
This could be the issue indeed, and we can prove it by adding the performance times. serverless trace mode doesn't bundle everything into one big file, which the previous serverless target did. There are claims that bundling everything into one file reduces Node.js bootup time by doing fewer requires, but tbh I've not looked into this thoroughly.
There are other optimisations that could be done in the serverless trace target, but let's confirm that's the issue first.
Are you seeing the problem if using the old serverless target?
Here is a rough guide of how to put some performance measurements in the default handler. This is something I'd like to introduce maybe behind a boolean flag.
Where would you like to put a logExecutionTimes flag? Lambda@Edge does not support environment variables, so we must decide at build time whether to perform time measurements or not. Do we need to introduce a factory for creating different handlers based on the flag passed to the Builder? Can you think of a simpler approach?
You're right, it will need to be done at build time. I think we can keep the same handler but create a nice abstraction for the logger. Something like:
const perfLogger = (enablePerfLogger) => {
  if (enablePerfLogger) {
    const { performance } = require('perf_hooks');
    return {
      now: () => performance.now(),
      log: (t2, t1) => console.log("Time (ms) ", t2 - t1)
    };
  }
  return { now: () => 0, log: (t2, t1) => {} };
};
Are you seeing the problem if using the old serverless target?
Can't use that unfortunately due to the hard 50 MB deployment package limit. I have ~20 routes and each JS file had ~12-15 MB using the serverless target. Using serverless trace, the JS files are down to ~200 kB each.
I think we can keep the same handler but create a nice abstraction for the logger. Something like
Currently, the default handler is a static file copied to the default-lambda dir. Would you add an additional .env or config.json file to the default-lambda dir, so that the handler can read the enablePerfLogger value from there at runtime?
Thanks for your help and feedback so far. I'll try to create a PR tomorrow.
Currently, the default handler is a static file copied to the default-lambda dir. Would you add an additional .env or config.json file to the default-lambda dir, so that the handler can read the enablePerfLogger value from there at runtime?
Instead of enablePerfLogger, let's call it what you suggested earlier, as it is more explicit: logLambdaExecutionTimes.
You can pass logLambdaExecutionTimes as another build option.
The boolean flag can be added to the default handler manifest and the apiBuildManifest, which are available at runtime since they are copied to the Lambda artefacts.
Once the above is done, we can have a new serverless.yml input, logLambdaExecutionTimes, that can be passed to the Builder.
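A rough sketch of that flow (the output directory, manifest file name, and field name below are assumptions for illustration, not the actual manifest format):
const fs = require("fs");
const path = require("path");

// Build time (sketch): merge the flag into the manifest that the Builder
// already copies into the default-lambda artefact.
const manifestPath = path.join(".serverless_nextjs", "default-lambda", "manifest.json");
const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf-8"));
manifest.logLambdaExecutionTimes = true;
fs.writeFileSync(manifestPath, JSON.stringify(manifest, null, 2));

// Runtime (in the handler): read the same manifest, no environment variables needed.
// const { logLambdaExecutionTimes } = require("./manifest.json");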
Thanks for your help and feedback so far. I'll try to create a PR tomorrow.
No worries and thank you for PR'ing 🙏
I'm also seeing the same issue. With the serverless-trace target (Lambda code size ~8 MB), cold starts seem to be ~5 seconds or more. With the regular serverless target (Lambda code size ~26 MB), cold starts seem to be ~2 seconds. I do have a bunch of dynamic routes (~15) which is adding to the code size. Even warm invocations seem slightly higher (I think P50 went up from ~100 ms to ~150 ms, though this includes network times).
I'll try to post a repro and more detailed perf metrics when I can.
Alright, I've created a repro here.
Repro link: https://github.com/dphang/nextjs-repros.
Some notes:
You will need to change the bucket in serverless.yml since I currently use that bucket.
To run the perf test: K6_BASE_URL=https://xxx.cloudfront.net k6 run perf/perf-test.js --vus 50 --duration 30s
(Note: this specific script runs 50 vus (virtual users) that hit the endpoint every 2-5 seconds for 30 seconds, for a total of ~300-400 iterations. Feel free to tune this, but it at least invokes a bunch of cold starts due to simultaneous requests. Also, I redacted the CloudFront URL since that's my own AWS account; feel free to use your own.)
With the serverless target (Lambda zip ~1.4 MB, the index.js page was ~2.9 MB uncompressed), I got these results:
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 448
checks.....................: 100.00% ✓ 898 ✗ 0
data_received..............: 1.3 MB 44 kB/s
data_sent..................: 64 kB 2.1 kB/s
http_req_blocked...........: avg=6.13ms min=0s med=0s max=224.2ms p(90)=47.02ms p(95)=50.78ms
http_req_connecting........: avg=1.6ms min=0s med=0s max=16.97ms p(90)=12.69ms p(95)=14.18ms
✓ http_req_duration..........: avg=55ms min=27.18ms med=35.9ms max=701.61ms p(90)=141.08ms p(95)=162.49ms
http_req_receiving.........: avg=92.86µs min=31µs med=72µs max=2.76ms p(90)=108.19µs p(95)=127µs
http_req_sending...........: avg=60.93µs min=20µs med=52µs max=1.15ms p(90)=86µs p(95)=100µs
http_req_tls_handshaking...: avg=4.42ms min=0s med=0s max=183.36ms p(90)=33.3ms p(95)=35.58ms
http_req_waiting...........: avg=54.85ms min=27.07ms med=35.76ms max=701.44ms p(90)=140.94ms p(95)=162.38ms
http_reqs..................: 449 14.966646/s
iteration_duration.........: avg=3.52s min=269.52ms med=3.46s max=5.17s p(90)=4.72s p(95)=4.92s
iterations.................: 399 13.299982/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
I looked at the Lambda logs and the Lambda execution time was generally around 350-400 ms, and 150 ms of that was initialization time (this was a manual process; I'm not sure of an easy way to extract percentile data from the CloudWatch Lambda logs. If someone knows a good way, please let me know :). So I guess the difference is network latency.
With the serverless-trace target (interestingly, the Lambda zip is larger here at 3.7 MB compressed, which is not what I would expect; node_modules takes ~22 MB uncompressed and the index.js page is ~100 kB), I got these results:
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 426
checks.....................: 100.00% ✓ 854 ✗ 0
data_received..............: 1.3 MB 42 kB/s
data_sent..................: 63 kB 2.1 kB/s
http_req_blocked...........: avg=6.87ms min=0s med=0s max=162.3ms p(90)=53.37ms p(95)=54.91ms
http_req_connecting........: avg=1.95ms min=0s med=0s max=19.62ms p(90)=15ms p(95)=16.57ms
✓ http_req_duration..........: avg=187.7ms min=27.32ms med=37.11ms max=1.5s p(90)=1.31s p(95)=1.38s
http_req_receiving.........: avg=105.9µs min=40µs med=74µs max=2.05ms p(90)=103.4µs p(95)=128.69µs
http_req_sending...........: avg=57.1µs min=21µs med=48µs max=1.17ms p(90)=82.4µs p(95)=109.69µs
http_req_tls_handshaking...: avg=4.86ms min=0s med=0s max=147.46ms p(90)=35.44ms p(95)=39.04ms
http_req_waiting...........: avg=187.54ms min=27.23ms med=36.99ms max=1.5s p(90)=1.31s p(95)=1.38s
http_reqs..................: 427 14.233232/s
iteration_duration.........: avg=3.71s min=1.66s med=3.8s max=6.02s p(90)=4.86s p(95)=5.03s
iterations.................: 377 12.566577/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
Similarly in Lambda logs I was getting 1000 ms durations with init durations of ~150 ms.
I ran each of the k6 tests a few times just to be sure, and it does seem that the serverless-trace target has higher times on cold start (1.3-1.5 s vs. 700 ms request time, 1 s vs. 350 ms Lambda execution time). But Lambda init was about the same at 150 ms, so I don't think the Lambda zip size is affecting it much.
I'm not a JS expert (I normally work in Python/Java) but I suspect:
node_modules seems too large and could be reduced further; also, could it be bundled into one file?
The index.js file is minimal (100 kB), but it needs to do a bunch of requires, which could add to JS execution time.
It's indeed the require call that makes up most of the high response time on the first invocation.
1st invocation:
handler execution time: 7203.740994999884 (ms)
(code)
require JS execution time: 6337.293548000045 (ms)
(code)
page render execution time: 0.07000900013372302 (ms)
(code)
2nd invocation:
handler execution time: 348.02547800005414 (ms)
require JS execution time: 0.011753000086173415 (ms)
page render execution time: 0.008907000068575144 (ms)
3rd invocation:
handler execution time: 545.747255000053 (ms)
require JS execution time: 0.009914000052958727 (ms)
page render execution time: 0.009519999846816063 (ms)
Also, there is a significant overhead of 350-850 ms that is not caused by requiring JS files or rendering... I hope we can get these numbers down as well. I was really hoping for response times well below 100 ms. Otherwise, I might have to spin up a Fargate container :/
EDIT: I have added some more logging (here). ~The remaining overhead of 350-850 ms comes from the lambda compat call (it amounts to 99.95% of the execution time for subsequent invocations):~
EDIT 2: The call to page.render makes up ~99% of the total time. The compat layer takes ~1 ms.
@danielcondemarin do you see any chance of bringing these execution times down?
I have come across the ncc package from Vercel for compiling a Node.js module into a single JS file, but I'm not sure if it can be used for this issue.
I had a quick look at the next-aws-cloudfront.js file, but I don't see where the high execution times might come from.
It's indeed the require call that makes up most of the high response time on the first invocation.
require JS execution time: 6337.293548000045 (ms)
(code)
6.3 seconds is very high indeed and not something I've seen before during require time using the serverless trace target. Also seems way off from @dphang's benchmarks.
Could you try one more thing: spin up the Next.js local server in production mode,
$ next build && next start
And test how long the first request takes (ideally this would be tested constraining memory resources, e.g. using Docker to match the Lambda runtime as closely as possible).
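For example, something along these lines could approximate a 512 MB Lambda locally (assuming your node_modules works inside a Linux container; image tag, port, and CPU limit are rough choices):
$ docker run --rm -it -m 512m --cpus 0.5 -p 3000:3000 -v "$PWD":/app -w /app node:12 \
    sh -c "npx next build && npx next start"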
I suspect one or more of your dependencies is responsible for this and not necessarily the amount of require calls.
I would start by removing global HoCs in your _app and testing how that affects performance, that is, getting rid of react-tracking and appWithTranslation so your App export ends up being export default MyApp.
page render execution time: 0.07000900013372302 (ms)
(code)
Your page.render logs are not accurate. You need to wait for the getInitialProps / SSR promise to fulfil,
const tBeforeSsrRender = now();
page.render(req, res);
return responsePromise.then(response => {
const tAfterSsrRender = now();
log("page render execution time", tBeforeSsrRender, tAfterSsrRender);
return response;
})
@danielcondemarin do you see any chance of bringing these execution times down?
I can't see an obvious way right now tbh. The serverless-trace target is the same one that Vercel uses, so I'd be interested in seeing how it compares. One optimisation Vercel does is compressing each file in the dependencies, but I don't see how that makes much difference at runtime initialisation.
I have come across the ncc package from Vercel for compiling a Node.js module into a single JS file, but I'm not sure if it can be used for this issue.
Using ncc is the equivalent of the serverless target and would lead to having one bundle per page with loads of duplicated dependency code. That would mean going back to square one in terms of the Lambda artefact size limit.
I had a quick look at the next-aws-cloudfront.js file, but I don't see where the high execution times might come from.
I've deliberately tried to keep the compat layer as lightweight as possible. One possible optimisation is to make the gzipping async here, using its async version.
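For reference, a minimal sketch of what that could look like (assuming the body to compress is already available as a Buffer or string; the surrounding compat-layer code is omitted):
const zlib = require("zlib");
const { promisify } = require("util");
const gzipAsync = promisify(zlib.gzip);

async function compressBody(body) {
  // zlib.gzipSync(body) blocks the event loop; the callback-based zlib.gzip
  // runs the compression on libuv's thread pool instead.
  return gzipAsync(body);
}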
@danielcondemarin I did more research and came across https://fab.dev/ which apparently bundles the server JS into a single file for all pages. It supports NextJs. Not sure about the perf, I can try it out today. Maybe this or something similar can be used?
@danielcondemarin I did more research and came across https://fab.dev/ which apparently bundles the server JS into a single file for all pages. It supports NextJs. Not sure about the perf, I can try it out today. Maybe this or something similar can be used?
AFAIK FAB uses the Next.js serverless target behind the scenes, which would have the same issues around hitting the Lambda artefact size limit.
I think it would be more accurate to compare against Vercel, which uses the same serverless-trace target.
@danielcondemarin I did more research and came across https://fab.dev/ which apparently bundles the server JS into a single file for all pages. It supports NextJs. Not sure about the perf, I can try it out today. Maybe this or something similar can be used?
AFAIK FAB uses the Next.js serverless target behind the scenes, which would have the same issues around hitting the Lambda artefact size limit.
Ah, let me try it out on both my test project and my real project. I think I read somewhere that the package does some other optimizations on top of the serverless target, since they target Cloudflare as well (which has a very small 1 MB compressed script limit). Though I'm not sure what the performance is like (code size certainly doesn't noticeably affect cold starts). Will report back with my findings.
I have tested FAB on Lambda@Edge with the nextjs-repros repository I posted. Here are the results. Lambda memory is 1024 MB and the runtime is Node.js 12.x.
FAB (1 SSR index page, code size = 1.4 MB):
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 451
checks.....................: 100.00% ✓ 904 ✗ 0
data_received..............: 1.3 MB 44 kB/s
data_sent..................: 65 kB 2.2 kB/s
http_req_blocked...........: avg=6.89ms min=0s med=0s max=177.27ms p(90)=56.93ms p(95)=59.44ms
http_req_connecting........: avg=1.75ms min=0s med=0s max=17.78ms p(90)=13.51ms p(95)=16.38ms
✓ http_req_duration..........: avg=46.02ms min=26.23ms med=34.97ms max=624.91ms p(90)=72.93ms p(95)=87.68ms
http_req_receiving.........: avg=121.23µs min=37µs med=78µs max=4.85ms p(90)=118.9µs p(95)=136.45µs
http_req_sending...........: avg=64.39µs min=23µs med=57.5µs max=1.18ms p(90)=82.9µs p(95)=92µs
http_req_tls_handshaking...: avg=5.1ms min=0s med=0s max=160.83ms p(90)=40.46ms p(95)=43.35ms
http_req_waiting...........: avg=45.83ms min=26.13ms med=34.8ms max=624.79ms p(90)=72.78ms p(95)=87.54ms
http_reqs..................: 452 15.066646/s
iteration_duration.........: avg=3.45s min=218.67ms med=3.42s max=5.06s p(90)=4.68s p(95)=4.82s
iterations.................: 402 13.399982/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
It doesn't seem much different; perf seems in line with serverless-next.js with the serverless target as I posted earlier, maybe slightly better at P90/P95. But I noticed that most cold starts have a higher init duration (~450 ms) but a lower invocation duration (~75 ms). I think it's probably optimized that way since init is not billed in Lambda, and apparently init gets a higher CPU allocation?
But this is a simple case; when I added a few more duplicate pages to my nextjs-repros repo, here are the comparisons for 3 SSR pages:
serverless-next.js serverless target (3 SSR pages, code size = 3.2 MB):
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 439
checks.....................: 100.00% ✓ 880 ✗ 0
data_received..............: 1.4 MB 46 kB/s
data_sent..................: 64 kB 2.1 kB/s
http_req_blocked...........: avg=6ms min=0s med=0s max=149.3ms p(90)=44.1ms p(95)=48.35ms
http_req_connecting........: avg=1.71ms min=0s med=0s max=16.79ms p(90)=13.28ms p(95)=15.2ms
✓ http_req_duration..........: avg=108.36ms min=27.86ms med=35.05ms max=792.11ms p(90)=657.13ms p(95)=702.3ms
http_req_receiving.........: avg=152.17µs min=34µs med=62µs max=5.02ms p(90)=166.2µs p(95)=638.04µs
http_req_sending...........: avg=49.1µs min=18µs med=44µs max=1.19ms p(90)=66µs p(95)=91µs
http_req_tls_handshaking...: avg=4.23ms min=0s med=0s max=132.47ms p(90)=29.67ms p(95)=32.85ms
http_req_waiting...........: avg=108.16ms min=26.4ms med=34.91ms max=791.96ms p(90)=656.95ms p(95)=702.13ms
http_reqs..................: 440 14.666662/s
iteration_duration.........: avg=3.56s min=832.32ms med=3.51s max=5.73s p(90)=4.82s p(95)=4.9s
iterations.................: 389 12.966663/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
serverless-next.js serverless-trace target (3 SSR pages, code size = 3.7 MB)
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 433
checks.....................: 100.00% ✓ 868 ✗ 0
data_received..............: 1.4 MB 46 kB/s
data_sent..................: 63 kB 2.1 kB/s
http_req_blocked...........: avg=7.69ms min=0s med=0s max=248.7ms p(90)=61.02ms p(95)=61.21ms
http_req_connecting........: avg=2.42ms min=0s med=0s max=23.92ms p(90)=17.52ms p(95)=20.65ms
✓ http_req_duration..........: avg=180.23ms min=26.66ms med=34.22ms max=1.48s p(90)=1.27s p(95)=1.38s
http_req_receiving.........: avg=125.17µs min=32µs med=62µs max=2.13ms p(90)=116.7µs p(95)=624.69µs
http_req_sending...........: avg=50.51µs min=17µs med=46µs max=790µs p(90)=69µs p(95)=87.34µs
http_req_tls_handshaking...: avg=5.02ms min=0s med=0s max=143.33ms p(90)=37.26ms p(95)=41.21ms
http_req_waiting...........: avg=180.06ms min=25.97ms med=34.07ms max=1.48s p(90)=1.27s p(95)=1.38s
http_reqs..................: 434 14.466619/s
iteration_duration.........: avg=3.71s min=1.72s med=3.69s max=6.36s p(90)=4.89s p(95)=5.15s
iterations.................: 383 12.766624/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
FAB (3 SSR pages, code size = 1.4 MB)
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 443
checks.....................: 100.00% ✓ 888 ✗ 0
data_received..............: 1.4 MB 47 kB/s
data_sent..................: 64 kB 2.1 kB/s
http_req_blocked...........: avg=6.62ms min=0s med=0s max=147.02ms p(90)=52.79ms p(95)=57.11ms
http_req_connecting........: avg=1.94ms min=0s med=0s max=29.99ms p(90)=15.67ms p(95)=17.92ms
✓ http_req_duration..........: avg=108.43ms min=25.46ms med=34.83ms max=768.4ms p(90)=649.89ms p(95)=690.86ms
http_req_receiving.........: avg=119.45µs min=34µs med=67µs max=1.93ms p(90)=126.69µs p(95)=530.44µs
http_req_sending...........: avg=65.27µs min=19µs med=48µs max=3.24ms p(90)=86.39µs p(95)=118.09µs
http_req_tls_handshaking...: avg=4.64ms min=0s med=0s max=130.37ms p(90)=36.88ms p(95)=38.95ms
http_req_waiting...........: avg=108.25ms min=25.38ms med=34.62ms max=768.21ms p(90)=649.76ms p(95)=690.71ms
http_reqs..................: 444 14.799903/s
iteration_duration.........: avg=3.56s min=911.67ms med=3.47s max=5.62s p(90)=4.8s p(95)=4.93s
iterations.................: 393 13.099914/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
In fact, on my real app, FAB seems slightly better in cold starts but the median time is actually higher. I ran all benchmarks again. Comparisons below (50 vus / 30 sec duration perf benchmark, 1 GB Lambda, Node 12.x runtime):
serverless target (12 SSG pages, 6 SSR pages, code size = ~31.8 MB):
http_req_duration..........: avg=352.31ms min=63.44ms med=96.84ms max=3.39s p(90)=2.04s p(95)=2.19s
http_reqs..................: 417 13.899881/s
serverless-trace target (12 SSG pages, 6 SSR pages, code size = ~9.5 MB):
http_req_duration..........: avg=870.97ms min=58.88ms med=135.72ms max=5.61s p(90)=5.39s p(95)=5.51s
http_reqs..................: 367 12.233313/s
FAB (12 SSG pages, 6 SSR pages, code size = ~9.5 MB):
http_req_duration..........: avg=375.88ms min=100.89ms med=135.63ms max=2.16s p(90)=1.94s p(95)=2.02s
http_reqs..................: 406 13.533244/s
The most surprising is that FAB reduced my Lambda code size to ~2.7 MB.
Based on this, I conclude that:
FAB produces a much smaller server.js file from the Next.js build outputs of the serverless target. I don't believe code size impacts cold starts much (maybe a few milliseconds), but it's also good to do to avoid hitting Lambda limits.
Cold starts with FAB weren't much different from the serverless target in nextjs-repros. But for my real app, it seems to have improved slightly from ~3 seconds (serverless target) to ~2 seconds, at the cost of increasing the average and median response times slightly. Although my app is probably not the most optimized at this time. We would also have to look into the FAB code to better understand it (I haven't looked into it much).
Edit: saw your updated comment. I'll benchmark against Vercel later (need to set up a new account due to an issue with my existing one).
Your page.render logs are not accurate. You need to wait for the getInitialProps / SSR promise to fulfill,
Argh, the render call is async of course. My page.render call is the one that takes 350-900 ms. The compat layer call takes 1-2 ms. I have to investigate the render call on my end; that's definitely not acceptable.
I suspect one or more of your dependencies is responsible for this and not necessarily the amount of require calls. I would start by removing global HoCs in your _app and testing how that affects performance, that is, getting rid of react-tracking and appWithTranslation so your App export ends up being export default MyApp.
I tried that. It doesn't make a noticeable difference, neither to the require nor to the render call.
EDIT: I built and ran my app (with the server target) locally with Docker to further debug the performance issues.
I added reportWebVitals from Next.js to my _app, which prints a few metrics in the browser console.
Next.js-render is ~100 ms for the index page (compared to 350-900 ms on Lambda).
After starting the server
$ next start
ready - started server on http://localhost:3000
and entering http://localhost:3000 in the browser, it takes ~3-4 s until the page is shown. Might be the same performance issue I'm experiencing on Lambda ...
I don't know how to debug the initial load problem. Can you recommend something @danielcondemarin?
I have deployed nextjs-repros on Vercel, and I'm also getting cold starts as slow as the serverless-trace target with serverless-next.js. Here are the results of the k6 run:
Test app link: https://nextjs-repros.vercel.app/. Feel free to run some reasonable amount of k6 tests on here (on free tier):
✓ check_failure_rate.........: 0.00% ✓ 0 ✗ 437
checks.....................: 100.00% ✓ 876 ✗ 0
data_received..............: 1.3 MB 42 kB/s
data_sent..................: 54 kB 1.8 kB/s
http_req_blocked...........: avg=7.47ms min=0s med=1µs max=171.57ms p(90)=38.5ms p(95)=60.6ms
http_req_connecting........: avg=1.88ms min=0s med=0s max=18.47ms p(90)=13.32ms p(95)=18.13ms
✓ http_req_duration..........: avg=173.21ms min=44.45ms med=52.09ms max=1.75s p(90)=205.43ms p(95)=1.35s
http_req_receiving.........: avg=86.8µs min=40µs med=85µs max=492µs p(90)=111.3µs p(95)=117µs
http_req_sending...........: avg=59.02µs min=22µs med=54µs max=1.52ms p(90)=69.3µs p(95)=80.14µs
http_req_tls_handshaking...: avg=5.52ms min=0s med=0s max=156.9ms p(90)=21.41ms p(95)=42.62ms
http_req_waiting...........: avg=173.06ms min=44.33ms med=51.96ms max=1.75s p(90)=205.3ms p(95)=1.35s
http_reqs..................: 438 14.599946/s
iteration_duration.........: avg=3.63s min=252.86ms med=3.58s max=6.74s p(90)=4.91s p(95)=5.08s
iterations.................: 389 12.966619/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
@asterikx that's interesting, in my tests the local server was always fast for my SSR pages (like double digit millisecond response times). I think for me, the main problem is there is probably some large dependency that's taking a long time to load into memory on cold start - I will need to instrument my code to figure out which one it is. I think cold starts can't be avoided as it's inherent to Lambda (it is spinning up new containers on demand, after all), but here is a summary of things based on the whole thread so far:
(1) What has to be required per page: fab is optimized on this, though FAB is based on the serverless target and eliminates duplicate code by further bundling into a single file. For the serverless-trace target, as per my previous observation, node_modules for my nextjs-repros app was way larger than needed (22 MB vs. 2.9 MB if bundled in the page code). It doesn't seem like this plugin is optimizing the node_modules directory properly for experimental-serverless-trace? (Next.js doesn't add it by default.)
(2) Reducing the code size (in particular the node_modules code). Although in practice, I found just reducing code size doesn't make a noticeable difference; rather, (1) has to improve as well. I think this is because AWS does some caching that's pretty efficient for anything directly uploaded to Lambda.
For (1) I'll also open an issue with the Next.js team as it appears that the serverless-trace target could be further optimized.
I appreciate all the help so far.
EDIT: it seems like on the Next.js end, the serverless-trace target is probably already optimized (it does not automatically output node_modules to the .next/serverless directory). So maybe it's not an issue with the target, but Vercel is still slow, so it might be a similar root cause; maybe I'll raise an issue with them.
EDIT 2: I saw FAB optimizes by merging all the outputs of .next/serverless under the serverless target. Example code: https://github.com/fab-spec/fab/blob/master/packages/input-nextjs/src/generateRenderer.ts#L13-L24 (look at the mergeWebpacks function in particular, here: https://github.com/fab-spec/fab/blob/master/packages/input-nextjs/src/mergeWebpacks.ts), which I believe removes duplicate code so the code size is much smaller. Maybe we could take a similar approach to combine/optimize serverless-trace pages + node_modules? FAB does other stuff (Node.js shims, replacing node-fetch, etc.) to make it compatible with Cloudflare Workers, but we don't really need that.
@dphang Thanks for your detailed analysis!
I think cold starts can't be avoided as it's inherent to Lambda (it is spinning up new containers on demand, after all)
Cold starts aren't the real issue. The Initialization is very fast. It is the first execution of the handler code itself that is very slow, specifically this line:
I think for me, the main problem is there is probably some large dependency that's taking a long time to load into memory on cold start - I will need to instrument my code to figure out which one it is
I think this is likely causing high response times for me too.
For the serverless-trace target, as per my previous observation, node_modules for my nextjs-repros app was way larger than needed (22 MB vs. 2.9 MB if bundled in the page code)
I think for the serverless-trace target, node_modules is larger by design. There is only one node_modules for all pages. Say page 1 has dependency A and page 2 has dependency B; then node_modules includes both dependencies A and B. For the serverless target, the JS file for page 1 includes dependency A (but not B), and the JS file for page 2 includes B (but not A).
However, I'm not sure if the larger node_modules in the serverless-trace target has (or should have) an impact on the first execution, since, ideally, the required JS file should only load the needed dependencies.
I think for me, the main problem is there is probably some large dependency that's taking a long time to load into memory on cold start - I will need to instrument my code to figure out which one it is
@dphang Do you know of any tools/packages that do such instrumentation? I'm still stuck with those very high response times :/
Cold starts aren't the real issue. The Initialization is very fast. It is the first execution of the handler code itself that is very slow, specifically this line:
Actually I would still count that as due to a cold start. A cold start is when Lambda has to spin up a new execution context because the old one is gone (due to inactivity for some time, usually ~10-15 minutes in my experience?). In this case, initialization is fast (not much is done before the handler if you check the root index.js for serverless-next.js; loading dependencies for the page is done in the handler). But if dependencies were previously loaded in the same execution context, and that execution context is now gone, they need to be loaded again.
Maybe it's just semantics, but either way, yes I do agree having to hard reload dependencies is the cause.
Ref: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html
I think for the serverless-trace target, node_modules is larger by design. There is only one node_modules for all pages. Say page 1 has dependency A and page 2 has dependency B, then node_modules includes both dependencies A and B. For the serverless target, the JS file for page 1 includes dependency A (but not B), and the JS file for page 2 includes B (but not A).
Yes, I agree, but in my single-page example where there is exactly 1 page, using serverless-trace, it was still larger by almost an order of magnitude (uncompressed 22 MB vs. 2.9 MB). So I don't believe it is being optimized correctly, i.e. you are loading more code than required to run your page. The serverless target from Next.js seems to be optimizing, but per page (via webpack), so the overall package size is much larger. But when invoking a page, you are still only loading an optimized set of dependencies for a single page JS file, so it was faster for me.
FAB was also optimizing, although it started from the serverless target, and I believe it does so by merging the generated webpack outputs into one file and having custom logic to remove duplicate dependency code. In fact, I think FAB is the most optimized since it requires its single server.js before the handler code (vs. in the handler code with this plugin), i.e. it gets the benefits of better init caching, burst CPU, and it is also not billed. I've attached an example FAB Lambda package; you can see how it works yourself:
I also use the serverless-wsgi (Python) package and it similarly does a lot of initialization outside of the handler to try to reduce the impact of cold starts. Ex: https://github.com/logandk/serverless-wsgi/blob/master/wsgi_handler.py#L118-L120
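For illustration, the difference between requiring inside versus outside the handler looks roughly like this (a sketch with placeholder paths, not the plugin's actual handler):
// Module scope: the require cost is paid once per container, during Lambda
// init, instead of inside the first invocation.
const compat = require("./next-aws-cloudfront"); // compat layer path is an assumption
const page = require("./pages/index"); // placeholder page path

exports.handler = async (event) => {
  const { req, res, responsePromise } = compat(event.Records[0].cf);
  page.render(req, res);
  return responsePromise;
};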
@dphang Do you know any tools/packages do such instrumentalization? I'm still stuck with those very high response times :/
I haven't done so myself; I am still fairly inexperienced in JS so I need to do more research on that. I know there are webpack bundle analyzers that could probably be used with the outputs of serverless-trace (but that's already optimized, and it is also only a static way to see which dependencies are large).
I did some more analysis with njstrace in the serverless-trace target. Tracing/logging of course adds to these numbers:
6074ms _app.tsx
3901ms header.tsx
136ms react-icons
343ms i18n.ts
360ms react-tracking
1555ms @chakra-ui/core
45ms _document.ts
41ms _document.ts (from next)
220ms next/dist/next-server/server/render
376ms log.js
379ms load-env-config.js
346ms WEBPACK VAR INJECTION <-- startup() is called several times, each execution ~350ms
79ms next/dist/next-server/server/node-polyfill-fetch
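(For reference, njstrace is enabled by injecting it at the very top of the entry module; a minimal sketch based on its README:)
// Must run before any other require so that subsequently loaded modules get instrumented.
const njstrace = require("njstrace").inject();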
I wonder if we could remove the dynamic require call and, instead, generate the default-handler with regular top-level imports. The generated handler would need to be compiled with webpack. This way we can get rid of the require call and its penalty (completely?).
I appreciate any feedback or suggestions.
I wonder if we could remove the dynamic require call and, instead, generate the default-handler with regular top-level imports. The generated handler would need to be compiled with webpack. This way we can get rid of the require call and its penalty (completely?).
I'm afraid that is a tough problem. The issue is that the page that needs to be require'd depends on the event object received, which is only available when the handler is invoked, so that makes it impossible to use a top-level import.
The only way something like that could work would be to have a dedicated CloudFront cache behaviour for a specific SSR page. That way you can predict which page needs to be required and write it as a top-level import. That's actually how the serverless-plugin works.
Wouldn't something like this work?
import indexPage from "./pages/index";
import helloPage from "./pages/hello";
import errorPage from "./pages/_error";

let page: NextPage;

switch (pagePath) {
  case "/":
    page = indexPage;
    break;
  case "/index":
    page = indexPage;
    break;
  case "/hello":
    page = helloPage;
    break;
  default:
    page = errorPage;
}

const { req, res, responsePromise } = lambdaAtEdgeCompat(event.Records[0].cf);
await page.render(req, res);
Wouldn't something like this work?
If doing that, you'd be penalising every request to any page with a lot of imports that are not needed. Whilst they may perform slightly better than a dynamic require, I think there is still overhead in having top-level imports; imagine how much latency that would introduce for an app with 20+ SSR pages.
If doing that, you'd be penalising every request to any page with a lot of imports that are not needed. Whilst they may perform slightly better than a dynamic require, I think there is still overhead in having top-level imports; imagine how much latency that would introduce for an app with 20+ SSR pages.
I think FAB is doing this; it requires its server.js file before the handler. Maybe they do it since the overhead with many pages is offset by the benefits of putting it before the handler code (e.g. apparently you get burst CPU/memory for init code, and it might be persisted better across Lambda invocations)?
@asterikx also make sure you are not importing unnecessary code. I just looked back at my nextjs-repros code; for example, in my index.js page, I have this import from Material-UI:
import {Typography} from "@material-ui/core";
With serverless, this ends up just importing Typography within the same page file, as I guess Next.js bundles and tree-shakes correctly:
// EXTERNAL MODULE: ./node_modules/@material-ui/core/esm/Typography/Typography.js
var Typography = __webpack_require__("ofer");
"ofer" above looks to just contain the code for Typography.
With serverless-trace, this causes node_modules to contain all the components in @material-ui/core if you do this in any file. And then if you use this type of import, it ends up importing the entire @material-ui/core, which I saw in the default handler's node_modules has 129 components, most of which are unused:
/* harmony import */ var _material_ui_core__WEBPACK_IMPORTED_MODULE_3__ = __webpack_require__("KKbo");
/* harmony import */ var _material_ui_core__WEBPACK_IMPORTED_MODULE_3___default = /*#__PURE__*/__webpack_require__.n(_material_ui_core__WEBPACK_IMPORTED_MODULE_3__);
...
/***/ "KKbo":
/***/ (function(module, exports) {
module.exports = require("@material-ui/core");
...
return __jsx(react__WEBPACK_IMPORTED_MODULE_0___default.a.Fragment, null, __jsx(_material_ui_core_Button__WEBPACK_IMPORTED_MODULE_1___default.a, {
variant: "contained",
color: "primary",
"data-cy": "button",
onClick: handleClickButton
}, "Test Button"), __jsx(_material_ui_core__WEBPACK_IMPORTED_MODULE_3__["Typography"], null, response));
/***/ }),
Here's an article explaining cold starts due to this (AWS SDK as example): https://theburningmonk.com/2019/03/just-how-expensive-is-the-full-aws-sdk/
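A common workaround (a general Material-UI import tip, not something this plugin does automatically) is to import from the component's own path so only that component ends up being traced:
// Top-level import: pulls the entire @material-ui/core package into the trace.
// import { Typography } from "@material-ui/core";

// Path import: only Typography (and what it needs) ends up in node_modules.
import Typography from "@material-ui/core/Typography";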
As I suspected, node_modules is not optimized and neither are the requires in the pages (the developer can fix this manually, but it would be nice to optimize it programmatically).
Thanks for sharing @dphang.
I honestly do not know where to go from here. Chakra seems to have a big impact in my case. But even without Chakra, the response times are far too high. After all, one main benefit of SSR should be performance ...
Thanks for sharing @dphang.
My linting catches unused imports. I honestly do not know where to go from here. Chakra seems to have a big impact in my case. But even without Chakra, the response times are far too high. After all, one main benefit of SSR should be performance ...
I think linting can't catch this type of issue (you are still using the import, just from a higher-level directory); this one needs tree-shaking to remove the unused code.
@asterikx I'm not familiar with how your app works, but I can say that SSR is not always the best option. Have you considered pre-rendering your pages at build time using getStaticPaths?
@dphang You're right ;)
@danielcondemarin my app is API heavy, so pre-rendering is mostly not an option. It's a multi-tier application with user/role management and includes project and file organization (similar to DropBox Web) and a marketplace with a review system (similar to Amazon). I have proactively moved from CRA to Next.js for SEO (which will be important for the marketplace). But right now I'm wondering if I should go back to CRA and rely on Google crawlers to execute JS ...
@asterikx I'm not familiar with how your app works, but I can say that SSR is not always the best option. Have you considered pre-rendering your pages at build time using getStaticPaths?
+1 on this. For me, I now only have two SSR pages which are not very common pages for most users to access via SSR (~15 other SSG pages).
@asterikx also as a quick test you can try to deploy using FAB, if the cold start is still too high with all their optimizations, then I don't think optimizing it here will help much either.
For me I get around 2 seconds on cold start or so which is acceptable for now, I can also accept stale data for some amount of time on these pages, so I'm planning to cache on CloudFront for ~60 seconds as well.
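For reference, one way to do that from an SSR page (a sketch; getData is a hypothetical fetch, and whether CloudFront honours the header depends on your cache behaviour settings):
export async function getServerSideProps({ res }) {
  // Ask the CDN to cache the rendered response for 60 s and serve stale
  // content while it revalidates in the background.
  res.setHeader("Cache-Control", "s-maxage=60, stale-while-revalidate");

  const data = await getData(); // hypothetical data fetch
  return { props: { data } };
}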
For some reason fab:build fails.
Build failed.
null
./.fab/.cache/generated-nextjs-renderers.c24b62b.js
Module not found: Error: Can't resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache'
resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache'
Parsed request is a module
using description file: /Users/erikmuller/Documents/Detelling/web/package.json (relative path: ./.fab/.cache)
Field 'browser' doesn't contain a valid alias configuration
But, since FAB is using the serverless target, bundle size will likely be an issue again in my case:
Can't use that unfortunately due to the hard 50 MB deployment package limit. I have ~20 routes and each JS file had ~12-15 MB using the serverless target.
I fear Lambda@Edge is simply not able to deliver the performance I expected. I'm considering giving Cloudflare Workers a try, or just going back to CRA for now.
Thank you very much @danielcondemarin @dphang for your help :)
I fear Lambda@Edge is simply not able to deliver the performance I expected. I'm considering giving Cloudflare Workers a try, or just going back to CRA for now.
I wouldn't pin it on Lambda@Edge though; it doesn't matter how good the runtime is if the dependencies in play aren't efficient enough. In terms of cold starts, I forgot to mention that AWS announced https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/ which is a game changer. It's still not supported by Lambda@Edge but may be at some point 🤞 I'm going to close this for now, and good luck with everything. Feel free to open a new issue if you stick with Next and this project and need further help 👍🏻
For some reason fab:build fails.
Build failed. null ./.fab/.cache/generated-nextjs-renderers.c24b62b.js Module not found: Error: Can't resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache' resolve 'child_process' in '/Users/erikmuller/Documents/Detelling/web/.fab/.cache' Parsed request is a module using description file: /Users/erikmuller/Documents/Detelling/web/package.json (relative path: ./.fab/.cache) Field 'browser' doesn't contain a valid alias configuration
But, since FAB is using the serverless target, bundle size will likely be an issue again in my case:
Can't use that unfortunately due to the hard 50 MB deployment package limit. I have ~20 routes and each JS file had ~12-15 MB using the serverless target.
I fear Lambda@Edge is simply not able to deliver the performance I expected. I'm considering giving Cloudflare Workers a try, or just going back to CRA for now.
@asterikx hmm, it might be an issue with your code? Maybe this will help: https://stackoverflow.com/questions/43037590/field-browser-doesnt-contain-a-valid-alias-configuration?
Actually FAB does merge the webpack outputs (i.e. each page) from the serverless output. For me it reduced the bundle size to ~2.7 MB compressed (9.5 MB using experimental-serverless-trace, and 31 MB using serverless with this plugin) and it was on par with the serverless target perf (the best perf). Again, this is because it merges the webpack outputs and removes duplicate code (it's already tree-shaken per page by Next's serverless target; I think this just ensures there is no duplicate dependency code) into one file. I think you would end up at ~16-20 MB uncompressed total size using FAB, which is totally fine for Lambda@Edge.
Also Cloudflare workers has a limit of 1 MB compressed on scripts and they don't use Node.js runtime (they use V8 Isolates), so you may have issues even deploying (and other issues with dependencies that expect Node.js runtime), so it won't work for you (until they raise that limit).
FWIW, I'm not affiliated with FAB at all. I would rather use this plugin when possible as I love the Serverless Framework and am heavily on AWS, but that can serve as a quick test for your needs. Though since both this and FAB are open source, it's nice to be able to learn and improve from others.
@danielcondemarin I don't think this issue is solved on the serverless-next.js end, as this plugin is not optimizing/tree-shaking the Lambda code (pages + node_modules) using the serverless-trace target. Though I suppose we can open a separate feature issue for this. Happy to help with any design discussion or even some coding (though I'm not a JS expert).
@danielcondemarin I don't think this issue is solved on the serverless-next.js end, as this plugin is not optimizing the Lambda code (pages + node_modules) using the serverless-trace target. Though I suppose we can open a separate feature issue for this. Happy to help with any design discussion or even some coding (though I'm not a JS expert).
App build concerns are outside the scope of this project. I think it is best to submit this issue on the Next.js repo, maybe starting as a discussion? The core of the serverless-trace target lives in https://github.com/vercel/nft and is the same one Vercel uses.
Basically, by fixing it there everyone benefits.
@danielcondemarin I don't think this issue is solved on the serverless-next.js end, as this plugin is not optimizing the Lambda code (pages + node_modules) using the serverless-trace target. Though I suppose we can open a separate feature issue for this. Happy to help with any design discussion or even some coding (though I'm not a JS expert).
App build concerns are outside the scope of this project. I think it is best to submit this issue on the Next.js repo, maybe starting as a discussion? The core of the serverless-trace target lives in https://github.com/vercel/nft and is the same one Vercel uses. Basically, by fixing it there everyone benefits.
I partly disagree, as this plugin is targeting AWS infra (right now) with this Serverless component, so I feel we should at least be somewhat concerned with perf optimizations within the context of Lambda, especially since you mention this target in the README. Perhaps add a disclaimer there? But let's agree to disagree.
Anyway, I don't mean to sound contentious; sorry if I come across like that. I'll first create a discussion in Vercel's repo. It looks like it could be a problem on Vercel deployments too; hopefully that's enough motivation for them to consider it.
Thanks for your help!
I partly disagree, as this plugin is targeting AWS infra (right now) with this Serverless component, so I feel we should at least be somewhat concerned with perf optimizations within the context of Lambda, especially since you mention this target in the README
I do agree with this point and if you can think of any perf. optimisation in the context of Lambda@Edge I'd love to hear! What I'd like to avoid is diverging too much from Next.js build output / Vercel deployments.
As an aside there is an optimisation that Vercel does by compressing the node_module files. I don't see how that would make much difference in terms of execution times but may be something to try out if you're up for it!
I partly disagree, as this plugin is targeting AWS infra (right now) with this Serverless component, so I feel we should at least be somewhat concerned with perf optimizations within the context of Lambda, especially since you mention this target in the README
I do agree with this point and if you can think of any perf. optimisation in the context of Lambda@Edge I'd love to hear! What I'd like to avoid is diverging too much from Next.js build output / Vercel deployments.
As an aside there is an optimisation that Vercel does by compressing the node_module files. I don't see how that would make much difference in terms of execution times but may be something to try out if you're up for it!
Yup, totally agree there, we should not have to mess with build outputs too much. I've created a discussion here: https://github.com/vercel/next.js/discussions/16276
Hi @asterikx, I'm experiencing the same issue. Sometimes my page takes 7 seconds to load, sometimes 2 seconds. I cannot figure out why.
Based on this conversation, can you give me your feedback on how you did the deep dive? Which tools did you use to get analytics and find the bottleneck? Are you still using Next.js/Lambda?
Thanks!
For testing purposes, I also deployed on an EC2 instance and compared both websites: light blue: EC2, dark blue: Lambda.
Source: https://www.dareboost.com/
Source: https://www.site24x7.com/
@alan345 if you have bad performance, it can be caused by cold starts, which are mostly due to requiring Node.js modules. For AWS Lambda@Edge, there seems to be a minimum of 150-200 ms in my tests (for just initializing the container, even for a hello-world style function; it doesn't depend on the memory of the Lambda function). Then the rest is requiring your dependencies. Code size may affect it slightly as well, but it's mostly negligible in my experience (it may add a couple of ms, but that's not the bulk of it).
We have tried to optimize it as much as possible for the handler code itself through lazy-loading and using very light modules (e.g the new AWS SDK v3 for S3 client). And minified, the handler code generally adds around 30 ms more to the cold start time.
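For example, the modular v3 client keeps the require cost small compared to pulling in the whole aws-sdk package (a sketch; bucket, key, and region are example values):
// Only the S3 client (and its shared core) is loaded, instead of every AWS service.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "us-east-1" });

async function getObject(bucket, key) {
  return s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
}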
The rest of the time should be due to your dependencies. You can do the following:
Add performance.now() statements in your server JS files to analyze which dependency is taking longest to load.
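A minimal way to do that (a sketch you could temporarily drop into the generated server code or your own modules; @chakra-ui/core is just an example dependency):
const { performance } = require("perf_hooks");

const t0 = performance.now();
const chakra = require("@chakra-ui/core"); // time one suspect dependency
console.log("@chakra-ui/core require time (ms):", performance.now() - t0);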
Describe the bug
I'm experiencing Invocation times of more than 7 seconds for cold lambdas. Subsequent invocations take ~250 ms.
I'm using the @sls-next/lambda-at-edge@1.6.0-alpha.0 package with serverless trace mode and deploy the lambdas/public resources using CDK. My (zipped) bundle size is 6.9 MB.
First invocation:
Second invocation (within a few minutes after the first invocation):
Expected behavior
Invocation duration for cold lambdas should be similar to hot lambdas. It shouldn't be 30 times more.
Question
How to measure/debug the performance of the default-handler? The performance problem seems related to the handler code rather than resource initialization during cold starts.