Open twodotsmax opened 1 year ago
Caching is something we've been interested in for a long time. I believe @peterp implemented a custom solution at Snaplet. I'll give him a nudge about this.
Thanks in advance @twodotsmax for offering to chip away at next steps!
Just adding that this is also one of my largest Redwood issues ATM; a server reload is ~7s, which is one of the slowest parts of my entire system.
At Artsy we made https://github.com/artsy/express-reloadable (writeup: http://artsy.github.io/blog/2017/12/05/Express-Reloadable-Update/ ), which addressed most of our issues. I don't think it works in an ESM world, but until Redwood as a whole supports ESM apps, the techniques in it could work.
Steps for people who need a solution now:
```js
'use strict';
const {
  performance,
  PerformanceObserver,
} = require('node:perf_hooks');
const mod = require('node:module');

// Monkey patch the require function
mod.Module.prototype.require = performance.timerify(mod.Module.prototype.require);
require = performance.timerify(require);

// Activate the observer
const obs = new PerformanceObserver((list) => {
  const entries = list.getEntries();
  entries.forEach((entry) => {
    console.log(`require('${entry[0]}')`, entry.duration);
  });
  performance.clearMarks();
  performance.clearMeasures();
  obs.disconnect();
});
obs.observe({ entryTypes: ['function'], buffered: true });

require('some-module');
```
6. Postprocess the output so that you only keep the first instance of loading a particular module; the second load hits the require cache and understates the load time. You can also take the max load time.
7. Sort the module loading data by load time to find your most expensive require() calls.
8. If the expensive module is not necessary at startup, replace the top-level import with a require() that does not run at module load time. For example, if you use a library that makes a remote API call, don't require() it at the top of the file; require it when the API call is made, and the module cache makes every later call free. Note that a lazy require() will block the event loop, so you may want to perform the lazy load only in development. If the module is necessary on startup but isn't needed in development, or isn't needed 100% of the time, you can conditionally require it.
Interesting (and intense!), from my perspective ~50% of the latency I am seeing is framework related:
I guess you're seeing that middle phase continue to grow, so trimming the middle might be useful for me in the short term. I had wondered whether, once Vite has stabilized, vite-node might be a solution to base the API dev server on.
@peterp @twodotsmax @orta if we can determine the path forward here, I'm all in to prioritize the effort. Keep me posted
I've gone down this path a few times and modifying the require cache never feels robust. The main reason is that live-reloading files via the require cache has unexpected consequences because it's additive. I called this "spooky reloads."
The problem is best illustrated by example code:
Step 1: The development server loads `server.ts`, adds `myFirstFunction` to the register, and executes `myFirstFunction()`:

```diff
+ const myFirstFunction = () => {
+   console.log('called `myFirstFunction`')
+ }
+ myFirstFunction()
```
Step 2: The user modifies `server.ts`, removes `myFirstFunction`, adds `mySecondFunction`, but mistakenly executes `myFirstFunction`:

```diff
- const myFirstFunction = () => {
-   console.log('called `myFirstFunction`')
- }
- myFirstFunction()
+ const mySecondFunction = () => {
+   console.log('called `mySecondFunction`')
+ }
+ myFirstFunction()
```
Even though the user deleted `myFirstFunction` from their code, we reloaded the file and added the second function but didn't remove the first, so `myFirstFunction` still executes.
The list of problems that this sort of reloading mechanism can introduce is vast, and each time users hit one it can feel like they're losing their minds, because as developers we expect the file on our filesystem to match what's in the runtime.
Just to clarify the comment I originally left in the code: // TODO: Use v8 caching to load these crazy fast.
v8 has a mechanism to extract and restore the bytecode that's exposed in NodeJS:
Code caching (also known as bytecode caching) is an important optimization in browsers. It reduces the start-up time of commonly visited websites by caching the result of parsing + compilation.
The idea was to have the "build-server" save the bytecode to disk, and then to only use the saved bytecode in the "dev-server." The hope was that this would improve start-up time, and to avoid the above mentioned issues with invalidating the require cache.
This concept is completely untested. It may be an easy performance win, but it might not be. As far as I know Next uses a form of hot-module-reloading.
Alternative approaches to writing our own code that could be a short-term win:
1. ~~Snapshots may be a very quick 'n dirty win for the dev-server. We just add these flags to the start-up: build and restore~~
2. `v8-compile-cache`: attaches a require hook to use V8's code cache to speed up instantiation time. The "code cache" is the work of parsing and compiling done by V8.

Edit: Removed option 1 since it wouldn't allow us to inject new code.
As an aside I wanted to point out that the lazy loading technique which we used in the CLI could also help you figure out why cold starts are slow (which modules are taking the most time) and then you could import them lazily.
This RFC shows how we figured out what was taking long to import, and how we improved it: https://github.com/redwoodjs/redwood/issues/6027
Happy to do a video on `0x` if someone needs additional guidance.
@twodotsmax @orta checking in about possible interest + availability to dig into this. Understood either way. Just didn't want to miss the chance if either of you is interested.
Summary
Hot reloading of the API service is too slow for large projects.
Motivation
David left this comment in lambdaLoader.ts 8 months ago:
// TODO: Use v8 caching to load these crazy fast.
As our project scaled, the load time for the /graphql function went to 12 seconds, which actually started to slow down development because when the service code changes, every function must be reloaded.
A CPU profile pointed to lots of time spent in "require", and using the `perf_hooks` "timerify" function, we were able to find the slowest-to-load modules and skip/lazy-load them during development.
Detailed proposal
I am willing to do more investigation, but wanted to hear from the team first about whether v8 module caching can solve this if used correctly.
Are you interested in working on this?