Open zepumph opened 6 months ago
Here is the how to get the current list of repos with no ts files in their js directories:
assert
babel
binder
community
decaf
fenster
monday
phet-info
phet-io-client-guides
phet-io-website
phet-io-wrapper-classroom-activity
phet-io-wrapper-haptics
phet-io-wrapper-hookes-law-energy
phet-io-wrapper-lab-book
phet-lib
phettest
qa
quake
query-string-machine
rosetta
scenery-lab-demo
sherpa
skiffle
tasks
weddell
yotta
I tried something like this, but it seems to take a lot of memory, likely everyone would need to give node 5GB via command line options to get this to work. We should investigate more.
When I try to include all sims that should be getting checking, this error occurs when Node is taking up ~4.9GB of memory in the tsc process:
mjkauzmann ~/PHET/git/chipper/tsconfig/all (main)
$ time tsc
C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:114743
throw e;
^
Error: Debug Failure.
at visitEachChildOfIndexSignatureDeclaration (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83740:13)
at visitEachChild (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83596:33)
at visitExistingNodeTreeSymbols (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:49006:16)
at visitArrayWorker (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83408:49)
at visitNodes2 (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83379:19)
at visitEachChildOfTypeLiteralNode (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83786:7)
at visitEachChild (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83596:33)
at visitExistingNodeTreeSymbols (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:49006:16)
at visitArrayWorker (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83408:49)
at visitNodes2 (C:\Users\mjkauzmann\PHET\git\chipper\node_modules\typescript\lib\tsc.js:83379:19)
Node.js v18.16.0
real 3m7.755s
user 0m0.000s
sys 0m0.000s
I'd like to discuss with @samreid before proceeding.
@zepumph and I found that we can only get to around 100 simulation entry points before crashing. It seems the solution is to use project references, probably like described in https://github.com/phetsims/chipper/issues/1356.
Compared to where we are now, this means we could add another 30-60 entry points at most before we cannot run it any more.
I believe this was helpful in investigating the EOL on the monolithic approach. Likely one day we will hit this, and linting will no longer work. Let's up the priority of https://github.com/phetsims/chipper/issues/1356 so that we use project references BEFORE that happens. I bet we have 6 months of confident grace, and ~1 year before we start getting quite risky. Thanks for your time @samreid.
eslint's project uses the exact same code path as running tsc, at least as far as getting the heap memory error. If I use a generalized "include everything" config for eslint, it still runs out of memory:
{
"extends": "../../tsconfig-core.json",
"include": [
"../../../**/*"
],
"exclude": [
"**/node_modules",
"**/.git"
]
}
Some experimental results:
grunt lint-everything
caused a memory crash overrides: [
{
files: [ '*.html' ],
rules: {
// DUPLICATION ALERT, this overrides the base rule, just for HTML.
'no-multiple-empty-lines': [ 'error', { max: 2, maxBOF: 2, maxEOF: 1 } ],
'bad-sim-text': 'off'
}
},
{
// For .ts files, the following configuration will be used
files: [
'**/*.ts',
'**/*.tsx'
],
parser: '@typescript-eslint/parser',
parserOptions: {
sourceType: 'module',
// Provide a tsconfig so that we can use rules that require type information.
// NOTE: Providing this slows down eslint substantially, see https://github.com/phetsims/chipper/issues/1114#issuecomment-1065927717
project: [ `../${global.myFancyRepo}/tsconfig.json` ]
},
And it was still a heap crash.
lintOneRepo
in a separate process via fork
and that ran smoothly and kept under the memory limit for grunt lint-everything
.tsc
in tsconfig/all failed with an out of memory error:~/phet/root/chipper/tsconfig/all$ tsc
<--- Last few GCs --->
[51248:0x128008000] 75368 ms: Mark-Compact 4040.1 (4140.6) -> 4027.0 (4143.4) MB, 661.92 / 0.00 ms (average mu = 0.525, current mu = 0.063) allocation failure; scavenge might not succeed
[51248:0x128008000] 76772 ms: Mark-Compact 4043.3 (4143.6) -> 4028.8 (4145.1) MB, 1366.46 / 0.04 ms (average mu = 0.271, current mu = 0.027) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0x100340bf4 node::Abort() [/usr/local/bin/node]
2: 0x100340ddc node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/usr/local/bin/node]
3: 0x1004c4da8 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/usr/local/bin/node]
etc...
Changing the package.json to point to the tsconfig could make this work. However, in doing so, the IDE catches errors but grunt lint does not.
Experimental patch:
It was OK after a memory boost though:
~/phet/root/chipper/tsconfig/all$ node --max-old-space-size=8192 /Users/samreid/phet/root/chipper/node_modules/typescript/bin/tsc
~/phet/root/chipper/tsconfig/all$
In discussion, @matthew-blackman @zepumph @jonathanolson and I discussed that we want to move forward with the package.json lint entry points, and I will take the lead on that. That will help with linting. In https://github.com/phetsims/chipper/issues/1356 we talked about a long term goal of having d.ts files, so that will help with the type checking everything maybe.
Also noting a link to our discussion: https://docs.google.com/document/d/11Gt3Ulalj0fCD2fFeCjPT5ni_9mM2WjkGc4ysisQmo8/edit#heading=h.cpfo2356at26
I tracked down memory leaks to typescript-eslint and the eslint project itself. First, we are suffering from the same problem as: https://github.com/typescript-eslint/typescript-eslint/issues/6462. Please note also the flag process.env.TSESTREE_SINGLE_RUN = 'true'
which will probably be helpful for our case.
I tried the workaround listed there and it seemed to help. Instead of memory failing at 15% repos, it makes it to 35% repos.
Here is that patch so far:
Note this is clear-caches.js from typescript-eslint:
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.clearProgramCache = exports.clearCaches = void 0;
const getWatchProgramsForProjects_1 = require("./create-program/getWatchProgramsForProjects");
const parser_1 = require("./parser");
const createParseSettings_1 = require("./parseSettings/createParseSettings");
const resolveProjectList_1 = require("./parseSettings/resolveProjectList");
/**
* Clears all of the internal caches.
* Generally you shouldn't need or want to use this.
* Examples of intended uses:
* - In tests to reset parser state to keep tests isolated.
* - In custom lint tooling that iteratively lints one project at a time to prevent OOMs.
*/
function clearCaches() {
(0, parser_1.clearProgramCache)();
(0, getWatchProgramsForProjects_1.clearWatchCaches)();
(0, createParseSettings_1.clearTSConfigMatchCache)();
(0, createParseSettings_1.clearTSServerProjectService)();
(0, resolveProjectList_1.clearGlobCache)();
}
exports.clearCaches = clearCaches;
// TODO - delete this in next major
exports.clearProgramCache = clearCaches;
//# sourceMappingURL=clear-caches.js.map
There are more memory leaks after that appear to be related to eslint itself. Here is a screenshot of that memory usage:
I do not see an API for clearing these in-memory caches. Therefore it seems reasonable to experiment with running the lintOneFile in a separate fork or process, since we can guarantee that won't leak any internal memory caches.
Please be advised that fork/process overhead on Windows is very different than on mac, so please performance test on Windows before getting too far.
Patch that introduces lintMain and batches in lint-everything. I'll probably need help getting this to production.
In the last few days, CT started failing lint-everything in every column due to this out of memory error:
perennial : lint-everything
Lint-everything failed with status code null:
Running "lint-everything" task
<--- Last few GCs --->
[1502318:0x6def2e0] 687205 ms: Scavenge 3903.2 (4125.5) -> 3892.7 (4127.3) MB, 30.7 / 0.0 ms (average mu = 0.317, current mu = 0.307) allocation failure;
[1502318:0x6def2e0] 687338 ms: Scavenge 3907.8 (4127.3) -> 3898.1 (4127.8) MB, 37.0 / 0.0 ms (average mu = 0.317, current mu = 0.307) allocation failure;
[1502318:0x6def2e0] 687454 ms: Scavenge 3911.7 (4127.9) -> 3902.7 (4128.7) MB, 26.2 / 0.0 ms (average mu = 0.317, current mu = 0.307) allocation failure;
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb7a940 node::Abort() [grunt]
2: 0xa8e823 [grunt]
3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [grunt]
4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [grunt]
5: 0xf3a3e5 [grunt]
6: 0xf3b2e8 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [grunt]
7: 0xf4b7f3 [grunt]
8: 0xf4c668 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [grunt]
9: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [grunt]
10: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [grunt]
11: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [grunt]
12: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [grunt]
13: 0x16fb6b9 [grunt]
Snapshot from 3/19/2024, 3:04:21 PM
@zepumph and I wrote this worker thread implementation:
lintMain.js
const lint = require( './lint' );
const { parentPort } = require( 'worker_threads' );
parentPort.on( 'message', async ( { repo, options } ) => {
const result = await lint.lintOneRepo( repo, options );
parentPort.postMessage( result );
process.exit( 0 );
} );
@samreid and I got to a couple of commit points today in the workerLint branch in chipper. This uses workers to chunk up repos and lint them individually.
@samreid and I will continue again soon!
grunt lint --fix
grunt lint --fix
on a typescript-specific rulegrunt lint-all
grunt lint-everything
grunt
buildsTest patch to change all package.json files from the all config:
Patch with Promise.race etc:
@zepumph and I worked on this and committed a change that runs lint in a worker thread. This allows us to keep under a lower memory limit and make sure to reclaim that memory when the worker exits.
We were surprised to see that using focused tsconfig files from the lint slowed down the results a lot when running on a batch of repos (compared to the monolithic tsconfig). See https://github.com/phetsims/chipper/issues/1425
Some experimental results:
We hacked inferSingleRun since our value for TSESTREE_SINGLE_RUN='true' was not getting respected. But that did not give any evidence of a boost. But there are more questions because inferSingleRun depends on project and program which are not yet defined.
In further experimentation, @zepumph and I saw that:
process.env.TSESTREE_SINGLE_RUN = 'true'
const { clearProgramCache } = require( '../../../chipper/node_modules/@typescript-eslint/typescript-estree/dist/index.js' );
clearProgramCache();
Both of these seem to have wrecked the performance.
From https://github.com/typescript-eslint/typescript-eslint/issues/6462#issuecomment-1432331528
Up next:
I believe that https://github.com/phetsims/chipper/issues/1429 is the way of the future. Let's see how that goes next week and come back here to ensure that the memory leak is fixed by using npx.
As we continue to run into this memory error, both in lint and tsc-all, we should note that you can get around the issue with a node option:
--max-old-space-size=8192
is a good place to start. It can be added as a CLI option to node
, or put in the bashrc like:
export NODE_OPTIONS=--max_old_space_size=8192
Let's also put this on hold until https://github.com/phetsims/chipper/issues/1431 is complete. (in addition to https://github.com/phetsims/chipper/issues/1429).
We originally wrote this as an opt in because we didn't want to bog down type checking and lint with repos that didn't have typescript in them. Now, that is more of the exception than the rule. Perhaps we could use
exclude
patterns such that we never needed to update this file. This would help prevent errors like https://github.com/phetsims/perennial/issues/347#issuecomment-1877847143