Closed pauldraper closed 2 years ago
Some backstory in #23339.
Interesting reading, thanks.
TypeScript [2.9.0] has doubled in size since v2.0.0 - now 35 MB
It was "fixed" by #25901, released in 3.1.1, which was 40MB. :slightly_frowning_face:
It won't be hard at all to shrink the package size. For example, lib/tsserver.js and lib/tsserverlibrary.js are 98% identical.
$ du -b node_modules/typescript/lib/tsserver.js node_modules/typescript/lib/tsserverlibrary.js
7290127 node_modules/typescript/lib/tsserver.js
7308140 node_modules/typescript/lib/tsserverlibrary.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/tsserverlibrary.js) | wc -c
7207205
And 99% of lib/typescript.js is identical to those.
$ du -b node_modules/typescript/lib/typescript.js
6859801 node_modules/typescript/lib/typescript.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/typescript.js) | wc -c
6850490
And lib/typescriptServices.js is byte-for-byte identical to that.
$ sha1sum node_modules/typescript/lib/typescript.js node_modules/typescript/lib/typescriptServices.js
0cff9734eba3d721a7ba3c72026e16f267610e24 node_modules/typescript/lib/typescript.js
0cff9734eba3d721a7ba3c72026e16f267610e24 node_modules/typescript/lib/typescriptServices.js
And 99% of lib/typingsInstaller.js is identical to that.
$ du -b node_modules/typescript/lib/typingsInstaller.js
5285788 node_modules/typescript/lib/typingsInstaller.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/typescriptServices.js) | wc -c
5246999
And 80% of lib/tsc.js is identical to that.
$ du -b node_modules/typescript/lib/tsc.js
3912404 node_modules/typescript/lib/tsc.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/tsc.js) | wc -c
3219205
That's nearly 30MB of duplication just in those few files (and this doesn't even include declaration files).
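For readers who want to repeat the measurement without a shell, here is a minimal TypeScript sketch of what `comm -12 <(sort a) <(sort b) | wc -c` computes: the total size of the lines two files have in common, counting repeated lines pairwise. The function names are made up for illustration, and the sketch counts string length rather than encoded bytes, which only matches `wc -c` for ASCII input.

```typescript
// Split text into lines, ignoring the empty piece after a trailing newline.
function toLines(text: string): string[] {
  const lines = text.split("\n");
  if (lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines;
}

// Size of the lines shared by `a` and `b` (multiset intersection),
// counting one extra unit per line for its newline, as `wc -c` would.
// Assumption: ASCII content, so string length equals byte length.
function sharedLineBytes(a: string, b: string): number {
  const remaining = new Map<string, number>();
  for (const line of toLines(a)) {
    remaining.set(line, (remaining.get(line) ?? 0) + 1);
  }
  let bytes = 0;
  for (const line of toLines(b)) {
    const count = remaining.get(line) ?? 0;
    if (count > 0) {
      remaining.set(line, count - 1);
      bytes += line.length + 1;
    }
  }
  return bytes;
}

// Tiny stand-ins for two bundles that share most of their lines.
const sampleA = "var x = 1;\nfoo();\nfoo();\n";
const sampleB = "foo();\nvar y = 2;\nfoo();\nfoo();\n";
console.log(sharedLineBytes(sampleA, sampleB));
```

Dividing the result by the size of the larger file gives the "percent identical" figures quoted above, for example 7207205 / 7308140 for tsserver.js and tsserverlibrary.js.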
I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.
> I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.
It's done this way so that every file can be used by itself without having to deal with the nastiness of modules in JavaScript. Every file is a functional library/program in itself. I think that is a great thing, at the cost of some disk space.
Reading the linked issue #23339, it appears that the desire is in fact to (eventually) use modules.
https://github.com/Microsoft/TypeScript/issues/23339#issuecomment-380632662
> If we used modules, we'd be able to share each file and avoid this duplication
> it is something we want to do, but no plans for the short term. that is where the majority of savings would come from.

> nastiness of modules in JavaScript
ES module systems in general can be hit-and-miss, but reminder that we're talking specifically about an npm package.
npm, npm packages, node_modules, package.json, etc. relate to Node.js (or clones), which supports CommonJS. Right?
I have two ideas, but I am not sure which one is better.
1. Split the code at the source level. Right now some common utils or helpers are shared between different components; we could split them by function, e.g. utils.ts -> utils.ts (common), utils.factory.ts (depends on factory), utils.emitter.ts (depends on emitter), etc. If you want only the factory or only the emitter, just create a tsconfig.json file that includes the files it depends on.
2. Analyze and transform the bundled file. The namespaces are compiled to many IIFEs that are passed the namespace instance. We could compile with target esnext and merge those IIFEs, then transform ts.xxx = xxx to export xxx, and then pack the result as a normal ESM project and tree-shake it.
ping @DanielRosenwasser What do you think about that🧐?
I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API - at which point, our consumers would actually be the ones winning from tree-shaking.
Splitting source on its own can help, but practically speaking the larger components like services and TSServer will need the entire core compiler.
I think that converting to modules is the most practical and obvious way to avoid duplicating most of the contents of tsc.js 3+ times.
A simpler solution, inspired by Busybox:
Combine N near-duplicate files into 1 polymorphic file that can do N things based on a parameter passed in.
It would introduce a performance overhead of parsing a tiny % of unnecessary JS code, but could make the tool integration story way simpler. Maybe worth it?
One trivial way to know which feature is expected would be to directly copy the Busybox approach: symlink all the duplicate files and differentiate at runtime based on the __filename. This saves disk space, package size, and bandwidth. There are more interesting options too.
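As a rough sketch of the Busybox idea, a single bundle could look at the name it was invoked under and dispatch to the matching tool. Everything here is hypothetical: the `entryPoints` table, the `selectEntry` helper, and the placeholder strings stand in for real startup code, and nothing like this exists in the typescript package.

```typescript
type Entry = () => string;

// Hypothetical entry points; a real bundle would start the compiler,
// the language server, and so on instead of returning a label.
const entryPoints = new Map<string, Entry>([
  ["tsc", () => "run compiler"],
  ["tsserver", () => "run language server"],
  ["typingsInstaller", () => "run typings installer"],
]);

// Pick the behavior from the file name the script was invoked as,
// the way Busybox inspects argv[0].
function selectEntry(invokedPath: string): Entry {
  // Accept both POSIX and Windows path separators.
  const base = invokedPath.split(/[\\/]/).pop() ?? "";
  const name = base.replace(/\.js$/, "");
  const entry = entryPoints.get(name);
  if (entry === undefined) {
    throw new Error(`Unknown tool name: ${name}`);
  }
  return entry;
}

console.log(selectEntry("/pkg/lib/tsc.js")());
```

One caveat worth checking before relying on this: Node resolves symlinks when loading modules by default, so `__filename` may point at the real file rather than the symlink that was invoked. A real implementation would need to confirm which path value preserves the invoked name, or use small stub files instead of symlinks.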
From speaking with @RyanCavanaugh, it sounded like @orta was interested in working on this.
+1 for splitting up typescript into multiple packages. One major benefit would be that these individual packages (other than the "typescript" package) could use semantic versioning on at least their APIs then other libraries could just depend on the packages they need. Right now it's kind of a pain to maintain a library that has a peer dependency on the typescript package (without being super strict about the supported version).
Yeah, I'm chatting with folks internally this week, but my goal is roughly:
- typescript: be the same as right now (as removing things would break the world), which provides all tooling

Then have subset packages which are smaller and focused on a specific task:
- @typescript/tsc: for folks who are just doing compilation (e.g. tsc compiles on the server, prettier for the AST)
- @typescript/services: for folks building dev tools like monaco-typescript, or executeprogram etc

I doubt I can offer any useful semver on them, as they link to the main TS version. That'd need the API to actually be classed as "stable" which doesn't look like that's happening soon.
Figuring out how/if we can reduce the main "typescript" is hopefully something I can get an idea about during ^
Removing tools from the package doesn't reduce overall size. Compilation, dev tools etc reuse a lot of the same code that is now copied to multiple commands without changes. The issue is how to share the very duplicated part between the tools, reduce the duplication, or pack the tools into one bundle.
> Yeah, I'm chatting with folks internally this week, but my goal is roughly
Oh, we're generally for it (and have been for years, provided we still provide a services bundle for our (browser) consumers who use it) - we just need an automated way to remap the current namespace-based code layout into modules, this way we can keep a PR doing the migration up to date and not stop development on other things. I have a branch from two years ago that migrated all of src/compiler to modules (by hand) - checker.ts had something like 100 lines of imports on it. And that took quite a while to make. That gave some of us some pause and reduced enthusiasm, but... I'm hoping the final result is still seen as worth it.
With respect to said automation, I think we could probably write a kind of codemod for it using the APIs we have today, but nobody's put in the effort yet.
@orta VS Code is very interested in this work. Right now we consume TypeScript in two ways:
- tsserver.js: used by our JS/TS extension
- typescript.js: used by our html extension

Each of those files is around 8MB on disk. Additionally, we are interested in shipping built-in support for tsc (tsc.js), but that's another 4.5MB and that's difficult for me to justify. It seems to me like all these various TypeScript components should be able to share a lot of code.
Let me know if you would like any additional info about how VS Code consumes TS
As a side note, typingsInstaller.js is pretty huge too (6MB)!! Does it pull in a lot of stuff from TS core?
I brought this up during the most recent design meeting - https://github.com/microsoft/TypeScript/issues/34899
Where the end result was basically, we're meeting about trying to get modules happening again
As mentioned above - all of these files are basically the same but with a bit of flavor difference because they represent different sets of the compiler + services - for example I think you can probably use tsserverlibrary for both the html + JS/TS cases in vscode, but tsc.js doesn't look like it lives in there.
https://github.com/microsoft/TypeScript/pull/35561 is looking like the answer to this, I'll keep my eye on the PR to see how things change
> I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API
If it's not much effort to add into the build, this could still be a worthy goal. There are a few consumers, like Prettier and the new VS Code JS debugger extension, who ship TypeScript in a bundled form. It would double the size on disk if you shipped both ESM and CommonJS in a single package--maybe it could be a separate set of /typescript.*-esm/ packages?
@connor4312 You'll be able to give it a shot when it's migrated, I'm just saying to temper expectations about the savings you'll see.
npm install typescript@4.0.2 results in a 60MB node_modules on my Mac (56MB of which is typescript itself). Typescript is by far the largest module in our stack (and we have 146 explicit deps in package.json) - would love to see some reduction here 🙏
Yup. This is the second largest module in my stack. typescript@4.0.3 is taking up 52M on disk. While it's fine for prod, since people typically don't ship typescript in images but only the transpiled js files, a reduction in size can still impact the dev env significantly.
The install size of typescript@4.5.4 is 61 MB.
However most of that (51.8 MB) is just these six JavaScript files. Minifying them using uglify-js with just basic configuration reduces their size drastically (to 16.5 MB):

File | Size | Minified size |
---|---|---|
tsc.js | 5621 kB | 2206 kB |
tsserver.js | 10378 kB | 3237 kB |
tsserverlibrary.js | 10331 kB | 3220 kB |
typescript.js | 9728 kB | 2989 kB |
typescriptServices.js | 9728 kB | 2989 kB |
typingsInstaller.js | 7298 kB | 2273 kB |
The resulting package size (25.7 MB) is less than half of the current install size at the cost of one additional build step. Is this maybe something that should be explored? I didn't manage to find any thread discussing this except for one mention in #23339.
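The arithmetic behind these figures can be checked with a short sketch. The file sizes are the ones reported in this comment; the type and function names are invented for illustration, and treating 1 MB as 1024 kB is an assumption chosen because it reproduces the quoted 51.8 MB and 16.5 MB totals.

```typescript
interface FileSize {
  file: string;
  sizeKB: number;
  minifiedKB: number;
}

// Sizes as reported above for typescript@4.5.4.
const reported: FileSize[] = [
  { file: "tsc.js", sizeKB: 5621, minifiedKB: 2206 },
  { file: "tsserver.js", sizeKB: 10378, minifiedKB: 3237 },
  { file: "tsserverlibrary.js", sizeKB: 10331, minifiedKB: 3220 },
  { file: "typescript.js", sizeKB: 9728, minifiedKB: 2989 },
  { file: "typescriptServices.js", sizeKB: 9728, minifiedKB: 2989 },
  { file: "typingsInstaller.js", sizeKB: 7298, minifiedKB: 2273 },
];

// Sum the original and minified sizes over all files.
function totals(rows: FileSize[]): { original: number; minified: number } {
  let original = 0;
  let minified = 0;
  for (const row of rows) {
    original += row.sizeKB;
    minified += row.minifiedKB;
  }
  return { original, minified };
}

// Install size after minification: subtract the saved kB from the
// current install size. Assumption: 1 MB = 1024 kB.
function estimateMinifiedInstallMB(installMB: number, rows: FileSize[]): number {
  const t = totals(rows);
  return installMB - (t.original - t.minified) / 1024;
}

const sums = totals(reported);
console.log(sums.original, sums.minified); // 53084 16914
console.log(estimateMinifiedInstallMB(61, reported).toFixed(1)); // "25.7"
```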
@vostrnad They're working on modularizing the compiler. #35561
From an outsider's perspective, it seems there hasn't been much work done on this recently. People from a lot of corners of the typescript universe have chipped in their approval towards a smaller typescript package. I'm no expert on anything low-level, but I'm just chipping in to start the discussion. Could the community be of any help with this?
I have (and another dev or two before me has) been working on #35210 (turning the TS package into modules, mentioned in this thread before), which would directly impact this by only having one copy of everything in the package (like most npm packages). Then, the package would be smaller, and the lack of namespace generation into single files would allow importers to properly tree shake (allowing consumers to ship less).
Forgive the lack of obvious progress; this work is done out of tree in a code transformer that will do the conversion from namespaces/outFile to modules in bulk, since this sort of thing is far too difficult of a task to do solely by hand (and probably not gradually either).
This is what my work environment github folder looks like; the repeated yellow chunks are all typescript in various node_modules folders.
👋 Inspired by discussions here (especially @vostrnad's observation), I created a smaller redistribution of TypeScript: https://github.com/kidonng/typescript
It's not battle tested though, but I've successfully used it to build the Vite repo.
FYI: We (Prettier) just reduced the bundled package size from ~3.5m to ~1.4m by manually removing unused code. https://github.com/prettier/prettier/pull/13431
For those following this thread, I've just posted the PR that converts the codebase to be implemented with modules (#51387); with this change comes major changes to our build and packaging, including a 43% reduction in package size.
I am filing followup issues now that the modules PR has been merged.
One such issue of interest here is #51440; the TL;DR is that if we raise our minimum supported Node version to Node 12, we could safely ship our executables as ESM, which would save us roughly 7 MB more on top of the 43% reduction above.
The reduction from modules is very significant. (Thanks!!!!!!)
If your math is correct, that reduces the package size from 65MB to 36MB.
Which is still larger than it was when #23339 was filed, asking for it to be smaller.
But alas, such is progress.
This was the largest possible improvement to the size. More could be done, but it's not gonna cut in half again.
Eventually, we may be able to ship as ESM and achieve the smallest possible package. Or, go further and publish individual packages for parts of our repo. That goal's a long way off, but there is work left to be done here.
Following the migration to modules in typescript@5.0.0-dev.20221108, I ran my minification tests again. Using uglify-js on the five largest JavaScript files now reduces the package size from 35.6 MB to 18.0 MB:

File | Size | Minified size |
---|---|---|
tsc.js | 5097 kB | 2281 kB |
tsserver.js | 7923 kB | 2999 kB |
tsserverlibrary.js | 7886 kB | 2983 kB |
typescript.js | 7338 kB | 2705 kB |
typingsInstaller.js | 1756 kB | 985 kB |
I mentioned minification in the module conversion PR; we are restricted on that front because so many people still patch our package. If we minify, patching becomes difficult to impossible.
I'd love to be able to do so, but we have to figure out what to do about that first.
(We'd also probably not go "full" minify; we need to keep names for backtraces.)
Minify only saves space if you don't include source maps.
And excluding source maps seems like deal-breaker.
We already exclude source maps in the package, but our output is left "pretty" so that stack traces are meaningful when provided by downstream users.
If we were enabling minification, we would likely only have it remove whitespace and optimize syntax, leaving names in the output.
Re: ES Modules, I think we have to take performance as a serious goal. We get a big speed boost from esbuild's whole-program-aware bundling and giving that up for a better sticker number isn't a good trade-off for most users. People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.
Yeah, this is something I want to performance test; my impression is that ESM imports should be as fast as the whole-program bundling. I think that the differences were really down to variance + load time.
> People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.
It's worth noting that vendoring has some big tradeoffs which might leave a user worse off. If someone still installs TypeScript (due to another dependency, for custom build tasks, or for having their editor use a workspace version), that person gets even more duplication of TypeScript, possibly with mismatched versions.
This is closed, but since people do still follow this issue, #55273 is on the docket for an early 5.3 merge; this PR effectively replaces typescript.js with tsserverlibrary.js and removes the latter. This leaves typescript.js as the sole provider of the public API, saving roughly 8MB unpacked. Copy/pasting the package size report that is run on PRs:
Size | Before | After | Diff | Diff (percent) |
---|---|---|---|---|
Packed | 6.90 MiB | 5.48 MiB | -1.42 MiB | -20.61% |
Unpacked | 38.74 MiB | 30.41 MiB | -8.33 MiB | -21.50% |

File | Before | After | Diff | Diff (percent) |
---|---|---|---|---|
lib/tsserverlibrary.d.ts | 570.95 KiB | 865.00 B | -570.10 KiB | -99.85% |
lib/tsserverlibrary.js | 8.57 MiB | 1012.00 B | -8.57 MiB | -99.99% |
lib/typescript.d.ts | 396.27 KiB | 570.95 KiB | +174.68 KiB | +44.08% |
lib/typescript.js | 7.95 MiB | 8.57 MiB | +637.53 KiB | +7.84% |
As for our executables (and potentially an ESM API); that'll be handled by #51440 when I get to dealing with the long set of changes that are required to make that happen.
Hi! First of all, thanks @jakebailey and the rest of the typescript team for constantly working on this matter to reduce the typescript install size 💙
With the awareness of all these efforts, I made an experimental project, tslite.
tslite is a redistribution of TypeScript without API changes and with optimizations like code minification that probably won't be possible for the typescript package itself. Its (significantly) smaller size benefits the segment of users that directly install/need typescript as a peer dependency in their projects.
I hope this project will be helpful rather than something conflicting with the future roadmap of install size optimizations from the core package.
There is still more size work that can be done, specifically #51440.
However, I will note that the problem of package sizes is really not as bad as people think these days; every modern package manager uses hardlinks to a global cache, meaning that every install of TypeScript on a system will share the same backing files on disk. The "apparent" size may seem duplicative, but it's really all shared.
That and the install size seen on packagephobia is the unpacked size; the actual bits transferred from the registry are much, much smaller. Even gzip brings the tarball to about 6MB. tslite is smaller on that front at about 3MB, but overall most people only download each version of TypeScript once.
That combined with the hardlinking really means that we're talking about a few MB per system, paid once. One spends more network and disk space loading up Twitter or even GitHub via images and scripts that change often than the TS package.
I'm still going to try and make it smaller because I find it fun to do so, but it's a little moot IMO.
This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball TS still adds a few seconds when I open a Stackblitz repro for Vite.
> every modern package manager uses hardlinks to a global cache
Neither npm nor yarn use a global cache. (Unless Yarn is in PnP mode, which brings a number of issues.)
> overall most people only download each version of TypeScript once
There are over 2,800 versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.
Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 TypeScript versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)
> This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even in tarball TS still adds few seconds when I open a Stackblitz repro for Vite.
That's certainly true. It's a shame that these systems do not cache their artifacts.
> Neither npm nor yarn use a global cache. (Unless Yarn is PnP mode, which brings a number of issues.)
Yarn 3 supports hard linking (https://yarnpkg.com/configuration/yarnrc#nmMode). If you're still using Yarn v1, you're not going to get any new features at all.
I was wrong about npm; it has a global cache but it copies the files.
> There are 2,800+ versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.
> Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)
There should really only be one TS version in a project; if this is happening, then some package is over-restricting what version of TS it needs. All modern package managers allow you to override versions within a workspace, and I would think it'd be safe to do that if space is a concern and your package manager can't hardlink.
It's also misleading to say that there are 2,800 versions of TypeScript; there are only a handful of stable releases. The rest are nightly builds.
People shouldn’t have to override Typescript versions. The project I’m working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.
> People shouldn’t have to override Typescript versions. The project I’m working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.
I'm referring specifically to doing this in npm:

    "overrides": {
        "typescript@*": "$typescript"
    },

Or in yarn:

    "resolutions": {
        "typescript@*": "$typescript"
    },

Or in pnpm:

    "pnpm": {
        "overrides": {
            "typescript@*": "$typescript"
        },
    }

I am not referring to any sort of post-install patching, but just asking the package manager to resolve to a single version.
The point is that an override only seems reasonable because other dependencies don’t require any extra setup. NPM repos are supposed to be low-effort installs and typescript should be no exception.
> It's a shame that these systems do not cache their artifacts.

> There should really only be one TS version in a project

> npm; it has a global cache but it copies the files.
Yes, as you say, IDEs, package maintainers, and package managers should be aggressively deduplicating redundancies.
....
....
....
....
And TypeScript should be doing the same. (Right now it's something crazy like ~75% duplicate code.)
Search Terms
size, bloat, install
Suggestion
The typescript package is large, and it is only getting larger.
Version 3.1.3 is a whopping 40MB.
Use Cases
TypeScript is used in many contexts.
A TypeScript formatter (e.g. prettier) does not need an entire compiler. It only needs a parser. And a 45MB scripted parser is orders of magnitude larger than one would normally expect. (For reference, the installed npm package for Esprima -- the most compatible and compliant ES parser in the ecosystem -- is a mere 0.3MB.)
Examples
Solution 1: Split up packages
Optionally, there could be separate packages for typescript-config and typescript-i18n.
Solution 2: Don't duplicate code
There is a lot of code duplication between
Don't duplicate the code.