microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0
101.2k stars 12.51k forks

Reduce typescript package size #27891

Closed pauldraper closed 2 years ago

pauldraper commented 6 years ago

Search Terms

size, bloat, install

Suggestion

The typescript package is large, and it is only getting larger.

screenshot from 2018-10-13 11-12-06

Version 3.1.3 is a whopping 40MB.

Use Cases

TypeScript is used in many contexts.

A TypeScript formatter (e.g. prettier) does not need an entire compiler. It only needs a parser. And a 45MB scripted parser is orders of magnitude larger than one would normally expect. (For reference, the installed npm package for Esprima -- the most compatible and compliant ES parser in the ecosystem -- is a mere 0.3MB.)

Examples

Solution 1: Split up packages

Optionally, there could be separate packages for typescript-config and typescript-i18n.

Solution 2: Don't duplicate code

There is a lot of code duplication between

Don't duplicate the code.

Checklist

My suggestion meets these guidelines:

mattmccutchen commented 6 years ago

Some backstory in #23339.

pauldraper commented 6 years ago

Interesting reading, thanks.

TypeScript [2.9.0] has doubled in size since v2.0.0 - now 35 MB

It was "fixed" by #25901, released in 3.1.1, which was 40MB. :slightly_frowning_face:


It won't be hard at all to shrink the package size. For example, lib/tsserver.js and lib/tsserverlibrary.js are 98% identical.

$ du -b node_modules/typescript/lib/tsserver.js node_modules/typescript/lib/tsserverlibrary.js
7290127 node_modules/typescript/lib/tsserver.js
7308140 node_modules/typescript/lib/tsserverlibrary.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/tsserverlibrary.js) | wc -c
7207205

And 99% of lib/typescript.js is identical to those.

$ du -b node_modules/typescript/lib/typescript.js
6859801 node_modules/typescript/lib/typescript.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/typescript.js) | wc -c
6850490

And lib/typescriptServices.js is byte-for-byte identical to that.

$ sha1sum node_modules/typescript/lib/typescript.js node_modules/typescript/lib/typescriptServices.js
0cff9734eba3d721a7ba3c72026e16f267610e24  node_modules/typescript/lib/typescript.js
0cff9734eba3d721a7ba3c72026e16f267610e24  node_modules/typescript/lib/typescriptServices.js

And 99% of lib/typingsInstaller.js is identical to that.

$ du -b node_modules/typescript/lib/typingsInstaller.js
5285788 node_modules/typescript/lib/typingsInstaller.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/typescriptServices.js) | wc -c
5246999

And 80% of lib/tsc.js is identical to that

$ du -b node_modules/typescript/lib/tsc.js
3912404 node_modules/typescript/lib/tsc.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/tsc.js) | wc -c
3219205

That's nearly 30MB of duplication just in those few files (and this doesn't even include declaration files).
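The `comm -12 <(sort …) <(sort …) | wc -c` pipeline used above counts the bytes in lines that appear in both files. As a rough sketch of the same measurement in TypeScript, assuming the file contents are already loaded as strings (`toLines` and `sharedLineBytes` are hypothetical helper names, not part of any TypeScript API):

```typescript
// Split text into lines the way line-oriented tools see them:
// a trailing newline ends the last line rather than starting an empty one.
function toLines(text: string): string[] {
  if (text === "") return [];
  const body = text.endsWith("\n") ? text.slice(0, -1) : text;
  return body.split("\n");
}

// Count the characters (plus one newline per line) in lines that occur in
// both inputs, pairing duplicates one-to-one like `comm -12` on sorted input.
// Uses string length, so it equals the byte count only for ASCII content.
function sharedLineBytes(a: string, b: string): number {
  const remaining = new Map<string, number>();
  for (const line of toLines(a)) {
    remaining.set(line, (remaining.get(line) ?? 0) + 1);
  }
  let total = 0;
  for (const line of toLines(b)) {
    const count = remaining.get(line) ?? 0;
    if (count > 0) {
      remaining.set(line, count - 1);
      total += line.length + 1;
    }
  }
  return total;
}

// Figures quoted above for tsserver.js vs tsserverlibrary.js.
const sharedShare = 7207205 / 7290127; // roughly 0.989
```

Note that this measures shared lines rather than shared contiguous code, so it is an upper-bound style estimate of the duplication, just like the shell version.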

I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.

MartinJohns commented 6 years ago

I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.

It's done this way so that every file can be used by itself without having to deal with the nastiness of modules in JavaScript. Every file is a functional library/program in itself. I think that is a great thing, at the cost of some disk space.

pauldraper commented 6 years ago

Reading the linked issue #23339, it appears that the desire is in fact to (eventually) use modules.

https://github.com/Microsoft/TypeScript/issues/23339#issuecomment-380632662

If we used modules, we'd be able to share each file and avoid this duplication

it is something we want to do, but no plans for the short term. that is where the majority of savings would come from.


nastiness of modules in JavaScript

ES module systems in general can be hit-and-miss, but reminder that we're talking specifically about an npm package.

npm, npm packages, node_modules, package.json, etc. are related to Node.js (or clones), which supports CommonJS. Right?

Kingwl commented 5 years ago

I have two ideas, but I am not sure which one is better.

  1. Split code at the source level. For now, some common utils or helpers are shared between different components; we could split them by function, e.g. utils.ts -> utils.ts (common), utils.factory.ts (depends on factory), utils.emitter.ts (depends on emitter), etc. If you want only the factory or the emitter, just create a tsconfig.json file that includes the files it depends on.

  2. Analyze and transform the bundled file. The namespace has been compiled to many IIFEs with the namespace instance injected; we could compile with target esnext and merge those IIFEs, then transform each ts.xxx = xxx to export xxx. Then we could pack the result as a normal ESM project and tree shake it.
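To make the second idea concrete, here is a toy sketch of the `ts.xxx = xxx` to `export` rewrite. It is a line-based regular expression, assumed purely for illustration; a real migration would need a proper syntax-aware transformer, and `namespaceAssignmentsToExports` is a hypothetical name.

```typescript
// Rewrite lines of the form `ts.foo = foo;` (what an exported namespace member
// compiles to inside the IIFE) into `export { foo };`.
// Only handles the simple case where both sides use the same identifier.
function namespaceAssignmentsToExports(source: string): string {
  return source.replace(/^([ \t]*)ts\.(\w+) = \2;$/gm, "$1export { $2 };");
}

const namespaceStyle = [
  "function createNode() {}",
  "ts.createNode = createNode;",
].join("\n");

// "function createNode() {}\nexport { createNode };"
const moduleStyle = namespaceAssignmentsToExports(namespaceStyle);
```

Lines that do not match the pattern, such as `ts.a = b;`, are left untouched, which is one reason a text-level rewrite like this is only a starting point.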

Kingwl commented 5 years ago

ping @DanielRosenwasser What do you think about that🧐?

DanielRosenwasser commented 5 years ago

I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API - at which point, our consumers would actually be the ones winning from tree-shaking.

Splitting source on its own can help, but practically speaking the larger components like services and TSServer will need the entire core compiler.

I think that converting to modules is the most practical and obvious way to avoid duplicating most of the contents of tsc.js 3+ times.

mihailik commented 5 years ago

A simpler solution: inspired from Busybox.

Combine N near-duplicate files into 1 polymorphic file that can do N things based on a parameter passed in.

It would introduce a performance overhead of parsing tiny % of unnecessary JS code, but can make the tool integration story way simpler. Maybe worth it?

One trivial way to know which feature is expected would be to directly copy Busybox approach: symlink all the duplicate files and differentiate at runtime based on the __filename. Saves disk space, package size, bandwidth. There are more interesting options too.
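A minimal sketch of that Busybox-style dispatch, under stated assumptions: the tool names and handlers below are stand-ins, and the invoked path is passed in as a plain string (how it is obtained at runtime, for example from `__filename` as suggested above, is left outside the sketch). Only `/` separators are handled.

```typescript
// Hypothetical tool names; the real package ships more entry points than this.
type ToolName = "tsc" | "tsserver" | "typingsInstaller";

// Stand-ins for the real entry points: each returns a label instead of doing work.
const toolHandlers: Record<ToolName, (args: string[]) => string> = {
  tsc: (args) => `compile ${args.join(" ")}`,
  tsserver: () => "start language server",
  typingsInstaller: () => "install typings",
};

// Derive the tool name from the invoked path, e.g. "/pkg/lib/tsc.js" -> "tsc".
function toolNameFromPath(invokedPath: string): string {
  const base = invokedPath.slice(invokedPath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(0, dot) : base;
}

// One shared program body; the file name it was invoked as selects the behavior.
function dispatchTool(invokedPath: string, args: string[]): string {
  const tool = toolNameFromPath(invokedPath);
  if (Object.prototype.hasOwnProperty.call(toolHandlers, tool)) {
    return toolHandlers[tool as ToolName](args);
  }
  throw new Error(`unknown tool: ${tool}`);
}
```

With this shape, `dispatchTool("/pkg/lib/tsc.js", ["-p", "."])` runs the `tsc` handler while every alias shares a single copy of the code on disk.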

DanielRosenwasser commented 5 years ago

From speaking with @RyanCavanaugh, it sounded like @orta was interested in working on this.

dsherret commented 5 years ago

+1 for splitting up typescript into multiple packages. One major benefit would be that these individual packages (other than the "typescript" package) could use semantic versioning on at least their APIs then other libraries could just depend on the packages they need. Right now it's kind of a pain to maintain a library that has a peer dependency on the typescript package (without being super strict about the supported version).

orta commented 5 years ago

Yeah, I'm chatting with folks internally this week, but my goal is roughly:

Then have subset packages which are smaller and focused on a specific task:

I doubt I can offer any useful semver on them, as they link to the main TS version. That'd need the API to actually be classed as "stable" which doesn't look like that's happening soon.

Figuring out how/if we can reduce the main "typescript" is hopefully something I can get an idea about during ^

nykula commented 5 years ago

Removing tools from the package doesn't reduce overall size. Compilation, dev tools etc reuse a lot of the same code that is now copied to multiple commands without changes. The issue is how to share the very duplicated part between the tools, reduce the duplication, or pack the tools into one bundle.

weswigham commented 5 years ago

Yeah, I'm chatting with folks internally this week, but my goal is roughly

Oh, we're generally for it (and have been for years, provided we still provide a services bundle for our (browser) consumers who use it) - we just need an automated way to remap the current namespace-based code layout into modules, this way we can keep a PR doing the migration up to date and not stop development on other things. I have a branch from two years ago that migrated all of src/compiler to modules (by hand) - checker.ts had something like 100 lines of imports on it. And that took quite awhile to make. That gave some of us some pause and reduced enthusiasm, but... I'm hoping the final result is still seen as worth it.

With respect to said automation, I think we could probably write a kind of codemod for it using the APIs we have today, but nobody's put in the effort yet.

mjbvz commented 5 years ago

@orta VS Code is very interested in this work. Right now we consume TypeScript in two ways:

Each of those files is around 8MB on disk. Additionally, we are interested in shipping built-in support for tsc (tsc.js), but that's another 4.5MB and that's difficult for me to justify. It seems to me like all these various TypeScript components should be able to share a lot of code.

Let me know if you would like any additional info about how VS Code consumes TS


As a side note, typingsInstaller.js is pretty huge too (6MB)!! Does it pull in a lot of stuff from TS core?

orta commented 5 years ago

I brought this up during the most recent design meeting - https://github.com/microsoft/TypeScript/issues/34899

Where the end result was basically, we're meeting about trying to get modules happening again

As mentioned above - all of these files are basically the same but with a bit of flavor difference because they represent different sets of the compiler + services - for example I think you can probably use tsserverlibrary for both the html + JS/TS cases in vscode, but tsc.js doesn't look like it lives in there.

orta commented 4 years ago

https://github.com/microsoft/TypeScript/pull/35561 is looking like the answer to this, I'll keep my eye on the PR to see how things change

connor4312 commented 4 years ago

I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API

If it's not much effort to add into the build, this could still be a worthy goal. There are a few consumers, like Prettier and the new VS Code JS debugger extension, who ship TypeScript in a bundled form. It would double the size on disk if you shipped both ESM and CommonJS in a single package--maybe it could be a separate set of /typescript.*-esm/ packages?

DanielRosenwasser commented 4 years ago

@connor4312 You'll be able to give it a shot when it's migrated, I'm just saying to temper expectations about the savings you'll see.

nykula commented 4 years ago

Has anybody tried implementing an executable typescript multiplexer following the native pattern of crunchgen or toybox, per @mihailik's suggestion? It would generate the most savings I think.

mhart commented 4 years ago

npm install typescript@4.0.2 results in a 60MB node_modules on my Mac (56MB of which is typescript itself). Typescript is by far the largest module in our stack (and we have 146 explicit deps in package.json) – would love to see some reduction here 🙏

tvvignesh commented 3 years ago

Yup. This is the second largest module in my stack. typescript@4.0.3 is taking up 52M on disk. While it's fine for prod, since people typically don't ship typescript in images but only the transpiled JS files, a reduction in size can still impact the dev env significantly.

vostrnad commented 2 years ago

The install size of typescript@4.5.4 is 61 MB: install size

However most of that (51.8 MB) is just these six JavaScript files. Minifying them using uglify-js with just basic configuration reduces their size drastically (to 16.5 MB):

| File | Size | Minified size |
| --- | --- | --- |
| tsc.js | 5621 kB | 2206 kB |
| tsserver.js | 10378 kB | 3237 kB |
| tsserverlibrary.js | 10331 kB | 3220 kB |
| typescript.js | 9728 kB | 2989 kB |
| typescriptServices.js | 9728 kB | 2989 kB |
| typingsInstaller.js | 7298 kB | 2273 kB |

The resulting package size (25.7 MB) is less than half of the current install size at the cost of one additional build step. Is this maybe something that should be explored? I didn't manage to find any thread discussing this except for one mention in #23339.
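The arithmetic behind these figures can be reproduced from the per-file sizes listed above. The sketch below assumes 1 MB = 1024 kB, which is an assumption on my part but is the conversion under which the sums match the quoted 51.8 MB, 16.5 MB and 25.7 MB; the interface and variable names are illustrative only.

```typescript
interface BundledFile {
  file: string;
  sizeKB: number;
  minifiedKB: number;
}

// Per-file sizes as reported in the comment above.
const bundledFiles: BundledFile[] = [
  { file: "tsc.js", sizeKB: 5621, minifiedKB: 2206 },
  { file: "tsserver.js", sizeKB: 10378, minifiedKB: 3237 },
  { file: "tsserverlibrary.js", sizeKB: 10331, minifiedKB: 3220 },
  { file: "typescript.js", sizeKB: 9728, minifiedKB: 2989 },
  { file: "typescriptServices.js", sizeKB: 9728, minifiedKB: 2989 },
  { file: "typingsInstaller.js", sizeKB: 7298, minifiedKB: 2273 },
];

const originalTotalKB = bundledFiles.reduce((sum, f) => sum + f.sizeKB, 0); // 53084
const minifiedTotalKB = bundledFiles.reduce((sum, f) => sum + f.minifiedKB, 0); // 16914

const kbToMB = (kb: number): number => kb / 1024;

// Install size reported for typescript@4.5.4 above.
const installSizeMB = 61;

// Swap the original six files for their minified versions.
const projectedInstallMB =
  installSizeMB - kbToMB(originalTotalKB) + kbToMB(minifiedTotalKB); // about 25.7
```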

MartinJohns commented 2 years ago

@vostrnad They're working on modularizing the compiler. #35561

nicolas377 commented 2 years ago

From an outsider's perspective, it seems there hasn't been much work done on this recently. People from a lot of corners of the typescript universe have chipped in their approval towards a smaller typescript package. I'm no expert on anything low-level, but I'm just chipping in to start the discussion. Could the community be of any resource to this?

jakebailey commented 2 years ago

I have (and another dev or two before me has) been working on #35210 (turning the TS package into modules, mentioned in this thread before), which would directly impact this by only having one copy of everything in the package (like most npm packages). Then, the package would be smaller, and the lack of namespace generation into single files would allow importers to properly tree shake (allowing consumers to ship less).

Forgive the lack of obvious progress; this work is done out of tree in a code transformer that will do the conversion from namespaces/outFile to modules in bulk, since this sort of thing is far too difficult of a task to do solely by hand (and probably not gradually either).

spacecowgoesmoo commented 2 years ago

This is what my work environment github folder looks like; the repeated yellow chunks are all typescript in various node_modules folders.

lots of typescript

kidonng commented 2 years ago

👋 Inspired by discussions here (especially @vostrnad's observation), I created a smaller redistribution of TypeScript: https://github.com/kidonng/typescript

It's not battle tested though, but I've successfully used it to build the Vite repo.

fisker commented 2 years ago

FYI: We (Prettier) just reduced bundled package size from ~3.5m to ~1.4m by manually removing unused code. https://github.com/prettier/prettier/pull/13431

jakebailey commented 2 years ago

For those following this thread, I've just posted the PR that converts the codebase to be implemented with modules (#51387); with this change comes major changes to our build and packaging, including a 43% reduction in package size.

jakebailey commented 2 years ago

I am filing followup issues now that the modules PR has been merged.

One such issue of interest here is #51440; the TL;DR is that if we raise our minimum supported Node version to Node 12, we could safely ship our executables as ESM, which would save us roughly 7 MB more on top of the 43% reduction above.

pauldraper commented 2 years ago

The reduction from modules is very significant. (Thanks!!!!!!)

If your math is correct, that reduces the package size from 65MB to 36MB.

Which is still larger than it was when #23339 was filed, asking for it to be smaller.

But alas, such is progress.

This was the largest possible improvement to the size. More could be done, but it's not gonna cut in half again.

jakebailey commented 2 years ago

Eventually, we may be able to ship as ESM and achieve the smallest possible package. Or, go further and publish individual packages for parts of our repo. That goal's a long way off, but there is work left to be done here.

styfle commented 2 years ago

Confirmed, TS nightly is much smaller now, thanks!

vostrnad commented 2 years ago

Following the migration to modules in typescript@5.0.0-dev.20221108, I ran my minification tests again. Using uglify-js on the five largest JavaScript files now reduces the package size from 35.6 MB to 18.0 MB:

| File | Size | Minified size |
| --- | --- | --- |
| tsc.js | 5097 kB | 2281 kB |
| tsserver.js | 7923 kB | 2999 kB |
| tsserverlibrary.js | 7886 kB | 2983 kB |
| typescript.js | 7338 kB | 2705 kB |
| typingsInstaller.js | 1756 kB | 985 kB |

jakebailey commented 2 years ago

I mentioned minification in the module conversion PR; we are restricted on that front because so many people still patch our package. If we minify, patching becomes difficult to impossible.

I'd love to be able to do so, but we have to figure out what to do about that first.

(We'd also probably not go "full" minify; we need to keep names for backtraces.)

pauldraper commented 2 years ago

Minify only saves space if you don't include source maps.

And excluding source maps seems like deal-breaker.

jakebailey commented 2 years ago

We already exclude source maps in the package, but our output is left "pretty" so that stack traces are meaningful when provided by downstream users.

If we were enabling minification, we would likely only have it remove whitespace and optimize syntax, leaving names in the output.
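As a hedged illustration of what "whitespace and syntax only" could look like: esbuild exposes three separate minification switches with the names used below. Whether the TypeScript build would use exactly these settings is an assumption, and the `MinifyOptions` interface is a local stand-in defined here, not an import from esbuild.

```typescript
// Local mirror of three esbuild minification option names, declared here
// for illustration only.
interface MinifyOptions {
  minifyWhitespace: boolean;
  minifySyntax: boolean;
  minifyIdentifiers: boolean;
}

// Shrink whitespace and syntax, but keep identifiers intact so that
// function names in stack traces remain readable.
const readableMinify: MinifyOptions = {
  minifyWhitespace: true,
  minifySyntax: true,
  minifyIdentifiers: false,
};
```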

RyanCavanaugh commented 2 years ago

Re: ES Modules, I think we have to take performance as a serious goal. We get a big speed boost from esbuild's whole-program-aware bundling and giving that up for a better sticker number isn't a good trade-off for most users. People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.

jakebailey commented 2 years ago

Yeah, this is something I want to performance test; my impression is that ESM imports should be as fast as the whole-program bundling. I think that the differences were really down to variance + load time.

DanielRosenwasser commented 2 years ago

People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.

It's worth noting that vendoring has some big tradeoffs which might leave a user worse off. If someone still installs TypeScript (due to another dependency, for custom build tasks, or for having their editor use a workspace version), that person gets even more duplication of TypeScript, possibly with mismatched versions.

jakebailey commented 1 year ago

This is closed, but since people do still follow this issue, #55273 is on the docket for an early 5.3 merge; this PR effectively replaces typescript.js with tsserverlibrary.js and removes the latter. This leaves typescript.js as the sole provider of the public API, saving roughly 8MB unpacked. Copy/pasting the package size report that is run on PRs:

| | Before | After | Diff | Diff (percent) |
| --- | --- | --- | --- | --- |
| Packed | 6.90 MiB | 5.48 MiB | -1.42 MiB | -20.61% |
| Unpacked | 38.74 MiB | 30.41 MiB | -8.33 MiB | -21.50% |

| | Before | After | Diff | Diff (percent) |
| --- | --- | --- | --- | --- |
| lib/tsserverlibrary.d.ts | 570.95 KiB | 865.00 B | -570.10 KiB | -99.85% |
| lib/tsserverlibrary.js | 8.57 MiB | 1012.00 B | -8.57 MiB | -99.99% |
| lib/typescript.d.ts | 396.27 KiB | 570.95 KiB | +174.68 KiB | +44.08% |
| lib/typescript.js | 7.95 MiB | 8.57 MiB | +637.53 KiB | +7.84% |

As for our executables (and potentially an ESM API); that'll be handled by #51440 when I get to dealing with the long set of changes that are required to make that happen.

pi0 commented 1 year ago

Hi! First of all, thanks @jakebailey and the rest of the typescript team for constantly working on this matter to reduce the typescript install size 💙

With the awareness of all these efforts, I made an experimental project tslite.

tslite is a redistribution of TypeScript without API changes and with optimizations like code minification that probably won't be possible for the typescript package itself. The (significantly) smaller size benefits the segment of users that directly install or need typescript as a peer dependency in their projects.

I hope this project will be helpful rather than something conflicting with the future roadmap of install size optimizations from the core package.

jakebailey commented 1 year ago

There is still more size work that can be done, specifically #51440.

However, I will note that the problem of package sizes is really not as bad as people think these days; every modern package manager uses hardlinks to a global cache, meaning that every install of TypeScript on a system will share the same backing files on disk. The "apparent" size may seem duplicative, but it's really all shared.

That and the install size seen on packagephobia is the unpacked size; the actual bits transferred from the registry are much, much smaller. Even gzip brings the tarball to about 6MB. tslite is smaller on that front at about 3MB, but overall most people only download each version of TypeScript once.

That combined with the hardlinking really means that we're talking about a few MB per system, paid once. One spends more network and disk space loading up Twitter or even GitHub via images and scripts that change often than the TS package.

I'm still going to try and make it smaller because I find it fun to do so, but it's a little moot IMO.

ArnaudBarre commented 1 year ago

This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball, TS still adds a few seconds when I open a Stackblitz repro for Vite.

pauldraper commented 1 year ago

every modern package manager uses hardlinks to a global cache

Neither npm nor yarn use a global cache. (Unless Yarn is in PnP mode, which brings a number of issues.)

overall most people only download each version of TypeScript once

There are over 2,800 versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.

Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 TypeScript versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)

jakebailey commented 1 year ago

This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball, TS still adds a few seconds when I open a Stackblitz repro for Vite.

That's certainly true. It's a shame that these systems do not cache their artifacts.


Neither npm nor yarn use a global cache. (Unless Yarn is in PnP mode, which brings a number of issues.)

Yarn 3 supports hard linking (https://yarnpkg.com/configuration/yarnrc#nmMode). If you're still using Yarn v1, you're not going to get any new features at all.

I was wrong about npm; it has a global cache but it copies the files.

There are 2,800+ versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.

Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)

There should really only be one TS version in a project; if this is happening, then some package is over-restricting what version of TS it needs. All modern package managers allow you to override versions within a workspace, and I would think it'd be safe to do that if space is a concern and your package manager can't hardlink.

It's also misleading to say that there are 2,800 versions of TypeScript; there are only a handful of stable releases. The rest are nightly builds.

spacecowgoesmoo commented 1 year ago

People shouldn’t have to override Typescript versions. The project I’m working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.

jakebailey commented 1 year ago

People shouldn’t have to override Typescript versions. The project I’m working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.

I'm referring specifically to doing this in npm:

"overrides": {
    "typescript@*": "$typescript"
},

Or in yarn:

"resolutions": {
    "typescript@*": "$typescript"
},

Or in pnpm:

"pnpm": {
    "overrides": {
        "typescript@*": "$typescript"
    },
}

I am not referring to any sort of post-install patching, but just asking the package manager to resolve to a single version.
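Before adding an override, it can help to see how many distinct versions actually resolve. The sketch below walks a dependency tree and collects the versions found for one package. The `DepNode` shape (nested `dependencies` objects carrying a `version`) is an assumption modeled on what `npm ls --json` prints, and the package names and versions in `exampleTree` are made up for illustration.

```typescript
// Assumed tree shape: each node may carry a version and nested dependencies.
interface DepNode {
  version?: string;
  dependencies?: { [depName: string]: DepNode };
}

// Collect the distinct resolved versions of `target` anywhere in the tree.
function resolvedVersions(root: DepNode, target: string): string[] {
  const seen = new Set<string>();
  const visit = (node: DepNode): void => {
    const deps = node.dependencies ?? {};
    for (const depName of Object.keys(deps)) {
      const child = deps[depName];
      if (depName === target && child.version !== undefined) {
        seen.add(child.version);
      }
      visit(child);
    }
  };
  visit(root);
  return Array.from(seen).sort();
}

// Illustrative tree: the project and one tool agree, one plugin pins an older version.
const exampleTree: DepNode = {
  version: "1.0.0",
  dependencies: {
    typescript: { version: "5.2.2" },
    "some-plugin": {
      version: "2.0.0",
      dependencies: { typescript: { version: "4.9.5" } },
    },
    "another-tool": {
      version: "3.1.0",
      dependencies: { typescript: { version: "5.2.2" } },
    },
  },
};

const distinctTypeScriptVersions = resolvedVersions(exampleTree, "typescript");
```

More than one entry in the result means some dependency is pulling in its own copy, which is the situation the overrides above are meant to collapse.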

spacecowgoesmoo commented 1 year ago

The point is that an override only seems reasonable because other dependencies don’t require any extra setup. NPM repos are supposed to be low-effort installs and typescript should be no exception.

pauldraper commented 1 year ago

It's a shame that these systems do not cache their artifacts.

There should really only be one TS version in a project

npm; it has a global cache but it copies the files.

Yes, as you say, IDEs, package maintainers, and package managers should be aggressively deduplicating redundancies.

....

....

....

....

And TypeScript should be doing the same. (Right now it's something crazy like ~75% duplicate code.)