Experiment with Parallelized Parsing

DanielRosenwasser commented 1 year ago

Before TypeScript can respond to any semantic query (accurately), the entire program has to be loaded into memory, parsed, and bound. Parsing today ends up taking a surprising amount of time, even when we don't have to do any path resolution. One thing we could exploit is that parsing any two files should (ideally) be independent operations, so one could imagine spinning up multiple processes or workers to divide and conquer the problem.

Unfortunately, program construction is not an embarrassingly parallelizable step because we don't always know the full list of files in a program ahead of time; however, one could imagine some work-stealing scheme with a single orchestrator.

A glaring problem for TypeScript is that it is currently synchronous from end-to-end, and most worker capabilities are built with an expectation of asynchronous communication (and often depend on the surrounding environment - Node.js main thread, UI thread in the browser, non-UI worker threads, etc.). Another problem is that while we might be able to divide and conquer, the overhead of moving data between workers might be more than we'd have anticipated. And one last concern I'll mention is that while running multiple workers gets more work done faster, it's not a free lunch - it has UX tradeoffs because multiple threads can degrade the responsiveness of the overall machine.

So in the coming months, we'll be investigating here, finding out what works, and seeing if we can bring it into TypeScript itself.
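
For illustration only, the orchestrator idea can be sketched as follows. This is a minimal sketch, not TypeScript's implementation: it uses a simple shared work queue rather than true work stealing, the "workers" are simulated with promises, and `ImportGraph`, `parseFile`, and `parseProgram` are hypothetical names. The point it shows is that the file list grows while parsing is in progress, so the orchestrator has to keep handing out newly discovered files until nothing is queued or in flight.

```typescript
// Hypothetical in-memory stand-in for a file system: file name -> files it imports.
type ImportGraph = Record<string, string[]>;

// Stand-in for parsing one file: resolves with the imports discovered in it.
async function parseFile(graph: ImportGraph, file: string): Promise<string[]> {
  await new Promise((resolve) => setTimeout(resolve, 1)); // simulate parse time
  return graph[file] ?? [];
}

// Single orchestrator: owns the queue and the "seen" set, and keeps at most
// `workerCount` parses in flight at once.
function parseProgram(
  graph: ImportGraph,
  roots: string[],
  workerCount: number,
): Promise<string[]> {
  const seen = new Set<string>(roots);
  const queue = [...seen];
  const parsed: string[] = [];
  let inFlight = 0;

  return new Promise((resolve, reject) => {
    const pump = (): void => {
      // Done only when nothing is queued and nothing is still being parsed.
      if (queue.length === 0 && inFlight === 0) {
        resolve(parsed.sort());
        return;
      }
      while (inFlight < workerCount && queue.length > 0) {
        const file = queue.shift()!;
        inFlight++;
        parseFile(graph, file).then((imports) => {
          inFlight--;
          parsed.push(file);
          // Newly discovered files are only known after a parse completes.
          for (const dep of imports) {
            if (!seen.has(dep)) {
              seen.add(dep);
              queue.push(dep);
            }
          }
          pump();
        }, reject);
      }
    };
    pump();
  });
}
```

With a graph such as `{ "a.ts": ["b.ts", "c.ts"], "b.ts": ["c.ts"], "c.ts": [] }` and `"a.ts"` as the only root, `b.ts` and `c.ts` are discovered only after `a.ts` has been parsed, which is why the step is not embarrassingly parallel.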

ajafff commented 1 year ago

You would need an intermediate representation for the parsed AST, since the Nodes as they currently exist (with methods and so on) are not transferable. That would basically implement #26871, which also enables caching the parse result.
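
To make the transferability point concrete, here is a rough sketch under stated assumptions. `AstNode` and `PlainNode` are hypothetical shapes, not the compiler's real node types, and JSON stands in for whatever transfer or cache format would actually be used. It shows that a class instance loses its methods once serialized, so a plain-data form plus a rehydration step is needed on the receiving side.

```typescript
// Hypothetical class-based node, standing in for nodes that carry methods.
class AstNode {
  constructor(
    public kind: string,
    public pos: number,
    public end: number,
    public children: AstNode[] = [],
  ) {}

  getText(source: string): string {
    return source.slice(this.pos, this.end);
  }
}

// Plain-data form: no methods and no prototype, so it can be serialized.
interface PlainNode {
  kind: string;
  pos: number;
  end: number;
  children: PlainNode[];
}

function toPlain(node: AstNode): PlainNode {
  return {
    kind: node.kind,
    pos: node.pos,
    end: node.end,
    children: node.children.map(toPlain),
  };
}

// Rebuilds class instances (and therefore methods) from the plain form.
function fromPlain(plain: PlainNode): AstNode {
  return new AstNode(plain.kind, plain.pos, plain.end, plain.children.map(fromPlain));
}

const source = "let x = 1;";
const tree = new AstNode("VariableStatement", 0, 10, [new AstNode("Identifier", 4, 5)]);

// The serialized string could be sent to another worker or written to a cache.
const wire = JSON.stringify(toPlain(tree));
const revived = fromPlain(JSON.parse(wire) as PlainNode);
```

After the round trip, `revived.children[0].getText(source)` works again, whereas the raw result of `JSON.parse(wire)` has no `getText` at all.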

ElianCordoba commented 1 year ago

In the Node.js environment there is a hack to do async computing with a sync API; I first read about it here. It's a combination of worker_threads, SharedArrayBuffer and Atomics.wait.

jakebailey commented 1 year ago

> In the Node.js environment there is a hack to do async computing with a sync API; I first read about it here. It's a combination of worker_threads, SharedArrayBuffer and Atomics.wait.

Right; this is the same thing that's done for a synchronous filesystem in TypeScript's vscode.dev integration. It is one tool in my hypothetical toolbox when I look into this.
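
For readers unfamiliar with the technique, a minimal sketch of the general pattern follows. It assumes Node.js, and `workerSource` and `doubleSync` are hypothetical names invented for the example; it is not taken from TypeScript or from the vscode.dev integration. The calling thread blocks in `Atomics.wait` on a shared flag while a worker finishes an asynchronous task, writes the result into the shared buffer, and wakes the caller.

```typescript
import { Worker } from "node:worker_threads";

// Worker body as a plain JavaScript string so the example fits in one file.
const workerSource = `
  const { workerData } = require("node:worker_threads");
  const cells = new Int32Array(workerData.shared);
  // Pretend this is an asynchronous operation (for example, an async file read).
  setTimeout(() => {
    Atomics.store(cells, 1, workerData.input * 2); // write the result
    Atomics.store(cells, 0, 1);                    // flip the "done" flag
    Atomics.notify(cells, 0);                      // wake the blocked caller
  }, 10);
`;

// Synchronous wrapper around the asynchronous work done in the worker.
function doubleSync(input: number): number {
  // cells[0] is the "done" flag, cells[1] holds the result.
  const shared = new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT);
  const cells = new Int32Array(shared);

  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { shared, input },
  });

  // Block this thread while the flag is still 0, with a timeout as a safety net.
  const status = Atomics.wait(cells, 0, 0, 5000);
  void worker.terminate();
  if (status === "timed-out") {
    throw new Error("worker did not respond in time");
  }
  return Atomics.load(cells, 1);
}
```

Calling `doubleSync(21)` returns only after the worker's timer has fired, even though the caller never awaits anything. The cost is that the calling thread is fully blocked in the meantime, and only data that fits in the shared buffer can come back this way.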

jpike88 commented 1 year ago

It's unfortunate that TypeScript can't leverage runtimes that are built for minimal overhead between threads/workers. I'll just leave that comment at that :)

jakebailey commented 1 year ago

Er, why leave it at that? What runtimes?

cayter commented 1 year ago

Just wondering if these tools are to solve the same performance problem:

We're currently using Drizzle ORM, which relies on a lot of union and infer types. The VS Code TypeScript IntelliSense is super bad with it and crashes from time to time, which someone already reported here. Prisma ORM faces the same issue: we can't even complete tsc -b with the types it generates, it just crashes.

A genuine question: I see that the TS team has been introducing a handful of new features over the recent versions. Has performance impact been part of the consideration when releasing a new feature? The reason I'm asking is that when a new feature comes out, OSS projects tend to start using it, and if the feature doesn't perform well, we have a huge ecosystem slowing down our projects as we upgrade to newer versions, which could drive the community away from TS if this DX continues to degrade.

Thanks.

RyanCavanaugh commented 1 year ago

Performance impact is absolutely a thing used to determine feasibility of features, and indeed many features have been rejected due to performance cost.

nicoabie commented 1 year ago

Referencing previous talk https://github.com/microsoft/TypeScript/issues/30235

microsoft / TypeScript

Experiment with Parallelized Parsing #54256