DanielRosenwasser commented 4 years ago

Background

We've heard from many TypeScript and JavaScript developers that large codebases suffer from massive startup delays in the editor. This is often due to program construction time. Before any semantic operation can be triggered, TypeScript has to effectively do the following:

Find out which project a file belongs to
Load that project
loop: Read each of the given files
Parse the file
Look through the imports/exports and resolve each import path
If new files were found, add them to the program's file list and go to loop.

All this is unfortunately synchronous today, and even if it wasn't, this might still end up being a lot of work for larger projects. For users with many files and non-trivial resolution schemes, we've witnessed program construction times that have taken minutes.

This can be frustrating for lots of users whose first tasks might just include navigation and some basic editing.

Basic Code Navigation

Let's take the following code and assume a user requests to go-to-definition on the import statement:

import * as foo from "./foo";

If TypeScript is still loading the program, it can't answer where ./foo resolves to.

But it's a relative path, and for most users this feels like something so obvious and cheap to compute. Most users could guess which file it points to be right. If there was a lightweight way to say "it should probably take you here" until a program is fully loaded up, it might get 99% of the job done for 99% of people. If there was a way to guess 2-3 locations and give the editor a list of locations to look up, that might just solve the issue.

Basic Completions

Let's take the following and assume the user requests completions in each of the comments:

import { SomeClass } from "some-module";

function foo() {
    function bar() {
        b/*inBar*/
    }
    function baz() {
        SomeClass./*afterDot*/
    }

    new /*afterNew*/
}

class CurrentClass {
    /*inClass*/
}

Completions at /*inBar*/ and /*afterNew*/ technically need a full program to tell us what globals are available, but a lot of the time users might be able to get by with locals until the program is fully loaded - after all, this was the basic experience in lots of editors before they had TypeScript support. We could still build up the list of local symbols and list those as the highest-priority completions, followed by a list of local identifiers.

When it comes to /*afterDot*/ something like what's proposed in https://github.com/Microsoft/TypeScript/issues/5334 might be good enough - just provide a list of local identifiers.

A completion at /*inClass*/ could limit itself to contextual keywords (e.g. public, private, protected, readonly, abstract, whatever) and that would still be useful.

Proposal

Some editors spin up two TS Server instances: one which looks for full semantic information, and one which only answers file-local syntactic questions. So one idea we've had is: could we give the syntactic server some basic heuristics to provide answers to editor questions that appear obvious, or where incomplete answers are good enough?

The proposal is to enable syntax-only instances of TypeScript's language service to answer a set of semantic-like questions that includes

Go-to-Definition on
- relative paths
- locally declared functions and variables
Completions based on
- Locally scoped identifiers
- Keywords based on different syntactic contexts

Go to Definition

On Imports to Relative Paths

When it comes to relative paths, there's at least the following resolution scheme:

If it has an extension
- remove it, try to resolve by appending .ts, .tsx, .d.ts, .js, .jsx
- try loading the file by that path (e.g. .json files)
Try appending .ts, .tsx, .d.ts, .js, .jsx to the current path

TypeScript can potentially provide an ordered list of lookup locations back to the editor, or just guess at one. The editor can try all of these, wait, and return the first "preferred" definition, but we'd need to sync with editor teams to see how feasible this would be.

Alternatively, the syntax server can perform the work, but there are some complications. See the open questions below.

On Imported Bindings

In

import { foo } from "./foo";

/**/foo();

It might be okay for go-to-definition to be the same as go-to-definition for ./foo itself if it's prohibitively expensive to load/parse/bind a file on the fly (though we're going to do that once it's opened in the editor anyway).

On Local Bindings

TypeScript can potentially bind each source file on demand, and walk up the AST to resolve the requested symbol, and jump to the definition. This means that go-to-definition definitely wouldn't work on globals across files, and it might be able to jump to the correct file of certain imports, but it would at least definitely work within the current file.

Completions

Completions could work by providing the list of contextual keywords based on the current AST (which we already have at least in part), along with a list of either file-local identifier completions or whatever's in the current chain of file-local symbol tables.

Editor Usage

Editors could use this new functionality by changing certain commands to be multi-step commands: see if the semantic language service is in the middle of loading a project first, use the syntactic version of the command, and then try to fire semantic a syntax command if that fails (e.g. for go-to-definition, none of the operations worked).

Alternatively, it can have both operations race against each other. Try both concurrently, see which completes first, and then either cancel the operation the semantic server if it loses or throw away the results of the syntax server if it loses.

Prior Art

Current Architecture

The syntax server existing in the first place gives some legitimacy to this approach - a server for answering questions that doesn't know the entire state of the world.

Roslyn

Roslyn has some of the same concepts but with a different approach. Roslyn makes a lot of their architecture asynchronous and lazy, and if results aren't currently available, they'll provide an incomplete experience - for example, completion lists aren't always available for the entire program. The approach is drastically different from what's being proposed here, but it's fundamentally the same idea: work with what you've got until you've got it all.

Open Questions

Can we return locations that don't exist in navigation?

We mention above the idea of guessing file locations - but those files might not exist on disk! Some editors might not expect an invalid location to be returned. We'll have to chat with editor teams to understand whether this is actually a problem or not. Without an opt-in flag, it might be.

Does go-to-definition need to return a ranked list of files to try?

Today, go-to-definition just returns a list of locations. In this version of go-to-definition, we might want to provide a list of potential lookup locations for the editor - but they're not all equally weighted. You don't want to provide the .js file in the same list of definitions as the .ts file. It seems like this version of go-to-definition needs to be slightly altered from the existing version.

Can the server just resolve the files itself instead returning a ranked list of files?

Technically it can but TSServer is synchronous. Maybe for a max of 5-10 file existence checks this wouldn't be too bad, but we should figure out if it's better for editors to decide.

Do we need new TSServer commands?

Maybe! Technically, we can have existing TS Server commands respond differently between the syntax server and the semantic server, but to be honest, I think we'd be better off not doing this. Some of this new functionality might need opt-in flags, so I think it would be better to make this entire experience opt-in via a separate language server command.

Is TypeScript providing its own list of completions actually better than most editors' naive implementation of completions?

It's not entirely clear whether TypeScript's "look at every identifier in the file" approach is actually going to be better than what an editor like VS Code does in a plaintext file. We should understand what they do for plaintext and see if there's actually anything we can provide that's better.

Should editors ever fall back to sytnax operations once the semantic server is done?

Operations like go-to-definition can often trigger new project loads. This will block the semantic server again. It feels like using the syntax server is the right call, but this is another place where "not everything works perfectly" might be confusing.

Would this experience too different/incomplete?

It might be very jarring if a user got drastically different results from a normal editing session. Hopefully visual indicators from the editor can give a hint that results will be slightly incomplete, but a big goal here is to capture as much of the current experience in a lightweight way.

DanielRosenwasser commented 4 years ago

@RyanCavanaugh @sheetalkamat @andrewbranch @minestarks @mjbvz @amcasey @gcrev93

amcasey commented 4 years ago

1) Why would the server return a location that doesn't exist rather than just doing a file existence check? 2) How does the editor know which version to request? If it races them, won't the syntax version always win? If it tries the semantic version first, how will it know when to give up because the server is blocked? 3) Would there be some sort of UI indication that a simpler version of the results were provided? Even if users don't care, we probably care in bug reports. 4) VS would need to do work to uncouple the syntactic server from the semantic server. It's probably worth doing, but it hasn't been planned out yet.

Edit: Offline we discussed a possible solution for (2) - watch projectLoadStart and projectLoadEnd events.

minestarks commented 4 years ago

Adding on to the file existence check concern - in a remote scenario (something similar to LiveShare, remote tools, VSO, LSP) it's conceivable for the language server/host to be on a different machine than the editor/client. Architecturally it makes more sense to me for the language server to own interaction with the file system so it's always guaranteed to happen on the host machine.

DanielRosenwasser commented 4 years ago

Why would the server return a location that doesn't exist rather than just doing a file existence check?

Adding on to the file existence check concern - in a remote scenario (something similar to LiveShare, remote tools, VSO, LSP) it's conceivable for the language server/host to be on a different machine than the editor/client. Architecturally it makes more sense to me for the language server to own interaction with the file system so it's always guaranteed to happen on the host machine.

I mention that above - it sounds like the preference from the VS side is to have the syntax server look this up. We should see if this is reasonably fast.

How does the editor know which version to request?

Today, VS Code seems to know when a project is in the middle of loading - it shows "Initializing JS/TS Langauge Features". So if there's information that they know while a project is loading, that can be leveraged. Otherwise, we might need to provide new events of some sort. Editor reps like @mjbvz and @minestarks will need to weigh in here.

Would there be some sort of UI indication that a simpler version of the results were provided?

That's currently the "Initializing JS/TS Langauge Features" thing that VS Code has - I don't know enough from the VS side to be honest, but I think we should have something similar.

VS would need to do work to uncouple the syntactic server from the semantic server.

I need more background here - can you elaborate?

ahejlsberg commented 4 years ago

I'm not sure I understand what "...and one which only answers file-local syntactic questions" really is? It seems almost any question other than syntax highlighting and grammatical errors requires some form of binding and semantic analysis, i.e. creation of a program and involvement of the type checker. I think a more viable approach is to just create a single file program for the current file where we squelch errors from unresolved references but otherwise just answer requests the same way as always. A number of things will become any, but some local stuff would still work.

DanielRosenwasser commented 4 years ago

@ahejlsberg That's certainly also possible and solves things like completions and local go-to-definition, but it doesn't give us go-to-definition on a path for free.

We could also just try to have go to definition work on paths for files that aren't necessarily part of the program. This might be helpful for fixing other annoying issues with the language service. For example, we have a problem where if you're importing from a file that's not a module (because you forgot to export something), you can't go-to-definition on an import of that file will fail:

// foo.ts

function foo() {
}

// bar.ts
import { foo } from "./foo.js"; // go to definition on this fails

amcasey commented 4 years ago

Adding on to the file existence check concern - in a remote scenario (something similar to LiveShare, remote tools, VSO, LSP) it's conceivable for the language server/host to be on a different machine than the editor/client. Architecturally it makes more sense to me for the language server to own interaction with the file system so it's always guaranteed to happen on the host machine.

I'm definitely missing something: the tsserver is already full of existence checks - why would this particular one be owned by the editor? Also, I definitely had the impression that, when the client and server are on different boxes, the files would be with the server - otherwise the client would have to send the contents of all closed files over the wire.

DanielRosenwasser commented 4 years ago

Again, the idea of handing back file names was to

ensure that the syntax server wasn't blocked for too long (a full round of file resolution)
allowing editors to asynchronously open files up if they have that capability

While it might be faster for the server to resolve it, it would block the process less to hand back a list of files.

However, it sounds like this might not be as much of a concern as I had in mind and you'd both prefer for the server to just take care of it.

amcasey commented 4 years ago

I can imagine the existence check being a perf issue on the syntactic server, but I'd like to stick with the clean implementation until that's shown to be the case.

mihailik commented 4 years ago

Is it viable for editing experience to 'upgrade' incrementally with loading, in phases?

Parse/load currently open file only.
Load other open tabs.
Load rest of the project.

Pre-loading phases (1 and 2) suppress semantic errors, but do full navigations, completions etc.

Note that construction of Program for phases 2 and 3 can be a pausable/restartable process. Files can be added in batches as next-phase Program is prepared, and incoming requests still handled using the immutable Program from the previous-phase.

And the navigations for 'import path' case can short-circuit by inserting the extra file into Program ahead of normal 1/2/3 loading schedule.

P.S. When I've built a TS playground-like tool years ago, I've implemented similar incremental-loading strategy for lib.d.ts parsing to avoid UI blocking: feeding it into LanguageService in portions. With whole files it's even less hassle.

DanielRosenwasser commented 4 years ago

Is it viable for editing experience to 'upgrade' incrementally with loading, in phases?

Technically yes, but because resolution happens at one big synchronous step, we'd have to inject "yield" points into the language service which I'd think makes it practically too difficult on one server.

P.S. When I've built a TS playground-like tool years ago, I've implemented similar incremental-loading strategy for lib.d.ts parsing to avoid UI blocking: feeding it into LanguageService in portions. With whole files it's even less hassle.

Tell me more - were you spinning up extra service instances to do this?

mihailik commented 4 years ago

resolution happens at one big synchronous step

Isn't it incremental? Such that when it's happened once, an incremental update would be fast?

Tell me more - were you spinning up extra service instances to do this?

No, the same server. I fed it a series of lies, gradually approaching the truth. That is, LSHost.getScriptFileNames() was only giving the service a subset of files.

I add a file to the list, and invoke getSemanticDiagnostics(...) on it. That makes Program parse, resolve and all the rest except emitting. After a tiny delay I add another, invoke again, and so on. That file-by-file loading prevented freezing.

This was a while ago, here is a relevant part of that code:

DanielRosenwasser commented 4 years ago

Isn't it incremental? Such that when it's happened once, an incremental update would be fast?

We can detect what changes occurred (imports/exports added/removed, failed lookups that might have changed), but that first step of program construction happens at once and doesn't paused/resumed.

No, the same server. I fed it a series of lies, gradually approaching the truth. That is, LSHost.getScriptFileNames() was only giving the service a subset of files.

That's pretty interesting. It would be cool to see if we could get basic stuff like completions from lib.d.ts working too, though that's an extremely large file.

mihailik commented 4 years ago

but that first step of program construction happens at once and doesn't paused/resumed

Two assumptions feel safe to me:

it's very fast to create an empty Program
it's very fast to recreate even huge Program when only 1 file added

If that 👆 is true, it provides for loading a large program incrementally, without long blocking delays.

It would be cool to see if we could get basic stuff like completions from lib.d.ts working too

UPDATE: see the link below with a small script showing incremental loading:

Files below 60Kb threshold are loaded one-by-one, pausable between the files.
Larger files are split into chunks, and loaded chunk-by-chunk, pausable between chunks.

The latter is done combining IScriptSnapshot.getChangeRange and LanguageServiceHost.getScriptVersion(). Effectively, as if a file is modified by adding more lines at the bottom.

It works great when a large file is constituted of multiple top-level scopes (applies well to lib.d.ts). When a file is a clumb of spaghetti, throughput decreases a lot, as every chunk invalidates more state and creates churn.

P.S. The link below has screen recording of incrementally loading TypeScript repo, including lib.d.ts at the start.

microsoft / TypeScript

Partial Semantic Editing Operations #37713

Background

Basic Code Navigation

Basic Completions

Proposal

Go to Definition

On Imports to Relative Paths

On Imported Bindings

On Local Bindings

Completions

Editor Usage

Prior Art

Current Architecture

Roslyn

Open Questions

Can we return locations that don't exist in navigation?

Does go-to-definition need to return a ranked list of files to try?

Can the server just resolve the files itself instead returning a ranked list of files?

Do we need new TSServer commands?

Is TypeScript providing its own list of completions actually better than most editors' naive implementation of completions?

Should editors ever fall back to sytnax operations once the semantic server is done?

Would this experience too different/incomplete?