microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Improve Performance Of Unicode Highlighting / Link Detection #143270

Open hediet opened 2 years ago

hediet commented 2 years ago

Problem

Currently, both the unicode highlighting and link detection feature scan the entire text model again whenever a change has been made (after some timeout).

This is done in a webworker and thus does not impact performance of the main UI. But it might affect overall system performance and energy consumption.
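The current scheme can be pictured as a debounced full re-scan: every change restarts a timer, and when the timer fires, the whole document is scanned again. A minimal sketch (the function name and delay are illustrative, not the actual VS Code code):

```typescript
// Illustrative sketch of the current behavior: each edit restarts the timer,
// and when it fires the *entire* document text is re-scanned, no matter how
// small the edit was.
function createDebouncedFullScan(
    scan: (text: string) => void,
    delayMs: number
): (fullText: string) => void {
    let handle: ReturnType<typeof setTimeout> | undefined;
    return (fullText: string) => {
        if (handle !== undefined) {
            clearTimeout(handle); // a new edit cancels the pending scan
        }
        handle = setTimeout(() => scan(fullText), delayMs); // full re-scan after a quiet period
    };
}
```

The work per edit is proportional to the document size, not to the edit size, which is the inefficiency this issue is about.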

Caveat

That said, even for huge files (8 MB, more than 230k lines), link detection takes about 70 ms and unicode highlighting about 20 ms in the webworker on my machine. For files with hundreds of thousands of links, however, updating the decorations in the UI takes up to a second, and sending the decorations becomes more expensive than computing them.

Thus, I don't think this improvement is needed right now, but it might be interesting to explore the idea in the future.

Proposed Solution

We could introduce a mechanism to incrementally compute and update decorations.

Such a solution could look like this:

interface IMatch<T> {
    range: Range;
    data: T;
}

interface WhitespaceIndependentTextMatcher<T> {
    isWhitespaceIndependent: true;
    /**
     * ```
     * ∀txt1 txt2: String, WS: (\r\n\t\u{20})*
     *  findMatches(txt1 + WS + txt2)
     *   = [
     *   ...findMatches(txt1),
     *   ...findMatches(txt2).map(d => ({ ...d, range: d.range.moveBy(txt1 + WS) }))
     *   ]
     * ```
     */
    findMatches(text: string): IMatch<T>[];
}

interface IncrementalTextMatcher<T> {
    getMatches(): IMatch<T>[];
    handleEdit(text: string, oldRange: Range): void;
}

function createIncrementalMatcher<T>(matcher: WhitespaceIndependentTextMatcher<T>, textModel: ILineBasedTextModel): IncrementalTextMatcher<T> {
    // ...
}
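To illustrate why the whitespace-independence property helps: since every newline is whitespace, a conforming matcher can be run line by line, its per-line results cached, and only the edited lines recomputed. The sketch below (hypothetical names, simplified line-based edits rather than the `Range`-based API above) shows the idea:

```typescript
// Hypothetical sketch, not the proposed API: an incremental matcher that
// caches per-line results and recomputes only the lines touched by an edit.
// This is only correct because the matcher is whitespace-independent, so
// matching each line in isolation and concatenating the results is
// equivalent to matching the whole text at once.
interface ILineMatch<T> { line: number; start: number; end: number; data: T; }

type LineMatcher<T> = (lineText: string) => { start: number; end: number; data: T }[];

class IncrementalLineMatcher<T> {
    // null = stale, needs recomputation
    private cache: ({ start: number; end: number; data: T }[] | null)[];

    constructor(private matchLine: LineMatcher<T>, private lines: string[]) {
        this.cache = lines.map(() => null); // everything starts stale
    }

    /** Replace a line range with new lines, invalidating only that range. */
    handleEdit(startLine: number, deleteCount: number, newLines: string[]): void {
        this.lines.splice(startLine, deleteCount, ...newLines);
        this.cache.splice(startLine, deleteCount, ...newLines.map(() => null));
    }

    getMatches(): ILineMatch<T>[] {
        const result: ILineMatch<T>[] = [];
        for (let i = 0; i < this.lines.length; i++) {
            if (this.cache[i] === null) {
                this.cache[i] = this.matchLine(this.lines[i]); // recompute stale lines only
            }
            for (const m of this.cache[i]!) {
                result.push({ line: i, ...m });
            }
        }
        return result;
    }
}
```

After a single-line edit, only that line is re-matched; the cost of `handleEdit` plus the next `getMatches` is proportional to the edit, not to the document.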
alexdima commented 2 years ago

:+1: This is a good idea. I think we could break it down into two distinct problems:

  1. reducing the computation so that it is directly proportional to the change (e.g. typing a character could reduce the computation range to the affected line, or, for very long lines, to +/- 1000 characters around the edit position on that line)
  2. reducing the data sent from the web worker to the main thread. Here I suggest modeling a lifecycle for a result, such that the web worker side can assume the main thread has kept the previous result around and can then return a delta. We can take inspiration from the semantic tokens API, or come up with something new if we find a better fit.
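Point 2 could be sketched roughly as follows. This is a hypothetical illustration in the spirit of the semantic tokens delta protocol, not an actual VS Code API: the worker remembers the last result it sent and, instead of re-sending everything, emits a splice edit against it.

```typescript
// Hypothetical sketch: the worker side keeps the previously sent flat array
// of decoration data and encodes the next result as a single splice edit
// (common prefix/suffix diff), so unchanged decorations are never re-sent.
interface ResultDelta {
    resultId: string;
    edits: { start: number; deleteCount: number; data: number[] }[];
}

class DeltaEncoder {
    private lastId = 0;
    private last: number[] = [];

    /** Send a complete result (e.g. the first time, or after a reset). */
    full(data: number[]): { resultId: string; data: number[] } {
        this.last = data.slice();
        return { resultId: String(++this.lastId), data };
    }

    /** Encode the new result as one splice edit against the previous one. */
    delta(data: number[]): ResultDelta {
        const prev = this.last;
        let p = 0; // length of the common prefix
        while (p < prev.length && p < data.length && prev[p] === data[p]) p++;
        let s = 0; // length of the common suffix (not overlapping the prefix)
        while (
            s < prev.length - p &&
            s < data.length - p &&
            prev[prev.length - 1 - s] === data[data.length - 1 - s]
        ) s++;
        const edits = [{
            start: p,
            deleteCount: prev.length - p - s,
            data: data.slice(p, data.length - s),
        }];
        this.last = data.slice();
        return { resultId: String(++this.lastId), edits };
    }
}
```

For a localized edit, the transferred delta stays small even when the document contains hundreds of thousands of decorations, which addresses the observation above that sending the decorations can cost more than computing them.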