kortina / vscode-markdown-notes

Install from: https://ark.dance/md-notes
GNU General Public License v3.0

editor lag in large projects #90

Open kortina opened 4 years ago

kortina commented 4 years ago

I have a large JavaScript project that is not a markdown notebook, and as of Markdown Notes v0.0.19 (2020-09-18) there is a massive delay before I can edit files in that workspace.

Debugging.

Here is the release info: https://github.com/kortina/vscode-markdown-notes/releases/tag/v0.0.19

kortina commented 4 years ago

Looks like the lag is due to NoteParser.hydrateCache on bootstrap:

https://github.com/kortina/vscode-markdown-notes/blob/master/src/NoteParser.ts#L293

Mod with timing:

  static async hydrateCache(): Promise<Array<Note>> {
    console.warn('---- MarkdownNotes.hydrateCache BEGIN');
    const start = new Date();

    let useCache = false;
    let parsedFiles = await NoteParser.parsedFilesForWorkspace(useCache);

    // Report wall-clock time spent re-parsing the workspace.
    const elapsedSeconds = (new Date().getTime() - start.getTime()) / 1000;
    console.warn(`---- MarkdownNotes.hydrateCache END: ${elapsedSeconds}s`);
    return parsedFiles;
  }

Takes ~38s:

MarkdownNotes.hydrateCache BEGIN
MarkdownNotes.hydrateCache END 37.933s elapsed

Lots of md files in the codebase in question:

❯ find . | grep "\.md$" | wc -l
    6545 # markdown files in codebase (most in node_modules)

kortina commented 4 years ago

I checked out the v0.0.18 code, ran the timer in the same codebase, and saw the same behavior. I think the temporary solution is to respect the various ignore files.

But I also kind of want to dig into what is happening on the main thread and see if there is a way to offload the work to a background thread.

kortina commented 4 years ago

Links to research:

kortina commented 4 years ago

The easy short-term fix is to add some exclude patterns to prevent parsing of those files. Longer term, however, for large note projects, we gotta think about how to handle this. Is there a way anyone is aware of to move things like file IO and parsing large documents to a background thread? @lukesmurray would you happen to know of anything?
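
Something in this direction for the exclude patterns, maybe (just a sketch: the vscode-markdown-notes.excludePatterns setting and the noteFilesForWorkspace helper are illustrative names, not the extension's actual config or code; workspace.findFiles is the real API):

```ts
import * as vscode from 'vscode';

// Sketch: collect markdown files with findFiles, always skipping node_modules,
// plus any user-configured exclude globs (the setting name is illustrative only).
async function noteFilesForWorkspace(): Promise<vscode.Uri[]> {
  const configured: string[] = vscode.workspace
    .getConfiguration('vscode-markdown-notes')
    .get('excludePatterns', []);
  const excludeGlobs = ['**/node_modules/**', ...configured];
  const exclude =
    excludeGlobs.length > 1 ? `{${excludeGlobs.join(',')}}` : excludeGlobs[0];
  return vscode.workspace.findFiles('**/*.md', exclude);
}
```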

lukesmurray commented 4 years ago

I agree that adding exclude patterns is the best solution in the short term. I ran into this issue super early on since I stuck with remark (which is slow compared to regexes) and honestly I never quite solved it. I kicked the can down the road with very aggressive caching but the issues eventually resurfaced (also I kept adding features which made processing time go up).

In the long term the best solution is to implement a language server. This type of issue is listed as one of the reasons for the language server design:

Additionally, language features can be resource intensive. For example, to correctly validate a file, Language Server needs to parse a large amount of files, build up Abstract Syntax Trees for them and perform static program analysis. Those operations could incur significant CPU and memory usage and we need to ensure that VS Code's performance remains unaffected.

The language server would work similarly to a web worker and offload file processing to a separate thread from the UI. Another benefit of a language server is that it could be used in any editor that supports the Language Server Protocol. With enough interest it could be an amazing contribution to open source. However, building a language server is harder than building a VS Code extension, and a lot of the existing features/code would have to be rewritten in order to fully take advantage of the language server. I tried rewriting all my code as a language server but eventually decided it was too much effort for just my own notes 😅.

There may be some ways to fix the user experience without implementing a language server. One thing that sticks out to me is the delay before you can edit files. Ideally you should be able to edit files even while the extension is working, so I would look at splitting CPU-hungry tasks in order to give control back to the VS Code UI. Secondly, you could try to come up with some sort of strategy for progressive disclosure: parse the documents which are active, then the documents which are open, then all the remaining documents. Progressive disclosure will make the code more complicated though.
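
On splitting up CPU-hungry work, something along these lines might help (a sketch only; parseFile stands in for whatever the extension actually does per document):

```ts
import * as vscode from 'vscode';

// Sketch: process the workspace in small batches and yield to the event loop
// between batches so the extension host stays responsive during bootstrap.
async function parseInBatches(
  uris: vscode.Uri[],
  parseFile: (uri: vscode.Uri) => Promise<void>,
  batchSize = 25
): Promise<void> {
  for (let i = 0; i < uris.length; i += batchSize) {
    await Promise.all(uris.slice(i, i + batchSize).map(parseFile));
    // Give control back so edits, keystrokes, and other extensions can run.
    await new Promise<void>((resolve) => setImmediate(() => resolve()));
  }
}
```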

I'm grabbing at straws here, but something that could help is using VS Code rather than fs.readFile to get the document text (vscode.workspace.openTextDocument(uri).then(doc => doc.getText())). Under the hood VS Code may have some sort of caching going on for file descriptors, and the extension probably shouldn't open files itself if VS Code can open them internally.
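
In other words, something like this (readNoteText is just an illustrative helper name):

```ts
import * as vscode from 'vscode';

// Sketch: ask VS Code for the document instead of reading from disk yourself,
// so dirty editor buffers and any internal caching get reused.
async function readNoteText(uri: vscode.Uri): Promise<string> {
  const doc = await vscode.workspace.openTextDocument(uri);
  return doc.getText();
}
```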

Anyway like I said I never quite solved this problem but maybe having multiple eyes on it will help.

lukesmurray commented 4 years ago

One thing you could do is use the workspace state to store the note caches. After parsing a document I store the result (links, title, etc.) in the workspace state so that I don't have to parse the document the next time I open the workspace. To bust the cache I hash the raw text of each document into a key. When I first open the workspace I iterate over each document and compute its hash: if the hash is the same I load the parsed contents from the memento, and if the hash is different I reparse the document. In your case it would be important to determine whether parsing is the bottleneck or simply opening and reading files is. In my case parsing was definitely the bottleneck, so this was a big win.
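
Roughly like this (a sketch: ParsedNote and parseDocument are placeholders for the extension's real types, and state would be the workspaceState memento from the extension context):

```ts
import * as crypto from 'crypto';
import * as vscode from 'vscode';

// Placeholder for whatever the extension actually stores per note.
interface ParsedNote {
  title: string;
  links: string[];
}

// Sketch of the memento-backed cache: key each document by a hash of its raw
// text, reuse the stored parse when the hash matches, reparse otherwise.
async function parsedNoteFor(
  doc: vscode.TextDocument,
  state: vscode.Memento, // e.g. context.workspaceState
  parseDocument: (d: vscode.TextDocument) => ParsedNote
): Promise<ParsedNote> {
  const hash = crypto.createHash('sha1').update(doc.getText()).digest('hex');
  const key = doc.uri.toString();
  const cached = state.get<{ hash: string; note: ParsedNote }>(key);
  if (cached && cached.hash === hash) {
    return cached.note; // unchanged since the last parse, skip the expensive work
  }
  const note = parseDocument(doc);
  await state.update(key, { hash, note });
  return note;
}
```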

kortina commented 4 years ago

@lukesmurray do you have a branch with the work you started on this in your own codebase?

I tried rewriting all my code as a language server but eventually decided it was too much effort for just my own notes 😅 .

What is the hard part? The API looks relatively straightforward -- is it that you lose out on the VSCode APIs and need to write pure js/ts? (Not saying it's not hard -- I believe you. Just wondering where the effort lies).

Also, I like your suggestion about caching the note hashes to avoid re-parsing, though that will help more on subsequent boots, not necessarily the initial boot. Gonna add a separate issue for that.

lukesmurray commented 4 years ago

Sure, I'll make it public, but it's not enough to really go off of just yet. Here's a link.

The main challenges I ran into were differences between the workspace and text document interfaces, and some confusion there. For example, we want to read all the files in the workspace, and language servers have fairly deep interfaces for reading files and synchronizing them, but it turns out that if a file is not open in the client, the language server is supposed to read the file using its URI.

Another challenge was that I tried to roll my own text document synchronization because of the issues listed here. Doing that efficiently is not easy, but it turns out you don't have to: the documentation is outdated, and the built-in text document synchronization is already incremental.
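
For reference, the built-in synchronization is basically free to wire up; a minimal server skeleton looks roughly like this (a sketch using the vscode-languageserver packages, not code from either of our extensions):

```ts
import {
  createConnection,
  ProposedFeatures,
  TextDocuments,
  TextDocumentSyncKind,
} from 'vscode-languageserver/node';
import { TextDocument } from 'vscode-languageserver-textdocument';

// The TextDocuments manager keeps documents in sync incrementally for you,
// so there is no need to hand-roll text document synchronization.
const connection = createConnection(ProposedFeatures.all);
const documents = new TextDocuments(TextDocument);

connection.onInitialize(() => ({
  capabilities: { textDocumentSync: TextDocumentSyncKind.Incremental },
}));

documents.onDidChangeContent((change) => {
  // change.document is already up to date here; parse it off the UI thread.
});

documents.listen(connection);
connection.listen();
```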

I don't think it is really that hard but spending a weekend reading outdated documentation and dealing with confusing interfaces wasn't all that fun! I think once that initial document synchronization is in place the language server could be implemented fairly rapidly.

lukesmurray commented 4 years ago

Oh, another thing: I think I overshot in my initial expectations and wanted to write a language server that takes into account the client's reported capabilities, since I had dreams of being able to write linked notes in any editor. I would recommend against that approach; instead, target VS Code and get something working.