microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

VSCode uses 10GB of RAM through git #223059

Open juliannadeau-stripe opened 1 month ago

juliannadeau-stripe commented 1 month ago

Type: Bug

I work on an extremely large monorepo that has about 1M refs, hundreds of thousands of files and directories, and 1M+ commits, and takes about 20-40GB on disk depending on the compression used.

Our users were reporting that git was consuming up to 10-11GB of RAM multiple times per system. I was able to track this down to the invocation of git log ... -- <file> which is invoked by VSCode every time you open a file. This is being invoked by the Timeline view in this function https://github.com/microsoft/vscode/blob/cba3b82/extensions/git/src/git.ts#L1187

It turns out that git log -- is extremely performance sensitive on larger repos as it has to read in a ton of pack files into memory.

We should find an alternative command to get this information, or disable the timeline automatically. We've had to disable the timeline sitewide for our company.

VS Code version: Code 1.90.2 (Universal) (5437499feb04f7a586f677b155b039bc2b3669eb, 2024-06-18T22:37:41.291Z)
OS version: Darwin arm64 23.5.0
Modes:

System Info

|Item|Value|
|---|---|
|CPUs|Apple M1 Max (10 x 2400)|
|GPU Status|2d_canvas: enabled<br>canvas_oop_rasterization: enabled_on<br>direct_rendering_display_compositor: disabled_off_ok<br>gpu_compositing: enabled<br>multiple_raster_threads: enabled_on<br>opengl: enabled_on<br>rasterization: enabled<br>raw_draw: disabled_off_ok<br>skia_graphite: disabled_off<br>video_decode: enabled<br>video_encode: enabled<br>webgl: enabled<br>webgl2: enabled<br>webgpu: enabled|
|Load (avg)|9, 11, 9|
|Memory (System)|32.00GB (0.46GB free)|
|Process Argv|../pay-server|
|Screen Reader|no|
|VM|0%|

Extensions (53)

|Extension|Author (truncated)|Version|
|---|---|---|
|vscode-sql-formatter|adp|1.4.4|
|vscode-bazel|Baz|0.10.0|
|vale-vscode|chr|0.20.0|
|vscode-eslint|dba|3.0.10|
|EditorConfig|Edi|0.16.4|
|RunOnSave|eme|0.2.0|
|prettier-vscode|esb|10.4.0|
|flow-for-vscode|flo|2.2.1|
|copilot|Git|1.214.0|
|copilot-chat|Git|0.16.1|
|vscode-pull-request-github|Git|0.90.0|
|go|gol|0.42.0|
|vscode-graphql|Gra|0.11.0|
|vscode-graphql-syntax|Gra|1.3.6|
|path-autocomplete|ion|1.25.0|
|vscode-rdbg|Koi|0.2.2|
|json|Mee|0.1.2|
|vscode-language-babel|mgm|0.0.40|
|debugpy|ms-|2024.8.0|
|python|ms-|2024.10.0|
|vscode-pylance|ms-|2024.7.1|
|jupyter|ms-|2024.5.0|
|jupyter-keymap|ms-|1.1.2|
|jupyter-renderers|ms-|1.0.18|
|vscode-jupyter-cell-tags|ms-|0.1.9|
|vscode-jupyter-slideshow|ms-|0.1.6|
|remote-ssh|ms-|0.112.0|
|remote-ssh-edit|ms-|0.86.0|
|remote-explorer|ms-|0.4.3|
|vscode-yaml|red|1.15.0|
|vscode-paste-and-indent|Rub|0.0.8|
|vs-code-prettier-eslint|rve|6.0.0|
|vscode-ruby-syntax|Sar|0.0.11|
|scala|sca|0.5.7|
|vscode-fileutils|sle|3.10.3|
|sorbet-vscode-extension|sor|0.3.35|
|cody-ai|sou|1.26.6|
|config-validator|str|1.0.9|
|deployment-preview|str|1.1.0|
|devbox-auth|str|0.1.0|
|doculink|str|0.1.3|
|endsmart|str|0.1.1|
|markdoc-language-support|str|0.0.13|
|markdoctor-vscode|str|0.0.1|
|metrics|str|0.2.13|
|mkt-liquid|str|0.0.8|
|privacy-annotator|str|0.1.1|
|run-pay|str|0.6.18|
|stripe-skycfg|str|0.1.7|
|shellcheck|tim|0.37.1|
|change-case|wma|1.0.0|
|vscode-open-in-github|ziy|99991.4.3|
|vscode-proto3|zxh|0.5.5|

VSCodeTriageBot commented 1 month ago

Thanks for creating this issue! It looks like you may be using an old version of VS Code, the latest stable release is 1.91.1. Please try upgrading to the latest version and checking whether this issue remains.

Happy Coding!

gjsjohnmurray commented 1 month ago

/assign @lszomoru

juliannadeau-stripe commented 1 month ago

Re: Triage bot. I was reading the code in the repo, the linked code is for the latest on main as of today - thus upgrading will not help :)

Voultapher commented 3 weeks ago

Thanks @juliannadeau-stripe for reporting the issue and drilling into it, I'm running into the same thing with large repositories.

Voultapher commented 3 weeks ago

@lszomoru I think the quickest fix would be to add an option to disable the timeline view. Right now I get high resource usage + an unresponsive editor for this:

image

The git log command should not be run unbounded; a depth limit for figuring out the timeline should be added, in my opinion.
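
As a rough sketch of what such a bound could look like on the command line (an illustration only, not what VS Code runs): `-n` caps how many commits are printed, but with a path filter git may still walk a long stretch of history to find them, whereas `--since` also stops the walk once commits are older than the given date. The helper name `bounded_file_log` and its defaults (50 commits, 6 months) are hypothetical.

```shell
#!/usr/bin/env bash
# bounded_file_log FILE [COUNT] [SINCE]
# Hypothetical helper (not part of git or VS Code): print at most COUNT
# commits that touch FILE, ignoring history older than SINCE.
bounded_file_log() {
    local file=$1
    local count=${2:-50}                 # assumed default page size
    local since=${3:-"6 months ago"}     # assumed default age bound
    # One line per commit: hash, author timestamp, subject.
    git log --format='%H %at %s' -n "$count" --since="$since" -- "$file"
}
```

Calling `bounded_file_log path/to/file 20 "1 year ago"` from inside a repository would list up to 20 commits from the last year. The trade-off is that files untouched within the window show an empty history.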

juliannadeau-stripe commented 2 weeks ago

For context, we have essentially disabled the timeline globally across the company 😄 (set pageSize to 0)

Since I see your screenshot is of the source control view: there is another issue I need to file, where git in the source control view can fail with similar symptoms if you cross a merge commit with many commits. We merged a repo with about 25000 commits into our main monorepo. Any branch that passed that merge commit point (base before, currently after) would cause VSCode to use up to 40GB of RAM, as it was doing a git log back to the beginning of the merge commit (12 years ago for us) in this general code vicinity: https://github.com/microsoft/vscode/blob/55c07a3155472823b370ed0dec9f34886268111f/extensions/git/src/git.ts#L1147

Voultapher commented 2 weeks ago

FYI I've rolled back to 1.86, which I'll stay on until this is resolved.

lszomoru commented 2 weeks ago

The history graph has been removed from the main "Source Control" view and moved into a separate "Source Control Graph" view. The new view implements caching as well as paging, so it should be significantly less resource intensive. Also, if the view is hidden or collapsed, no git commands should be executed against the repository.

The change is already available in VS Code Insiders and will be included in the upcoming VS Code Stable release.

juliannadeau-stripe commented 2 weeks ago

@lszomoru can you clarify what you mean by history graph? This issue is about the section called "Timeline". Are these the same? Can you share the commits in which the change was made? I'd like to see the new commands that would be run :)

lszomoru commented 2 weeks ago

In the latest version of VS Code (1.92) there are two distinct features that result in running a git log command:

  1. Timeline view
  2. History graph that is being shown in the main "Source Control" view (screenshot from this comment)

The git log command related to the Timeline view is only executed if the Timeline view is visible and expanded. If the view is hidden or collapsed, you should not see any git log commands being executed while opening files. You can also use timeline.pageOnScroll to enable paging, and timeline.pageSize to control the page size.

The history graph is being removed from the main "Source Control" view and moved into its own view, "Source Control Graph". This new view also supports paging, and should only result in git log calls if it is visible and expanded.

juliannadeau-stripe commented 2 weeks ago

I see, @lszomoru Thanks for the explanation!

In that case, I am not sure that this solves the issue here. The timeline and history view are using commands that use 10+ GB of RAM regardless of how many pages are being used. A single invocation leads to runaway memory usage.

I can think of 2 things that could resolve this:

  1. Find a more efficient command to use
  2. Can VSCode provide settings I can disable globally for all of these history/timeline views to avoid our engineers' laptops crashing?

lszomoru commented 2 weeks ago

> For context, we have essentially disabled the timeline globally across the company 😄 (set pageSize to 0)

Isn't this what you have done based on your comment? At least for the timeline view?

juliannadeau-stripe commented 2 weeks ago

It is "disabled" in that the page size is 0, but the actual feature is still on. It still shows up in the UI, and it would be nice if it didn't run commands or show up at all :)

The source control UI could still use the off switch too :)

Thank you!

lszomoru commented 2 weeks ago

Rather than disabling various source control features, I would like to get them working with large repositories.

Timeline

Specifying the following timeline settings ("timeline.pageOnScroll": true, "timeline.pageSize": 50) will run the following git command (please replace FILE with the path of a file from the repository). Can you please confirm that, when you run this from the terminal, the git process consumes 10GB of RAM? I would also be interested in knowing how long the command took to run.

git log --format=%H%n%aN%n%aE%n%at%n%ct%n%P%n%D%n%B -z -n51 --follow -- FILE
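
One possible way to capture both the run time and the peak memory in a single run is the system `time` utility, sketched below. The wrapper name `measure` is hypothetical, and the sketch assumes `/usr/bin/time` is installed: `-l` is the BSD/macOS flag and `-v` the GNU flag for the detailed report, which includes the maximum resident set size.

```shell
#!/usr/bin/env bash
# measure CMD [ARGS...]
# Hypothetical wrapper: run CMD with its stdout discarded and let the system
# time(1) utility print its resource report (on stderr).
measure() {
    case "$(uname -s)" in
        # BSD time on macOS: reports "maximum resident set size" in bytes.
        Darwin) /usr/bin/time -l "$@" > /dev/null ;;
        # GNU time: reports "Maximum resident set size (kbytes)".
        *)      /usr/bin/time -v "$@" > /dev/null ;;
    esac
}

# Example, with FILE as a placeholder for a path in the repository:
# measure git log '--format=%H%n%aN%n%aE%n%at%n%ct%n%P%n%D%n%B' -z -n51 --follow -- FILE
```

Running the same command a second time without `--follow` gives a comparison point for how much that flag contributes.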

Source Control Graph

This feature, when the view is visible and expanded, will run the following git command. Can you please confirm that, when you run this from the terminal, the git process consumes 10GB of RAM? I would also be interested in knowing how long the command took to run.

git log --format=%H%n%aN%n%aE%n%at%n%ct%n%P%n%D%n%B -z --shortstat --diff-merges=first-parent -n50 --skip=0 --topo-order --decorate=full refs/heads/main refs/remotes/origin/main
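
The `-n50 --skip=0` pair in this command suggests how paging maps onto git. A minimal sketch, assuming later pages simply increase `--skip` by the page size (the helper name `log_page` is hypothetical, and this is not taken from the extension's code):

```shell
#!/usr/bin/env bash
# log_page PAGE_SIZE PAGE [REVISIONS...]
# Hypothetical helper: print the commit hashes of one zero-based page of history.
log_page() {
    local page_size=$1
    local page=$2
    shift 2
    # --skip drops the commits of earlier pages from the output; -n caps this page.
    git log --format=%H -n "$page_size" --skip="$(( page * page_size ))" "$@"
}
```

For example, `log_page 50 1 refs/heads/main` would print commits 51 to 100. Note that git still has to walk the skipped commits, so later pages are not free.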

Could you also make sure that you are running the latest version of git? Thanks!

juliannadeau-stripe commented 2 weeks ago

Timeline

I ran time git log --format=%H%n%aN%n%aE%n%at%n%ct%n%P%n%D%n%B -z -n51 --follow -- cibot/README.md and it takes about 3GB of RAM. It also takes about 16s to run.

We have dug into this and git does not work well with mmaps on mac and loads far too much into memory. We saw abnormally high amounts of memory with a page size of 1 (200MB)

Source control graph

 time (git log --format=%H%n%aN%n%aE%n%at%n%ct%n%P%n%D%n%B -z --shortstat --diff-merges=first-parent -n50 --skip=0 --topo-order --decorate=full refs/heads/master refs/remotes/origin/master > /dev/null)
0.25s user 0.39s system 14% cpu 4.345 total

I don't see a ton of RAM being used by this command.

Voultapher commented 2 weeks ago

@lszomoru if you want to experiment yourself, I initially noticed the issue with this open source project https://github.com/rust-lang/rust which has a large history and many converging branches.

lszomoru commented 2 weeks ago

@juliannadeau-stripe, thanks for the update. Glad to hear that the command for the Source Control Graph view does not use a ton of memory. What is the version of git that you are using? If you are running the latest version, do you mind filing an issue for git so that it gets investigated? Does the command consume fewer resources if you omit the --follow flag?

juliannadeau-stripe commented 2 weeks ago

We are on the latest version :) We have spoken to some core git contributors about the issue, but haven't filed an official bug report about it.

lszomoru commented 2 weeks ago

I do think that it would make sense to file an issue against git to see if there are any improvements that they can make, or whether there are any flags that we could use to reduce resource consumption. In the meantime, you can use the settings to enable paging in the timeline view and control the page size.