microsoft / vscode-cpptools

Official repository for the Microsoft C/C++ extension for VS Code.

Huge disk space is being used under `/home/<username>/.cache/vscode-cpptools` #12066

Closed sly1061101 closed 6 months ago

sly1061101 commented 8 months ago

As the screenshot shows, it's using ~270GB on my system!

image

The IntelliSense cache size has been limited to 5GB, which is indeed the size of the ipch folder:

image

What are the other folders? Is there a way to also limit their sizes?

bobbrow commented 8 months ago

The hash-named folders contain the database we generate when we index the workspace folder. Judging by the size of some of these, I wonder if you are hitting an issue with a symlink pointing outside your workspace folder and resulting in the extension indexing way more than intended. We don't generally see databases that large. I don't think even Chromium's database gets that big, but it's been a while since I checked. Are you working on very large codebases?

sly1061101 commented 8 months ago

Thanks for the quick response. I do work on a fairly large code base, and I have three copies (git worktree) of the code base, each having a VSCode workspace. So it's kind of as expected?

bobbrow commented 8 months ago

It would be expected to have large databases for large codebases, but 90GB still seems like a lot to me. If you set the C_Cpp.loggingLevel setting to "Debug" and reload your window, you can search for a string like this in the logging output window to see how many files we processed for the database:

Discovering files: 31704 file(s) processed

^ That workspace has a 500MB database associated with it so if I extrapolate to 90GB it would indicate that there are upwards of 5M files being tracked in your largest database. If your codebase is actually that big, I would say the number looks in line with my expectations, but if it isn't, we might want to look and see if you do have the symlink problem.
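The extrapolation above can be reproduced with a few lines of Python. This is only a sketch: it assumes database size grows linearly with the number of indexed files, and the names `REF_FILES`, `REF_DB_MB`, and `estimate_file_count` are made up for illustration.

```python
# Reference point quoted above: 31,704 files produced a ~500 MB database.
REF_FILES = 31_704
REF_DB_MB = 500


def estimate_file_count(db_size_gb: float) -> int:
    """Linearly extrapolate the number of indexed files from a database size."""
    db_size_mb = db_size_gb * 1000  # decimal GB; the estimate is rough either way
    return round(REF_FILES * db_size_mb / REF_DB_MB)


print(estimate_file_count(90))  # 5706720, i.e. "upwards of 5M files"
```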

TryerGit commented 8 months ago

Issues like these have given me, and continue to give me, sleepless nights. See the related discussion: https://github.com/microsoft/vscode-cpptools/discussions/10637

After probing into this from various sources, I have come up with a list of folders whose contents, in my opinion, are safe to delete without breaking any functionality. I use VSCode + cpptools on pure Windows, on WSL under Windows, and on a pure Linux machine. I also use the Visual Studio IDE on Windows. Below are the folders that are safe to delete. I have done so multiple times on different computers without any breakage in functionality. Like all things posted by anonymous folks like me on the internet, DO YOUR OWN TESTS TO CONFIRM, CAVEAT EMPTOR!

Appdata/local/Microsoft         /Windows/Caches/
                                /VisualStudioServices/8.0/Cache/
                                /VisualStudioServices/7.0/Cache/
                                /VisualStudio/17.0_x/ComponentModelCache/
                                /vscode-cpptools/ipch/
                                /VisualStudio/17.0_x/ItemTemplatesCache_*/
                                /VisualStudio/16.0_x/MEFCacheBackup/
                                /VisualStudio/16.0_x/PackageCache/
                                /VisualStudio/17.0_x/ProjectTemplatesCache_*/
                                /VisualStudio/17.0_x/TextMateCache/
                                /VisualStudio/16.0_x/ProjectAssemblies/? unsure
                                /WebsiteCache/
                                /VisualStudio/SettingsLogs/

Appdata/Roaming/Code            /Cache/
                                /CachedConfigurations/
                                /CachedData/
                                /CachedExtensions/
                                /CachedExtensions/VSIXs/
                                /Code Cache/
                                /GPUCache/
                                /Service Worker/CacheStorage/
                                /Service Worker/ScriptCache/
                                /User/workspaceStorage/

Appdata/Roaming/                /Intel 
Appdata/Roaming/                /Intel VTune Profiler 

Appdata/Local/Temp/

C:/Windows/Temp/*
c:/windows/prefetch/*
c:/windows/softwaredistribution/download/* https://www.windowscentral.com/how-clear-softwaredistribution-folder-windows-10 says to leave this alone

wsl/Ubuntu/home/TryerGit        /.cache/vscode-cpptools/ipch/
                                /.vscode-server/data/logs/
                                /.vscode-server/data/User/WorkspaceStorage/
                                /.cache/*/
                                .viminfo                                

WSL Root                        /tmp/

PureLinux: /home                /.cache/vscode-cpptools/ipch
                                /.cache/ 
                                /.config/Code/Cache
                                /.config/Code/CachedData/
                                /.config/Code/CachedExtensions/
                                /.config/Code/CachedExtensionsVSIXs/
                                /.config/Code/Code Cache/
                                /.config/Code/Crash Reports/
                                /.config/Code/exthost Crash Reports/
                                /.config/Code/GPUCache/
                                /.config/Code/logs/
                                /.config/Code/Service Worker/CacheStorage/
                                /.config/Code/Service Worker/ScriptCache/
                                /.config/Code/User/workspaceStorage/
                                .viminfo

PureLinux: Root                 /tmp/
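Before deleting anything from a list like the one above, it may help to see how much space each candidate folder actually uses. The sketch below is one way to do that in Python. The helper names `dir_size_bytes` and `report` are hypothetical, and the two paths in the usage section are simply copied from the Linux part of the list.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if not path.is_symlink():
                    total += path.stat().st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def report(candidates: list[Path]) -> dict[Path, int]:
    """Return the size of every candidate that exists as a directory."""
    return {p: dir_size_bytes(p) for p in candidates if p.is_dir()}


if __name__ == "__main__":
    home = Path.home()
    # Two entries taken from the Linux list above; extend as needed.
    candidates = [
        home / ".cache" / "vscode-cpptools" / "ipch",
        home / ".config" / "Code" / "CachedData",
    ]
    for path, size in report(candidates).items():
        print(f"{size / 1e6:10.1f} MB  {path}")
```

The script only reads sizes and never deletes anything, so it is safe to run as a first check.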

BTW, OP: Which software are you using in the image of the OP? I use WinDirStat on Windows, but you seem to be on Linux?

sly1061101 commented 8 months ago

> It would be expected to have large databases for large codebases, but 90GB still seems like a lot to me. If you set the C_Cpp.loggingLevel setting to "Debug" and reload your window, you can search for a string like this in the logging output window to see how many files we processed for the database:
>
> Discovering files: 31704 file(s) processed
>
> ^ That workspace has a 500MB database associated with it so if I extrapolate to 90GB it would indicate that there are upwards of 5M files being tracked in your largest database. If your codebase is actually that big, I would say the number looks in line with my expectations, but if it isn't, we might want to look and see if you do have the symlink problem.

Thanks, here's the result:

Discovering files: 200602 file(s) processed

So it should not lead to a 90GB database. I will check for the symlink issue.
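To put a number on that conclusion, the same reference point (31,704 files for a 500MB database) can be run in the other direction. The sketch assumes size scales linearly with file count, which is only a rough guide, and `expected_db_size_gb` is an illustrative name.

```python
# Reference point from earlier in the thread: 31,704 files, ~500 MB database.
REF_FILES = 31_704
REF_DB_MB = 500


def expected_db_size_gb(file_count: int) -> float:
    """Rough database size in (decimal) GB for a given number of indexed files."""
    return file_count * REF_DB_MB / REF_FILES / 1000


size_gb = expected_db_size_gb(200_602)
print(f"expected: about {size_gb:.1f} GB, observed: 90 GB")
```

Under that assumption, 200,602 files would be expected to produce a database of a few GB, well over an order of magnitude below the observed 90GB.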

sly1061101 commented 8 months ago

> BTW, OP: Which software are you using in the image of the OP? I use WinDirStat on Windows, but you seem to be on Linux?

@TryerGit It's on Ubuntu Linux, a system utility named "Disk Usage Analyzer"

bobbrow commented 8 months ago

@sly1061101 by any chance is your repo open source?

EDIT: and you don't need to check for symlink issues because you're not at an unexpected 5M files being indexed.

sly1061101 commented 8 months ago

> by any chance is your repo open source?

Unfortunately not, it's my employer's code base

> and you don't need to check for symlink issues because you're not at an unexpected 5M files being indexed.

Uh, OK, so the file count is not the issue here? Is there anything else I could try?

bobbrow commented 8 months ago

One thing you can try is to regenerate the database for your workspace and compare the size. If you'd like to do that without deleting your existing one, you can either:

sly1061101 commented 8 months ago

Thanks, I renamed the existing one to vscode-cpptools-bak and reopened several workspaces. I can confirm that each unique hash corresponds to one of my workspaces:

image

One question: do you know why the original folder could contain multiple folders for the same hash (just with a -1 or -2 suffix)?

I also confirmed that the file .browser.VC.db is what uses up most of the disk space:

image

sly1061101 commented 8 months ago

I found another strange behavior: I opened the workspace that corresponds to 3301dddae.... multiple times, and it seems that a new folder is created every time:

image

I don't think I observed this before; otherwise the original folder would have had way more than 3 folders for this hash. I'm pretty sure I've opened this workspace many times.

EDIT: It stopped creating new duplicates after the 5th one (i.e. 3301dddae....-4), even when I reopened the workspace more times.

bobbrow commented 8 months ago

Sorry for disappearing on this issue. We've been dealing with some regressions related to the release. One of our engineers noticed a strange issue with his database that caused the extension to start indexing from the folder above the workspace folder. He opened this issue (#12105) and we hope that it's related to what you're seeing. We'll have a fix out in 1.19.9 that we'd like for you to try when it's available. @Colengms

sean-mcmanus commented 7 months ago

@sly1061101 Is this fixed with https://github.com/microsoft/vscode-cpptools/releases/tag/v1.19.9? If not, we might need more repro info.

sly1061101 commented 7 months ago

Thank you both for the updates. I will update to 1.19.9, remove the cache directory and monitor if it's resolved

sly1061101 commented 7 months ago

I'm still waiting to see whether the issue happens again, but so far I've noticed that multiple folders can still be created for the same workspace:

$ du -sh ~/.cache/vscode-cpptools/*
332M    /home/liuyis/.cache/vscode-cpptools/23ba42b1fb5498717cb6b48f49d3d7f2
264M    /home/liuyis/.cache/vscode-cpptools/23ba42b1fb5498717cb6b48f49d3d7f2-2
989M    /home/liuyis/.cache/vscode-cpptools/5c9bc4f05e65f0f64af9ec03c59bcd64-1
96M     /home/liuyis/.cache/vscode-cpptools/ipch

The two 23ba42b1fb5498717cb6b48f49d3d7f2 folders correspond to the same workspace that I opened twice
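A quick way to see which cache folders belong to the same workspace is to group the folder names by their base hash. The sketch below assumes the naming seen in the listing above (a 32-character hex name, optionally followed by `-N`); that pattern is inferred from this thread, not from any documented format, and `group_by_workspace` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches "<hash>" or "<hash>-<n>", as seen in the listing above (assumed pattern).
_HASH_FOLDER = re.compile(r"^(?P<base>[0-9a-f]{32})(?:-\d+)?$")


def group_by_workspace(folder_names):
    """Group cache folder names by base hash, ignoring folders such as 'ipch'."""
    groups = defaultdict(list)
    for name in folder_names:
        match = _HASH_FOLDER.match(name)
        if match:
            groups[match.group("base")].append(name)
    return dict(groups)


names = [
    "23ba42b1fb5498717cb6b48f49d3d7f2",
    "23ba42b1fb5498717cb6b48f49d3d7f2-2",
    "5c9bc4f05e65f0f64af9ec03c59bcd64-1",
    "ipch",
]
for base, members in group_by_workspace(names).items():
    print(base, len(members))
```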

sean-mcmanus commented 7 months ago

@sly1061101 VS Code itself is providing us with those hash values as the last part of the extensionContext.storageUri, see https://github.com/microsoft/vscode-cpptools/blob/main/Extension/src/LanguageServer/client.ts#L1193

bobbrow commented 7 months ago

What @sean-mcmanus is suggesting is that the -# paths seem to be coming from VS Code and we don't know why they are creating multiple hashes for the same workspace. Is there something special about the way you open the workspace or do you open it the same way every time? Do you have multiple instances of VS Code open at the same time?

sly1061101 commented 7 months ago

I waited for one day after clearing the cache directory, and the folders have now been recreated and are still using a considerable amount of space:

$ du -sh ~/.cache/vscode-cpptools/
59G     /home/liuyis/.cache/vscode-cpptools/

$ du -sh ~/.cache/vscode-cpptools/*
13G     /home/liuyis/.cache/vscode-cpptools/23ba42b1fb5498717cb6b48f49d3d7f2
1.4G    /home/liuyis/.cache/vscode-cpptools/23ba42b1fb5498717cb6b48f49d3d7f2-1
343M    /home/liuyis/.cache/vscode-cpptools/23ba42b1fb5498717cb6b48f49d3d7f2-2
5.9G    /home/liuyis/.cache/vscode-cpptools/27393237d09da9a54bc9f2744eead072
905M    /home/liuyis/.cache/vscode-cpptools/5c9bc4f05e65f0f64af9ec03c59bcd64
37G     /home/liuyis/.cache/vscode-cpptools/5c9bc4f05e65f0f64af9ec03c59bcd64-1
808M    /home/liuyis/.cache/vscode-cpptools/ipch

so it's probably a different issue?

> Is there something special about the way you open the workspace or do you open it the same way every time? Do you have multiple instances of VS Code open at the same time?

It's the same way every time. The only thing that might be special is that I am using remote SSH development. There aren't multiple instances, but I did notice that sometimes the cpptools process does not exit immediately after I close VSCode. However, the duplicate folders for the same workspace are created even if I wait until the cpptools process has fully stopped.

Colengms commented 7 months ago

Hi @sly1061101. Could you provide the contents of your c_cpp_properties.json and any VS Code settings related to the C/C++ extension (anything starting with C_Cpp)? This is something we usually ask for up front. This issue could possibly be by design if, for example, you've specified / as one of your include paths or browse paths, causing the extension to try to index your entire file system into the database.
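As a rough illustration of the check described above, the following sketch scans a parsed c_cpp_properties.json for include or browse paths that point at the file system root. It assumes the file is plain JSON without comments, and the `BROAD` set is a made-up notion of "too broad" rather than anything defined by the extension.

```python
import json

# Hypothetical set of entries treated as "the whole file system".
BROAD = {"/", "/*", "/**"}


def broad_paths(properties: dict) -> list:
    """Return (configuration name, setting, path) for every overly broad entry."""
    findings = []
    for config in properties.get("configurations", []):
        name = config.get("name", "<unnamed>")
        sources = {
            "includePath": config.get("includePath", []),
            "browse.path": config.get("browse", {}).get("path", []),
        }
        for setting, paths in sources.items():
            for path in paths:
                if path.strip() in BROAD:
                    findings.append((name, setting, path))
    return findings


# Made-up example configuration, not taken from the reporter's setup.
example = json.loads("""
{
  "configurations": [
    {
      "name": "Linux",
      "includePath": ["${workspaceFolder}/**", "/"],
      "browse": {"path": ["${workspaceFolder}"]}
    }
  ]
}
""")
print(broad_paths(example))  # [('Linux', 'includePath', '/')]
```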

I'd also suggest enabling "C_Cpp.loggingLevel": "Debug" and monitoring the output in the C/C++ Output channel. It will list files as it parses them, and you may spot files that it shouldn't parse, which may help identify the cause.
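Once file paths have been copied out of the log, spotting the ones outside the workspace can be automated. The thread does not show the exact format of those log lines, so this sketch takes a plain list of paths as input. The function name and the example paths are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def outside_workspace(paths, workspace_root):
    """Return the paths that are not located under workspace_root."""
    root = PurePosixPath(workspace_root)
    return [p for p in paths if not PurePosixPath(p).is_relative_to(root)]


# Made-up paths for illustration only.
parsed = [
    "/home/user/proj/src/main.cpp",
    "/usr/include/stdio.h",
    "/home/user/project2/lib.h",
]
print(outside_workspace(parsed, "/home/user/proj"))
```

System headers will normally show up in such a list and are expected; the interesting hits are unrelated source trees. The comparison is purely lexical, so paths containing `..` or symlinks would need to be resolved first.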

You can also open the database itself. It's a SQLite database and can be opened with any SQLite database viewer. You might check the file table to see whether it contains entries for files that are not related to your workspace.
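If no viewer app is at hand, Python's built-in `sqlite3` module can give a first overview. The schema of the database is not described in this thread, so the sketch below makes no assumptions about table or column names: it lists whatever tables exist along with their row counts. Running it on a copy of the file, or while VS Code is closed, is probably the safer choice.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path) -> dict:
    """Open a SQLite file read-only and return {table name: row count}."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            for table in tables
        }
    finally:
        conn.close()
```

For example, `table_row_counts(".browser.VC.db")` run inside one of the hash-named folders would show which tables dominate, which narrows down where to look next.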

github-actions[bot] commented 6 months ago

This issue has been closed because it needs more information and has not had recent activity.

johnothwolo commented 5 months ago

@Colengms I've noticed this same issue because I view large C/C++ projects like xnu and FreeBSD. The cumulative space taken up by ipch files is huge. On my Mac the ipch files range from 10-200MB.

I think compression should be used, and could save vscode users a lot of space (especially MacBook owners with 128/256GB storage). I compressed a 185MB ipch down to just 21MB using zstd with the default compression level. If CPU usage and battery life become a concern, maybe add it as an option? Maybe another option to adjust the compression level?
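The trade-off between compression level and savings can be explored with a small experiment. Note the substitution: the sketch uses `zlib` from the Python standard library as a stand-in for zstd, so its numbers are not comparable to the zstd result above, and the sample data is synthetic rather than a real ipch file.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size (zlib, not zstd)."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Synthetic, highly repetitive sample; real ipch files will behave differently.
sample = b"struct foo { int bar; };\n" * 10_000
for level in (1, 6, 9):
    print(level, f"{compression_ratio(sample, level):.1f}x")

# Ratio reported in the comment above: 185MB down to 21MB.
print(f"{185 / 21:.1f}x")  # 8.8x
```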