microsoft / vscode-cpptools

Official repository for the Microsoft C/C++ extension for VS Code.

Huge .browse.VC.db file #150

Closed alexdima closed 8 years ago

alexdima commented 8 years ago

Moved from Microsoft/vscode#10557


From @bharathitman

Steps to Reproduce:

  1. I was working on a simple Angular 2 application for a few hours. When I was about to push the code, I got an error saying there was a huge file, .browse.VC.db (~720 MB in size). I had to add it to .gitignore.
  2. I think I get the purpose of the file, but is it supposed to be this huge, or is this behavior strange?

From @AkashGutha

I had the same problem, though my file was only 40 MB. What is this file for?

jgoshi commented 8 years ago

[Attached screenshots: image2, image1]

This file is the database of symbols in your files. Depending on the size of the project the database can become large. The file sizes you listed look normal. You can exclude it from git as you did. You can also control the location of the database (so it is outside your repo). See the two attached screenshots for more details. If you edit the cpp settings file you can find a databaseFilename setting. Use the full path (directory and file name) you want to use. If it's an empty string (or missing from the settings file) then it'll go to the default location.

tojocky commented 7 years ago

mine is 20+ GB.

sean-mcmanus commented 7 years ago

@tojocky Wow, that seems too big. The largest I've seen is 1.4 GB for Chromium. Does changing some of the settings to reduce the size work for you? You can use files.exclude to remove directories and files that you don't care about having symbols for. Setting limitSymbolsToIncludedHeaders to true might help too, as should setting addWorkspaceRootToIncludePath to false and then selectively adding the directories you actually want symbols for. You should also delete the database or change the databaseFilename after making these settings changes, because the database doesn't self-clean and can accumulate junk from older settings (which we've been planning to fix for a while). This could also be due to a new bug or to symbolic link cycles, but we would need more info to tell.
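To check the symbolic link possibility mentioned above, a rough sketch like the following could be run over the workspace. It is not part of the extension; `find_symlink_cycles` is a hypothetical helper that reports symlinked directories pointing back at one of their own ancestors, which is one way a recursive scan can end up visiting the same files repeatedly.

```python
import os


def find_symlink_cycles(root):
    """Return (link_path, target) pairs for directory symlinks that
    point back to a directory already on the current descent path."""
    cycles = []

    def is_directory(entry):
        # Broken or self-referential links can raise; treat them as non-directories.
        try:
            return entry.is_dir(follow_symlinks=True)
        except OSError:
            return False

    def walk(path, ancestors):
        try:
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            return  # unreadable directory: skip it
        for entry in entries:
            if not is_directory(entry):
                continue
            target = os.path.realpath(entry.path)
            if entry.is_symlink() and target in ancestors:
                cycles.append((entry.path, target))
                continue  # do not descend into the loop
            walk(entry.path, ancestors | {target})

    walk(root, {os.path.realpath(root)})
    return cycles
```

Calling `find_symlink_cycles("/path/to/workspace")` returns an empty list when no such loops exist. Whether the extension itself follows such links is not something this sketch can tell you.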

tojocky commented 7 years ago

OK, I'm back to ~20GB

sean-mcmanus commented 7 years ago

@tojocky Can you provide more info? Do you think this is a bug? You should be able to work around the issue by deleting the database file (or changing databaseFilename) after reducing the scope of the browse.path setting so that it does not include so many files. Our database adds all the filenames it recursively detects from browse.path and then parses the files it believes are C/C++ for symbol information. So it's finding too many files, parsing too many files, or both. You could help us diagnose the issue by opening the .browse.vc.db file with a SQLite viewer and looking for what's causing the size bloat. The database also doesn't remove files that no longer exist in the browse.path, so a manual deletion is required to clean up (an issue we are planning to fix in September).
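For anyone without a SQLite viewer at hand, a first look at the database can be sketched with Python's built-in sqlite3 module. `table_row_counts` is a hypothetical helper, not an extension API; it makes no assumption about the schema beyond the file being a SQLite database, and it opens the file read-only.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table, largest first."""
    # mode=ro opens the file read-only so the inspection cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    finally:
        conn.close()
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```

On a multi-gigabyte database each `COUNT(*)` can take a while, so it may be worth closing VS Code first and working on a copy of the file.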

tojocky commented 7 years ago

This time I used sqlite3_analyzer.exe to understand what is going on. It seems the CODE_ITEMS table, together with its indexes, takes up most of the space.

I ran the SQL command: "select count(*) from code_items;" and the result is: 110702994

/** Disk-Space Utilization Report For C:\Users\ion.lupascu\AppData\Roaming\Code\User\workspaceStorage\dc5891a1df997736f6106d3d0a76af58\ms-vscode.cpptools\.BROWSE.VC.DB

Page size in bytes................................ 4096      
Pages in the whole file (measured)................ 5213044   
Pages in the whole file (calculated).............. 5213043   
Pages that store data............................. 5213042    100.000% 
Pages on the freelist (per header)................ 1            0.0% 
Pages on the freelist (calculated)................ 2            0.0% 
Pages of auto-vacuum overhead..................... 0            0.0% 
Number of tables in the database.................. 15        
Number of indices................................. 37        
Number of defined indices......................... 30        
Number of implied indices......................... 7         
Size of the file in bytes......................... 21352628224
Bytes of user payload stored...................... 8825666906  41.3% 

*** Page counts for all tables with their indices *****************************

CODE_ITEMS........................................ 4808119     92.2% 
FILE_SIGNATURES................................... 163027       3.1% 
FILES............................................. 102832       2.0% 
ASSOC_TEXT........................................ 62247        1.2% 
ASSOC_SPANS....................................... 58479        1.1% 
BASE_CLASS_PARENTS................................ 18307        0.35% 
CONFIGS........................................... 5            0.0% 
FILE_MAP.......................................... 5            0.0% 
CONFIG_FILES...................................... 4            0.0% 
PROJECTS.......................................... 4            0.0% 
SQLITE_MASTER..................................... 4            0.0% 
SHARED_TEXT....................................... 3            0.0% 
CODE_ITEM_KINDS................................... 2            0.0% 
PARSERS........................................... 2            0.0% 
PROPERTIES........................................ 2            0.0% 

*** Page counts for all tables and indices separately *************************

CODE_ITEMS........................................ 2146534     41.2% 
IX_CODE_ITEMS_NAME................................ 558092      10.7% 
IX_CODE_ITEMS_PARENT_ID_KIND...................... 478439       9.2% 
SQLITE_AUTOINDEX_CODE_ITEMS_1..................... 428285       8.2% 
IX_CODE_ITEMS_PARENT_ID........................... 416587       8.0% 
IX_CODE_ITEMS_LOWER_NAME_HINT..................... 390802       7.5% 
IX_CODE_ITEMS_FILE_ID............................. 389380       7.5% 
FILE_SIGNATURES................................... 158169       3.0% 
FILES............................................. 53016        1.0% 
ASSOC_TEXT........................................ 44283        0.85% 
UQ_FILES_NAME..................................... 37322        0.72% 
ASSOC_SPANS....................................... 25118        0.48% 
UQ_ASSOC_SPANS_CODE_ITEM_ID_KIND.................. 17383        0.33% 
IX_ASSOC_SPANS_CODE_ITEM_ID....................... 15978        0.31% 
UQ_ASSOC_TEXT_CODE_ITEM_ID_KIND................... 9400         0.18% 
IX_FILES_LEAF_NAME................................ 8851         0.17% 
IX_ASSOC_TEXT_CODE_ITEM_ID........................ 8564         0.16% 
UQ_BASE_CLASS_PARENTS_BASE_CODE_ITEM_ID_PARENT_CODE_ITEM_ID 5577         0.11% 
BASE_CLASS_PARENTS................................ 4642         0.089% 
IX_BASE_CLASS_PARENTS_BASE_CODE_ITEM_ID........... 4044         0.078% 
IX_BASE_CLASS_PARENTS_PARENT_CODE_ITEM_ID......... 4044         0.078% 
SQLITE_AUTOINDEX_FILES_1.......................... 3643         0.070% 
UQ_FILE_SIGNATURES_FILE_ID_KIND................... 2533         0.049% 
IX_FILE_SIGNATURES_FILE_ID........................ 2325         0.045% 
SQLITE_MASTER..................................... 4            0.0% 
CODE_ITEM_KINDS................................... 1            0.0% 
CONFIG_FILES...................................... 1            0.0% 
CONFIGS........................................... 1            0.0% 
FILE_MAP.......................................... 1            0.0% 
IX_CONFIG_FILES_CONFIG_ID......................... 1            0.0% 
IX_CONFIG_FILES_FILE_ID........................... 1            0.0% 
IX_CONFIGS_NAME................................... 1            0.0% 
IX_CONFIGS_PROJECT_ID............................. 1            0.0% 
IX_FILE_MAP_CODE_ITEM_ID.......................... 1            0.0% 
IX_FILE_MAP_CONFIG_ID............................. 1            0.0% 
IX_FILE_MAP_FILE_ID............................... 1            0.0% 
IX_SHARED_TEXT_HASH............................... 1            0.0% 
PARSERS........................................... 1            0.0% 
PROJECTS.......................................... 1            0.0% 
PROPERTIES........................................ 1            0.0% 
SHARED_TEXT....................................... 1            0.0% 
SQLITE_AUTOINDEX_CONFIGS_1........................ 1            0.0% 
SQLITE_AUTOINDEX_PARSERS_1........................ 1            0.0% 
SQLITE_AUTOINDEX_PROJECTS_1....................... 1            0.0% 
SQLITE_AUTOINDEX_PROPERTIES_1..................... 1            0.0% 
SQLITE_AUTOINDEX_SHARED_TEXT_1.................... 1            0.0% 
UQ_CODE_ITEM_KINDS_NAME_PARSER_GUID............... 1            0.0% 
UQ_CONFIG_FILES_CONFIG_ID_FILE_ID................. 1            0.0% 
UQ_CONFIGS_PROJECT_ID_NAME........................ 1            0.0% 
UQ_FILE_MAP_CODE_ITEM_ID_CONFIG_ID_FILE_ID........ 1            0.0% 
UQ_PROJECTS_GUID.................................. 1            0.0% 
UQ_PROJECTS_NAME.................................. 1            0.0% 
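The figures in the report can be cross-checked with a little arithmetic. The snippet below only reuses numbers quoted above (page size, page counts, and the row count from the `select count(*)` query); the variable names are made up for illustration.

```python
PAGE_SIZE = 4096                    # "Page size in bytes"
TOTAL_PAGES = 5213044               # "Pages in the whole file (measured)"
CODE_ITEMS_WITH_INDICES = 4808119   # CODE_ITEMS row in the first page-count table
CODE_ITEMS_TABLE_ONLY = 2146534     # CODE_ITEMS row in the second page-count table
CODE_ITEMS_ROWS = 110702994         # result of "select count(*) from code_items;"

# Total file size follows from page size times page count.
file_size = PAGE_SIZE * TOTAL_PAGES
print(file_size)  # 21352628224, matching "Size of the file in bytes"

# Share of the file used by CODE_ITEMS plus its indices.
code_items_share = 100 * CODE_ITEMS_WITH_INDICES / TOTAL_PAGES
print(round(code_items_share, 1))  # 92.2

# Pages used by the indices alone: more than the table itself.
index_pages = CODE_ITEMS_WITH_INDICES - CODE_ITEMS_TABLE_ONLY
print(index_pages, index_pages > CODE_ITEMS_TABLE_ONLY)

# Average on-disk cost per code item, indices included.
bytes_per_row = PAGE_SIZE * CODE_ITEMS_WITH_INDICES / CODE_ITEMS_ROWS
print(round(bytes_per_row))
```

So the size is consistent with the row count: roughly 110 million code items at well under 200 bytes each, with the six CODE_ITEMS indices together occupying more pages than the table.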

I also ran the command "select * from code_items limit 50;" and I see things like:

"27"    "1" "0" "35"    "65538" "iomanip"   ""  "1" "31"    "18"    "31"    "0" "0" "0" "0" ""  "NULL"  "NULL"  "NULL"  "NULL"  "NULL"  "ioma"
"28"    "1" "0" "35"    "65538" "math.h"    ""  "1" "32"    "17"    "32"    "0" "0" "0" "0" ""  "NULL"  "NULL"  "NULL"  "NULL"  "NULL"  "math"
"29"    "1" "0" "35"    "65538" "algorithm" ""  "1" "33"    "20"    "33"    "0" "0" "0" "0" ""  "NULL"  "NULL"  "NULL"  "NULL"  "NULL"  "algo"

except my hpp files.

I also checked how many times a file name is repeated by running "select count(*) from code_items where name="iomanip";", which returned 2098.

Question: is this the # of lines?

Let me know if you need more info.
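The single-name count above can be generalized to list the most frequently repeated names in one query. This is a sketch that assumes only what the queries in this comment already rely on: a `code_items` table with a `name` column. `most_repeated_names` is a hypothetical helper. Note that standard SQL uses single quotes for string literals; SQLite accepts the double-quoted form only as a fallback.

```python
import sqlite3


def most_repeated_names(conn, limit=10):
    """Return (name, occurrences) pairs for the most frequent names in code_items."""
    query = (
        "SELECT name, COUNT(*) AS occurrences "
        "FROM code_items "
        "GROUP BY name "
        "ORDER BY occurrences DESC, name ASC "
        "LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

With a connection opened on the database file, `most_repeated_names(conn, 20)` would show whether a handful of names account for a large share of the rows.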

sean-mcmanus commented 7 years ago

@tojocky Code items are symbols. It looks like your code base has lots of symbols. Do you believe this is expected, or does it seem like a bug to you? If non-C/C++ files are being incorrectly parsed due to a file association mapping, that might cause too many symbols to be generated. You could try using files.exclude to remove sections of your code base, which should cause their symbols to be removed. Is the 20 GB database a problem for you? Is performance slow, or is it just hogging disk space?

tojocky commented 7 years ago

Hi @sean-mcmanus. I can't complain about performance. Thank you for the great job. For C++ projects I wanted to use a modern IDE, but I'm fine with vim and sometimes Sublime Text. This project is really huge.

The only issue is that it hogs disk space.

I will consider using the files.exclude setting.

tojocky commented 7 years ago

BTW, instead of encoding the file name in each code item, wouldn't it be better to move it into a separate table with a primary key? That would avoid repeating a file name thousands of times, and the index on it also takes a lot of space.

A NoSQL DB would be better.

This is just what I'm thinking.
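The separate-table idea can be illustrated with a small in-memory sketch. The table and column names below are hypothetical and are not the extension's real schema; the report earlier in the thread already lists a FILES table and an IX_CODE_ITEMS_FILE_ID index, so the actual database may already do something similar.

```python
import sqlite3


def build_normalized_demo():
    """Create an in-memory database where each file name is stored once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE files (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE code_items (
            id      INTEGER PRIMARY KEY,
            file_id INTEGER NOT NULL REFERENCES files(id),
            name    TEXT NOT NULL
        );
    """)
    return conn


def add_item(conn, file_name, symbol):
    """Insert a symbol, reusing the existing files row when there is one."""
    conn.execute("INSERT OR IGNORE INTO files (name) VALUES (?)", (file_name,))
    file_id = conn.execute(
        "SELECT id FROM files WHERE name = ?", (file_name,)).fetchone()[0]
    conn.execute(
        "INSERT INTO code_items (file_id, name) VALUES (?, ?)", (file_id, symbol))


def symbols_in_file(conn, file_name):
    """Look symbols up by file name through the integer key."""
    rows = conn.execute(
        "SELECT ci.name FROM code_items AS ci "
        "JOIN files AS f ON f.id = ci.file_id "
        "WHERE f.name = ? ORDER BY ci.id", (file_name,))
    return [row[0] for row in rows]
```

Each code item then carries an integer key instead of a repeated string, and an index on `file_id` is narrower than an index on the name text.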

lxzh commented 6 years ago

1. Press Ctrl+P.
2. Open c_cpp_properties.json.
3. Edit the following node:

"browse": {
    "path": [
        "${workspaceFolder}",
        "D:/Program Files/VS2017/VC/Tools/MSVC/14.11.25503/include/*",
        "D:/Program Files/VS2017/VC/Tools/MSVC/14.11.25503/atlmfc/include/*",
        "C:/Program Files (x86)/Windows Kits/10/Include/10.0.15063.0/um",
        "C:/Program Files (x86)/Windows Kits/10/Include/10.0.15063.0/ucrt",
        "C:/Program Files (x86)/Windows Kits/10/Include/10.0.15063.0/shared",
        "C:/Program Files (x86)/Windows Kits/10/Include/10.0.15063.0/winrt"
    ],
    "limitSymbolsToIncludedHeaders": true,
    "databaseFilename": "D:/Others/VSCode/browse.vc.db"
},

4. Change the "databaseFilename" value to the location where you want to store the browse.vc.db file.
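When several workspaces need the same change, the edit in the steps above could be scripted. The sketch below assumes the usual layout in which c_cpp_properties.json holds a "configurations" list and each entry may contain a "browse" object; `set_database_filename` is a hypothetical helper that works on the already-parsed document, and it assumes the file is plain JSON without comments.

```python
import copy


def set_database_filename(properties, new_path):
    """Return a copy of the parsed settings with browse.databaseFilename
    set to new_path in every configuration."""
    updated = copy.deepcopy(properties)  # leave the caller's data untouched
    for config in updated.get("configurations", []):
        # Create the "browse" object if this configuration has none yet.
        config.setdefault("browse", {})["databaseFilename"] = new_path
    return updated
```

A caller would read the file with `json.load`, pass the result through this function, and write it back with `json.dump`. As noted earlier in the thread, the old database file is not cleaned up automatically, so it still has to be deleted by hand.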

sean-mcmanus commented 6 years ago

@ljf1239848066 What's the problem? How big is your file?

lxzh commented 6 years ago

@sean-mcmanus More than 20 GB.

sean-mcmanus commented 6 years ago

@ljf1239848066 Is your workspace really big? How many files are getting discovered/parsed? If your loggingLevel is high enough, that info should show up in the C/C++ Output window.
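Independently of the extension's log, a rough upper bound on what a browse.path entry could pull in can be estimated by counting files on disk. This is only an approximation: the extension list below is an assumption, the extension's own rules for what counts as C/C++ may differ, and extensionless standard headers such as iomanip are not counted.

```python
import os
from collections import Counter

# Assumed set of C/C++ extensions, for illustration only.
CPP_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp", ".hxx"}


def count_files(root):
    """Return (total_files, Counter of C/C++ files keyed by extension)."""
    total = 0
    by_extension = Counter()
    # os.walk does not follow directory symlinks by default.
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            total += 1
            extension = os.path.splitext(filename)[1].lower()
            if extension in CPP_EXTENSIONS:
                by_extension[extension] += 1
    return total, by_extension
```

Running it once per browse.path entry shows which directories contribute the most candidates and are worth excluding first.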

lxzh commented 6 years ago

I'm working on the AOSP project with different branches, so I need to open several instances at the same time, with nearly a million files in total. My reply was meant to show a way to move the .db file off the C: drive to avoid running out of space.