src-d / borges

borges collects and stores Git repositories.
https://docs.sourced.tech/borges/
GNU General Public License v3.0

Create a tool to do borges housekeeping #369

Closed jfontan closed 5 years ago

jfontan commented 6 years ago

This tool will provide utilities to gather information from a borges deployment and to fix problems that cannot be handled easily with the available tools. The issue describing the problems is #367.

The first version will only have the minimum commands needed to fix the current problems:

Also add a document showing the shell commands needed to generate the list of bad siva files and the list of files we want to move to another bucket size. File management should use libgfapi through go-billy (#368).

jfontan commented 6 years ago

Getting all the references from the database is neither particularly slow nor memory-hungry:

[2018-11-17T16:24:49.636677548Z]  INFO still working counter=230000000 duration=1m55.073868471s partial=592.174013ms sivas=11630585
[2018-11-17T16:24:50.09454284Z]  INFO still working counter=231000000 duration=1m55.531738134s partial=457.756337ms sivas=11688918
[2018-11-17T16:24:50.557283417Z]  INFO still working counter=232000000 duration=1m55.994476685s partial=462.652278ms sivas=11746025
[2018-11-17T16:24:51.026517343Z]  INFO still working counter=233000000 duration=1m56.463711959s partial=469.148596ms sivas=11801192
[2018-11-17T16:24:51.46463446Z]  INFO finished getting results duration=1m56.9018166s memory=1142 total_memory=14957
[2018-11-17T16:25:02.865801818Z]  INFO finished preparing siva list duration=11.401063709s memory=1323 total_memory=15138
141.35user 17.09system 2:19.33elapsed 113%CPU (0avgtext+0avgdata 2030912maxresident)k
0inputs+0outputs (0major+306937minor)pagefaults 0swaps

The run took 2:19 (minutes:seconds) and around 2 GB of RAM. The total number of references is a bit more than 233 million, and the number of unique siva files is almost 12 million. The list of siva files is sorted before being printed.

The list is held in a map while it is being built, so siva hashes are not repeated. The hashes are stored as strings, so storing them as binary may bring some improvement.
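
As a rough illustration of the binary storage idea, here is a minimal sketch, not the actual tool code. It assumes siva hashes are 40-character hex strings (20 bytes once decoded). The names `sivaHash`, `hashSize` and `uniqueSortedSivas` are hypothetical and only used for this example. The map is keyed by a fixed-size byte array instead of a string, and the unique hashes are sorted before being returned, as in the run above.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashSize is an assumption: siva hashes are taken to be 20 bytes
// (40 hex characters), the size of a SHA-1 hash.
const hashSize = 20

// sivaHash is a hypothetical binary key type for the deduplication map.
type sivaHash [hashSize]byte

// uniqueSortedSivas deduplicates hex-encoded hashes using a map keyed by
// their binary form, then returns them sorted and hex-encoded again.
func uniqueSortedSivas(refs []string) ([]string, error) {
	seen := make(map[sivaHash]struct{})
	for _, r := range refs {
		raw, err := hex.DecodeString(r)
		if err != nil {
			return nil, fmt.Errorf("decoding %q: %w", r, err)
		}
		if len(raw) != hashSize {
			return nil, fmt.Errorf("hash %q has %d bytes, want %d", r, len(raw), hashSize)
		}
		var h sivaHash
		copy(h[:], raw)
		seen[h] = struct{}{} // repeated hashes collapse into one key
	}

	hashes := make([]sivaHash, 0, len(seen))
	for h := range seen {
		hashes = append(hashes, h)
	}
	// Byte-wise order matches the lexicographic order of lowercase hex.
	sort.Slice(hashes, func(i, j int) bool {
		return bytes.Compare(hashes[i][:], hashes[j][:]) < 0
	})

	out := make([]string, len(hashes))
	for i, h := range hashes {
		out[i] = hex.EncodeToString(h[:])
	}
	return out, nil
}

func main() {
	// Made-up sample hashes, with one duplicate.
	refs := []string{
		strings.Repeat("b", 40),
		strings.Repeat("a", 40),
		strings.Repeat("b", 40),
	}
	sivas, err := uniqueSortedSivas(refs)
	if err != nil {
		panic(err)
	}
	for _, s := range sivas {
		fmt.Println(s)
	}
}
```

A 20-byte array key is half the size of the 40-character hex string and carries no string header, so it could reduce memory per entry. The actual gain would need to be measured on the real data.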

jfontan commented 5 years ago

Fixed in #367