src-d / borges

borges collects and stores Git repositories.
https://docs.sourced.tech/borges/
GNU General Public License v3.0

Calculate the overhead of rooted repositories #388

Closed: jfontan closed this 5 years ago

jfontan commented 5 years ago

When a repository has several forks, new information is appended to its siva file. Calculate the overhead this adds to the size of the siva files: compare the size of a multi-fork siva file with the same file after garbage collection and with a freshly created siva file (only one index).
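The comparison requested here comes down to measuring file sizes and expressing the difference relative to the rooted file. A minimal sketch, assuming the siva files already exist on disk; the function name `size_overhead` and the file names are hypothetical, and the demo uses stand-in files of known size rather than real siva files:

```python
import os
import tempfile


def size_overhead(rooted_path, candidate_path):
    """Return (bytes_saved, percent_saved) for candidate relative to rooted."""
    rooted = os.path.getsize(rooted_path)
    candidate = os.path.getsize(candidate_path)
    if rooted == 0:
        raise ValueError("rooted file is empty; percentage is undefined")
    diff = rooted - candidate
    return diff, 100.0 * diff / rooted


if __name__ == "__main__":
    # Stand-in files so the sketch runs anywhere; real siva files would go here.
    with tempfile.TemporaryDirectory() as tmp:
        rooted = os.path.join(tmp, "rooted.siva")  # hypothetical name
        fresh = os.path.join(tmp, "fresh.siva")    # hypothetical name
        with open(rooted, "wb") as f:
            f.write(b"\0" * 1000)
        with open(fresh, "wb") as f:
            f.write(b"\0" * 900)
        diff, pct = size_overhead(rooted, fresh)
        print(f"diff: {diff} bytes (fresh is {pct:.2f}% smaller than rooted)")
```

A negative result means the candidate file is larger than the rooted one, which the function reports as-is rather than treating as an error.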

mcarmonaa commented 5 years ago

I've analyzed several repositories with different sizes and numbers of forks, and extracted the following information:

Analyzed repositories

source+forks_used: 21 repositories
bare_source_size: 66MB
rooted_siva: 70958137(68MB)
fresh_siva: 70909808(68MB)
fresh_gc_siva: 70327170(67MB)
diff: 48329(47KB) (fresh is 0.07% smaller than rooted)
diff_gc: 630967(616KB) (fresh_gc is 0.89% smaller than rooted)

source+forks_used: 370 repositories
bare_source_size: 3.4MB
rooted_siva: 148507161(142MB)
fresh_siva: 135983599(130MB)
fresh_gc_siva: 56072071(54MB)
diff: 12523562(11MB) (fresh is 8.43% smaller than rooted)
diff_gc: 92435090(88MB) (fresh_gc is 62.24% smaller than rooted)

source+forks_used: 401 repositories
bare_source_size: 100MB
rooted_siva: 271725176(260MB)
fresh_siva: 256210322(245MB)
fresh_gc_siva: 235811168(225MB)
diff: 15514854(14MB) (fresh is 5.71% smaller than rooted)
diff_gc: 35914008(34MB) (fresh_gc is 13.22% smaller than rooted)

source+forks_used: 3191 repositories
bare_source_size: 92KB
rooted_siva: 970709956(926MB)
fresh_siva: 2387259(2.3MB)
fresh_gc_siva: 2725856(2.6MB)
diff: 968322697(924MB) (fresh is 99.75% smaller than rooted)
diff_gc: 967984100(923MB) (fresh_gc is 99.72% smaller than rooted)
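
The `diff` and percentage figures follow directly from the byte counts: `diff = rooted_siva - fresh_siva`, and the percentage is `diff / rooted_siva`. As a worked check, the sketch below recomputes both for the 3191-repository data set above. The helper name `relative_saving` is an assumption for illustration and not part of borges:

```python
def relative_saving(rooted, other):
    """Return (diff_bytes, percent) by which `other` is smaller than `rooted`."""
    diff = rooted - other
    return diff, round(100.0 * diff / rooted, 2)


# Byte counts copied from the 3191-repository data set.
ROOTED_SIVA = 970709956
FRESH_SIVA = 2387259
FRESH_GC_SIVA = 2725856

if __name__ == "__main__":
    diff, pct = relative_saving(ROOTED_SIVA, FRESH_SIVA)
    diff_gc, pct_gc = relative_saving(ROOTED_SIVA, FRESH_GC_SIVA)
    print(f"diff: {diff} (fresh is {pct}% smaller than rooted)")
    print(f"diff_gc: {diff_gc} (fresh_gc is {pct_gc}% smaller than rooted)")
```

For this data set the recomputed values match the reported ones (968322697 bytes and 99.75%, 967984100 bytes and 99.72%).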