Closed by zergov 3 days ago.
Instead of eagerly indexing repositories and surfacing information from SQLite, we should interact with git directly.
GitLab has a ton of documentation on its architecture:
Listing all files
git ls-files
To list all files tracked by git, we can use git ls-files. To get the line count of each file, pipe that list into wc -l:
git ls-files | xargs wc -l
...
10 tools/releaser/test/fixtures/activestorage/CHANGELOG.md
17 tools/releaser/test/fixtures/activestorage/lib/active_storage/gem_version.rb
38 tools/releaser/test/fixtures/activestorage/package.json
10 tools/releaser/test/fixtures/activesupport/CHANGELOG.md
10 tools/releaser/test/fixtures/guides/CHANGELOG.md
10 tools/releaser/test/fixtures/railties/CHANGELOG.md
233 tools/releaser/test/releaser_test.rb
6 tools/releaser/test/test_helper.rb
18 tools/test.rb
20 tools/test_common.rb
3165 yarn.lock
This is tricky, though: we cannot get the line counts at a specific revision without checking out that revision.
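Below is a minimal Python sketch that turns the output of that pipeline into a `{path: line_count}` mapping. It assumes the tooling lives in the Python backend, and `parse_wc_output` is a hypothetical name. It only parses text, so it still reflects whatever revision is checked out, which is the limitation described above.

```python
from typing import Dict


def parse_wc_output(output: str) -> Dict[str, int]:
    """Turn `git ls-files | xargs wc -l` output into {path: line_count}."""
    counts: Dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split(maxsplit=1)  # "<count> <path>"; the path may contain spaces
        if len(parts) != 2:
            continue  # skip blank lines
        count, path = parts
        # wc prints a "total" row per invocation, and xargs may split a long
        # file list over several invocations, so every "total" row is skipped.
        # (This also drops a file literally named "total" at the repo root.)
        if path == "total":
            continue
        counts[path] = int(count)
    return counts


# Three rows from the output above, plus the "total" row wc would add for them.
sample = """\
    233 tools/releaser/test/releaser_test.rb
      6 tools/releaser/test/test_helper.rb
   3165 yarn.lock
   3404 total
"""
print(parse_wc_output(sample))
```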
git ls-tree
We can use git ls-tree -r -l <revision>, which lists every file at a specific revision along with its mode, object type, object ID and size in bytes (a size, not a line count). Example:
git ls-tree -r -l <revision>
100644 blob 83f4069ffdba1732a30d34edd0757f1da8ef5998 415 railties/helpers/test_helper.rb
100644 blob edbc89bf991e038d2225a6984e071ad7c8c3eca7 129 railties/html/404.html
100644 blob ee0c919c4aed3ec532fb2cba928d812a0e405b31 212 railties/html/500.html
100644 blob 4949c64a5a461533dcf9d60f46b257486ceb197a 85 railties/html/index.html
100644 blob 64b945158c4ee345cdf2903921fd3e9d1ed926b9 2711 railties/lib/binding_of_caller.rb
100755 blob 6de2d64a7e0a847a68d5994efac1265b3562e5e0 16036 railties/lib/breakpoint.rb
100644 blob fa93c11f3ebcd3aad4713be0f17bbfea63505acf 4892 railties/lib/breakpoint_client.rb
100644 blob 825772fcf199f1231a8d278f69f51cd2881fd82f 3162 railties/lib/code_statistics.rb
100644 blob 7ae2affb457b7afce12d02702f32139b5e39de7c 3316 railties/lib/dispatcher.rb
100644 blob 83a404fc0afdd4668d1179e86689fc7302b3af84 6952 railties/lib/rails_generator.rb
100644 blob 3cb0db0a49f2cfbcddbd4902e4deecf267c552e7 4453 railties/lib/webrick_server.rb
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 railties/test/generators/missing_class/missing_class_generator.rb
100644 blob 3a699b04b24a940cd4135f56edaec81265a2cda0 3076 railties/test/rails_generator_test.rb
100644 blob 7f5be65f4b7ea599e3b1e100fa45b165a155843c 1132 railties/test/webrick_dispatcher_test.rb
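Below is a minimal Python sketch for parsing this listing into records, for example to sum file sizes per directory. `TreeEntry` and `parse_ls_tree_long` are hypothetical names. The sketch assumes plain paths: git quotes paths containing unusual characters unless `-z` is passed, and that quoted form is not handled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeEntry:
    mode: str
    obj_type: str          # "blob", or "commit" for a submodule
    object_id: str
    size: Optional[int]    # bytes; None when git prints "-" (e.g. submodules)
    path: str


def parse_ls_tree_long(output: str) -> List[TreeEntry]:
    """Parse `git ls-tree -r -l <revision>` output into TreeEntry records."""
    entries: List[TreeEntry] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Real output is "<mode> <type> <object> <padded size>\t<path>".
        # Splitting on any whitespace at most 4 times copes with the padding
        # and keeps paths containing spaces intact.
        mode, obj_type, object_id, size, path = line.split(maxsplit=4)
        entries.append(
            TreeEntry(mode, obj_type, object_id, None if size == "-" else int(size), path)
        )
    return entries


# Three entries from the example above, with git's tab before the path.
sample = (
    "100644 blob 83f4069ffdba1732a30d34edd0757f1da8ef5998     415\trailties/helpers/test_helper.rb\n"
    "100644 blob edbc89bf991e038d2225a6984e071ad7c8c3eca7     129\trailties/html/404.html\n"
    "100755 blob 6de2d64a7e0a847a68d5994efac1265b3562e5e0   16036\trailties/lib/breakpoint.rb\n"
)
entries = parse_ls_tree_long(sample)
total_bytes = sum(e.size or 0 for e in entries)
print(f"{len(entries)} files, {total_bytes} bytes")
```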
git diff
git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 <revision>
0 0 Backend/.dockerignore
54 0 Backend/.github/workflows/docker-image.yml
3 0 Backend/.gitignore
11 0 Backend/CHANGELOG.md
13 0 Backend/Dockerfile
59 0 Backend/README.md
0 0 Backend/api/__init__.py
9 0 Backend/api/config.py
27 0 Backend/api/main.py
0 0 Backend/api/routers/__init__.py
12 0 Backend/api/routers/commits_router.py
5 0 Backend/api/schemas/base_schema.py
32 0 Backend/api/schemas/common_schema.py
26 0 Backend/api/services/cache_service.py
51 0 Backend/api/services/file_service.py
87 0 Backend/api/services/pydriller_service.py
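This returns a diff of every file at <revision> against the empty tree (4b825dc642cb6eb9a060e54bf8d69288fbee4904 is the ID of git's empty tree object). Since every file shows up as newly added, the first column is its line count; binary files show - instead of a number. This essentially gives us a listing of all the files in the repository at that revision, and their line counts, without a checkout.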
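Below is a minimal Python sketch of this approach. It assumes the backend shells out to the `git` binary; `parse_numstat` and `line_counts_at` are hypothetical names. The hard-coded ID only applies to SHA-1 repositories. In a SHA-256 repository the empty tree has a different ID, which `git hash-object -t tree /dev/null` prints.

```python
import subprocess
from typing import Dict, Optional

# ID of git's empty tree in a SHA-1 repository, as used in the command above.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def parse_numstat(output: str) -> Dict[str, Optional[int]]:
    """Map each path to its line count; None for binary files (shown as "-")."""
    counts: Dict[str, Optional[int]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # "<added>\t<deleted>\t<path>"; against the empty tree, added == line count.
        added, _deleted, path = line.split(maxsplit=2)
        counts[path] = None if added == "-" else int(added)
    return counts


def line_counts_at(repo: str, revision: str) -> Dict[str, Optional[int]]:
    """Line count of every file at `revision`, without checking it out."""
    result = subprocess.run(
        ["git", "diff", "--numstat", EMPTY_TREE, revision],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)


# Three rows from the output above, in git's tab-separated form.
sample = (
    "0\t0\tBackend/.dockerignore\n"
    "54\t0\tBackend/.github/workflows/docker-image.yml\n"
    "87\t0\tBackend/api/services/pydriller_service.py\n"
)
print(parse_numstat(sample))
```

Against a local clone, `line_counts_at(".", "HEAD")` would return the same mapping for the current commit.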
Number of commits per day
git log --since=2015-10-01 --date=short --pretty=format:%ad | sort | uniq -c
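Below is a minimal Python sketch of the same count, replacing `sort | uniq -c` with `collections.Counter`. `count_per_day` and `commits_per_day` are hypothetical names, and the sketch assumes the `git` binary is on the PATH. One subtlety: `--since` filters on the committer date, while `%ad` prints the author date. Using `%cd` would keep both on the same clock.

```python
import subprocess
from collections import Counter
from typing import Dict


def count_per_day(log_output: str) -> Dict[str, int]:
    """Python equivalent of `| sort | uniq -c` over one YYYY-MM-DD date per line."""
    dates = Counter(line.strip() for line in log_output.splitlines() if line.strip())
    return dict(sorted(dates.items()))  # ISO dates sort chronologically as strings


def commits_per_day(repo: str, since: str) -> Dict[str, int]:
    """Commits per author date since `since` (e.g. "2015-10-01")."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--date=short", "--pretty=format:%ad"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return count_per_day(result.stdout)
```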
Main contributors on a file
➜ rails git:(main) git log --pretty=format:%an Gemfile.lock | sort | uniq -c | sort -r | head -5
139 Rafael Mendonça França
53 Ryuta Kamizono
50 Yasuo Honda
36 David Heinemeier Hansson
34 Xavier Noria
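Below is a minimal Python sketch of the same ranking, with `Counter.most_common` standing in for `sort | uniq -c | sort -r | head -5`. `top_authors` and `main_contributors` are hypothetical names, and `git` is assumed to be on the PATH. The `--` separates the path from revision arguments.

```python
import subprocess
from collections import Counter
from typing import List, Tuple


def top_authors(log_output: str, n: int = 5) -> List[Tuple[str, int]]:
    """Python equivalent of `| sort | uniq -c | sort -r | head -n`."""
    names = [line for line in log_output.splitlines() if line.strip()]
    return Counter(names).most_common(n)


def main_contributors(repo: str, path: str, n: int = 5) -> List[Tuple[str, int]]:
    """Authors with the most commits touching `path`."""
    result = subprocess.run(
        ["git", "log", "--pretty=format:%an", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return top_authors(result.stdout, n)
```

For example, `main_contributors(".", "Gemfile.lock")` run inside a rails clone mirrors the command above. Swapping `%an` for `%aN` would make git apply `.mailmap`, so one person committing under several spellings is counted once.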
Date at which a file was introduced
git log --follow --format=%ad --date default <FILE> | tail -1
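Below is a minimal Python sketch of the same lookup. Because `git log` lists the newest commits first, taking the last line (the job of `tail -1`) yields the oldest commit that touched the file. `oldest_entry` and `file_introduced_at` are hypothetical names, and `git` is assumed to be on the PATH. `--follow` only works with a single path, which is what this function passes.

```python
import subprocess
from typing import Optional


def oldest_entry(log_output: str) -> Optional[str]:
    """Equivalent of `| tail -1`: the last non-empty line, i.e. the oldest commit."""
    lines = [line for line in log_output.splitlines() if line.strip()]
    return lines[-1] if lines else None


def file_introduced_at(repo: str, path: str) -> Optional[str]:
    """Author date of the oldest commit touching `path`, following renames."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%ad", "--date=default", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return oldest_entry(result.stdout)
```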
Example of what I have in mind:
Requesting tree of directory: cache miss
Requesting tree of directory: cache hit
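Below is a minimal Python sketch of the cache-miss/cache-hit behaviour described above. It assumes, hypothetically, that tree requests are keyed by a full commit SHA plus a directory. Because the tree at a given commit never changes, such entries never need invalidating, but branch names like `main` must first be resolved to a SHA. `TreeCache` and the loader signature are illustrative, not the project's actual cache service.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical loader signature: (commit_sha, directory) -> entries in that directory.
Loader = Callable[[str, str], List[str]]


class TreeCache:
    """In-memory cache of directory trees keyed by (commit SHA, directory)."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._trees: Dict[Tuple[str, str], List[str]] = {}

    def get(self, commit_sha: str, directory: str) -> List[str]:
        key = (commit_sha, directory)
        if key in self._trees:
            print(f"Requesting tree of {directory}: cache hit")
            return self._trees[key]
        print(f"Requesting tree of {directory}: cache miss")
        tree = self._loader(commit_sha, directory)
        self._trees[key] = tree
        return tree
```

A loader could, for instance, run `git ls-tree --name-only <sha>:<directory>`. This sketch never evicts entries, so a long-running service would probably want a size bound such as an LRU policy.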
Started shipping this in https://github.com/visevol/GihubVisualisation/pull/44; let's add stuff to it.