visevol / GithubVisualisation

PFE028 Summer 2024
MIT License

[Backend] Use Gitaly (or make something similar) to interact with Git repositories directly #43

Closed: zergov closed this issue 3 days ago

zergov commented 5 days ago

Instead of eagerly indexing repositories and surfacing information from SQLite, we should interact with git directly.

GitLab has a ton of documentation on their architecture:

Listing all files

Using git ls-files

To list all files tracked by git, we can use git ls-files. To get the line count of each file, pipe that list into wc: git ls-files | xargs wc -l (plain git ls-files | wc -l would only count the files):

...
10 tools/releaser/test/fixtures/activestorage/CHANGELOG.md
17 tools/releaser/test/fixtures/activestorage/lib/active_storage/gem_version.rb
38 tools/releaser/test/fixtures/activestorage/package.json
10 tools/releaser/test/fixtures/activesupport/CHANGELOG.md
10 tools/releaser/test/fixtures/guides/CHANGELOG.md
10 tools/releaser/test/fixtures/railties/CHANGELOG.md
233 tools/releaser/test/releaser_test.rb
 6 tools/releaser/test/test_helper.rb
18 tools/test.rb
20 tools/test_common.rb
3165 yarn.lock

This is tricky though: wc -l reads the files in the working tree, so we cannot get the line counts for a specific revision without checking that revision out.
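If the backend does want working-tree counts, the pipeline can be driven from Python (assumed here because the code under Backend/api is Python). This is a minimal sketch, not a definitive implementation: parse_wc_output and working_tree_line_counts are hypothetical names, it assumes a POSIX shell with xargs, and it still reads the checked-out files, so the revision caveat above still applies.

```python
from __future__ import annotations

import subprocess


def parse_wc_output(text: str) -> dict[str, int]:
    """Turn `wc -l` output ("<count> <path>" per line) into {path: line_count}."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, _, path = line.partition(" ")
        path = path.lstrip()
        # wc adds a "total" line when given several files (xargs may run wc
        # more than once, so there can be several). Skip them; note that a
        # tracked file literally named "total" would be skipped as well.
        if path == "total":
            continue
        counts[path] = int(count)
    return counts


def working_tree_line_counts(repo: str) -> dict[str, int]:
    """Line counts of tracked files as they exist on disk right now (not at a revision)."""
    # -z / -0 keep paths with spaces intact on their way to wc; paths that
    # contain newlines would still confuse the line-based parser above.
    result = subprocess.run(
        "git ls-files -z | xargs -0 wc -l",
        shell=True, cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_wc_output(result.stdout)
```

For example, parse_wc_output("  233 tools/releaser/test/releaser_test.rb\n") returns {"tools/releaser/test/releaser_test.rb": 233}.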

Using git ls-tree

We can use git ls-tree -r -l <revision>, which lists the files at a specific revision along with their size in bytes (not their line count). Example:

100644 blob 83f4069ffdba1732a30d34edd0757f1da8ef5998     415    railties/helpers/test_helper.rb
100644 blob edbc89bf991e038d2225a6984e071ad7c8c3eca7     129    railties/html/404.html
100644 blob ee0c919c4aed3ec532fb2cba928d812a0e405b31     212    railties/html/500.html
100644 blob 4949c64a5a461533dcf9d60f46b257486ceb197a      85    railties/html/index.html
100644 blob 64b945158c4ee345cdf2903921fd3e9d1ed926b9    2711    railties/lib/binding_of_caller.rb
100755 blob 6de2d64a7e0a847a68d5994efac1265b3562e5e0   16036    railties/lib/breakpoint.rb
100644 blob fa93c11f3ebcd3aad4713be0f17bbfea63505acf    4892    railties/lib/breakpoint_client.rb
100644 blob 825772fcf199f1231a8d278f69f51cd2881fd82f    3162    railties/lib/code_statistics.rb
100644 blob 7ae2affb457b7afce12d02702f32139b5e39de7c    3316    railties/lib/dispatcher.rb
100644 blob 83a404fc0afdd4668d1179e86689fc7302b3af84    6952    railties/lib/rails_generator.rb
100644 blob 3cb0db0a49f2cfbcddbd4902e4deecf267c552e7    4453    railties/lib/webrick_server.rb
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391       0    railties/test/generators/missing_class/missing_class_generator.rb
100644 blob 3a699b04b24a940cd4135f56edaec81265a2cda0    3076    railties/test/rails_generator_test.rb
100644 blob 7f5be65f4b7ea599e3b1e100fa45b165a155843c    1132    railties/test/webrick_dispatcher_test.rb
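
To consume this from the backend, each line can be split on the tab that git puts between the metadata and the path. A minimal Python sketch (Python assumed, matching the Backend/api code; TreeEntry, parse_ls_tree and ls_tree are hypothetical names). It does not unquote paths: without -z, git C-quotes paths that contain unusual characters.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeEntry:
    mode: str
    obj_type: str         # "blob", or "commit" for a submodule
    object_id: str
    size: Optional[int]   # bytes; None when git prints "-" (e.g. submodules)
    path: str


def parse_ls_tree(text: str) -> List[TreeEntry]:
    """Parse `git ls-tree -r -l` lines: "<mode> <type> <object> <size>" TAB "<path>"."""
    entries: List[TreeEntry] = []
    for line in text.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        # The size column is right-aligned with spaces, so split() on whitespace.
        mode, obj_type, object_id, size = meta.split()
        entries.append(
            TreeEntry(mode, obj_type, object_id, None if size == "-" else int(size), path)
        )
    return entries


def ls_tree(repo: str, revision: str) -> List[TreeEntry]:
    """Every file at `revision` with its size, without checking the revision out."""
    result = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-l", revision],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_tree(result.stdout)
```

Because the size is in bytes, it can drive a size-based view directly, but line counts still need another source.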

Using a combination of git ls-tree and git diff

git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 <revision>
0       0       Backend/.dockerignore
54      0       Backend/.github/workflows/docker-image.yml
3       0       Backend/.gitignore
11      0       Backend/CHANGELOG.md
13      0       Backend/Dockerfile
59      0       Backend/README.md
0       0       Backend/api/__init__.py
9       0       Backend/api/config.py
27      0       Backend/api/main.py
0       0       Backend/api/routers/__init__.py
12      0       Backend/api/routers/commits_router.py
5       0       Backend/api/schemas/base_schema.py
32      0       Backend/api/schemas/common_schema.py
26      0       Backend/api/services/cache_service.py
51      0       Backend/api/services/file_service.py
87      0       Backend/api/services/pydriller_service.py

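This diffs <revision> against the empty tree (4b825dc642cb6eb9a060e54bf8d69288fbee4904 is the hash of git's empty tree), so every file shows up as newly added. This essentially gives us a listing of all the files in the repository at that revision, along with their line counts in the first column (binary files show - instead of a number).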
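A minimal Python sketch of turning that output into {path: line_count} (Python assumed, matching the backend; parse_numstat and line_counts_at are hypothetical names). EMPTY_TREE is the hash from the command above, which only applies to SHA-1 repositories; SHA-256 repositories have a different empty-tree hash.

```python
from __future__ import annotations

import subprocess
from typing import Dict, Optional

# Hash of git's empty tree in SHA-1 repositories.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def parse_numstat(text: str) -> Dict[str, Optional[int]]:
    """Parse `git diff --numstat` lines: "<added>" TAB "<deleted>" TAB "<path>".

    Against the empty tree every file is a pure addition, so <added> is the
    file's line count. Binary files report "-", which is mapped to None.
    """
    counts: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        if not line:
            continue
        added, _deleted, path = line.split("\t", 2)
        counts[path] = None if added == "-" else int(added)
    return counts


def line_counts_at(repo: str, revision: str) -> Dict[str, Optional[int]]:
    """Line count of every file at `revision`, without checking it out."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--numstat", EMPTY_TREE, revision],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Usage would look like line_counts_at(".", "HEAD"), which returns the same data as the listing above, keyed by path.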

Number of commits per day

git log --since=2015-10-01 --date=short --pretty=format:%ad | sort | uniq -c
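
The same aggregation can be done in Python with collections.Counter instead of sort | uniq -c. A minimal sketch, assuming Python on the backend; commits_per_day and git_commits_per_day are hypothetical names.

```python
from __future__ import annotations

import subprocess
from collections import Counter
from typing import List, Tuple


def commits_per_day(log_output: str) -> List[Tuple[str, int]]:
    """Count one-date-per-line log output; returns (YYYY-MM-DD, commits) in date order."""
    counts = Counter(line for line in log_output.splitlines() if line)
    # --date=short prints YYYY-MM-DD, so sorting the strings sorts chronologically.
    return sorted(counts.items())


def git_commits_per_day(repo: str, since: str) -> List[Tuple[str, int]]:
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--date=short", "--pretty=format:%ad"],
        capture_output=True, text=True, check=True,
    )
    return commits_per_day(result.stdout)
```

Days without commits simply do not appear; a chart that needs a continuous x-axis would have to fill those gaps with zeros.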

Main contributors on a file

➜  rails git:(main) git log --pretty=format:%an Gemfile.lock | sort | uniq -c | sort -rn | head -5
    139 Rafael Mendonça França
     53 Ryuta Kamizono
     50 Yasuo Honda
     36 David Heinemeier Hansson
     34 Xavier Noria
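
A Python equivalent of this pipeline, assuming Python on the backend (top_authors and git_top_authors are hypothetical names). Counter.most_common already returns the counts in descending numeric order, so no separate sort step is needed.

```python
from __future__ import annotations

import subprocess
from collections import Counter
from typing import List, Tuple


def top_authors(log_output: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent author names in one-name-per-line log output."""
    return Counter(line for line in log_output.splitlines() if line).most_common(n)


def git_top_authors(repo: str, path: str, n: int = 5) -> List[Tuple[str, int]]:
    # "--" separates the path from revisions, so a file name is never read as a branch.
    result = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=format:%an", "--", path],
        capture_output=True, text=True, check=True,
    )
    return top_authors(result.stdout, n)
```

git_top_authors("rails", "Gemfile.lock") would correspond to the example above. One design choice to consider: %aN (capital N) respects .mailmap, which maps authors who committed under several names or emails to one canonical name.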

Date at which a file was introduced

git log --follow --format=%ad --date default <FILE> | tail -1
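
A Python sketch of the same lookup, assuming Python on the backend (parse_first_date and file_introduced_at are hypothetical names). It swaps %ad for %at, the author date as a Unix timestamp, so the result can be parsed without depending on git's default date format. git log lists newer commits first, so the last line is the oldest commit that touched the file (following renames).

```python
from __future__ import annotations

import subprocess
from datetime import datetime, timezone
from typing import Optional


def parse_first_date(log_output: str) -> Optional[datetime]:
    """Return the oldest timestamp in one-timestamp-per-line, newest-first log output."""
    lines = [line for line in log_output.splitlines() if line.strip()]
    if not lines:
        return None  # no commit touched this path
    # Equivalent of `| tail -1`: the last line is the oldest commit.
    return datetime.fromtimestamp(int(lines[-1]), tz=timezone.utc)


def file_introduced_at(repo: str, path: str) -> Optional[datetime]:
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--format=%at", "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_first_date(result.stdout)
```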

zergov commented 4 days ago

Example of what I have in mind:

[Screenshot: requesting the tree of a directory, cache miss]

[Screenshot: requesting the tree of a directory, cache hit]
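
One way to get this cache hit/miss behaviour is a cache keyed by (commit SHA, directory). A minimal in-memory Python sketch, assuming Python and hypothetical names (TreeCache, git_directory_loader); it is not Gitaly's API, and it does not show how it would plug into the existing Backend/api/services/cache_service.py.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Dict, List, Tuple

Loader = Callable[[str, str], List[str]]


class TreeCache:
    """Directory listings cached by (commit SHA, directory path).

    The key should be a full commit SHA, not a branch name: a commit's tree
    never changes, so entries never go stale, whereas a branch can move.
    """

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._entries: Dict[Tuple[str, str], List[str]] = {}

    def get(self, commit_sha: str, directory: str) -> Tuple[List[str], bool]:
        """Return (listing, hit), calling the loader only on a miss."""
        key = (commit_sha, directory)
        if key in self._entries:
            return self._entries[key], True
        listing = self._loader(commit_sha, directory)
        self._entries[key] = listing
        return listing, False


def git_directory_loader(repo: str) -> Loader:
    """Loader that lists a directory with `git ls-tree --name-only <sha>:<dir>`."""

    def load(commit_sha: str, directory: str) -> List[str]:
        # "<sha>:" (empty directory) names the root tree of the commit.
        result = subprocess.run(
            ["git", "-C", repo, "ls-tree", "--name-only", f"{commit_sha}:{directory}"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.splitlines()

    return load
```

Resolving a branch name to a SHA first (for example with git rev-parse) keeps every key immutable. This dict grows without bound; a real service would cap it, for instance with an LRU policy.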

zergov commented 3 days ago

Started shipping this in https://github.com/visevol/GihubVisualisation/pull/44; let's keep adding to it.