Open · andinus opened 1 year ago
@andinus I am assuming that you've done a shallow clone?!
If you have and you're still suffering from performance issues, I can submit a patch (PR) for this issue. What I have in mind is a simple script that runs:
git repack && git prune-packed && git reflog expire --expire=1.month.ago && git gc --aggressive
added to a GH workflow that crons it every week.
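A minimal sketch of that script, using exactly the commands above (the path script/git-maintenance is only a placeholder):
#!/bin/bash
# Weekly repository maintenance (sketch; file name and location are placeholders).
set -euo pipefail
git repack
git prune-packed
git reflog expire --expire=1.month.ago
git gc --aggressive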
Thoughts, @manwar?
P.S. @andinus, if upstream does not accept the proposed PR, please note that you can do this to your local clone.
I've just seen that the scripts directory has attempted this already. So the solution may not be upstream.
I am also in favour of doing some housekeeping. I use zsh with some git integration, and the (by now) 90k files slow down the shell. Could the "historic" commits maybe be squashed automatically, so that we have perhaps only a single commit per week on master?
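For illustration only, squashing everything before a chosen cutoff into a single commit could look roughly like this. This is a sketch, not a recommendation: <cutoff-commit> is a placeholder, and it rewrites history, so it would need a coordinated force push.
# create a branch whose single commit holds the tree as of the cutoff
git checkout --orphan squashed <cutoff-commit>
git commit -m "Squashed history up to <cutoff-commit>"
# replay everything newer than the cutoff on top of that one commit
git rebase --onto squashed <cutoff-commit> master
git push --force-with-lease origin master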
@andinus I think your recommendation is the best and quickest approach (i.e. deleting stale dirs with only README files). I ran a test locally and this is what I got:
Before I ran script/cleanup_readme_only:
╔ eax@nix:test_perlweeklychallenge-club(issue/7358)
╚ λ time gs
Refresh index: 100% (88731/88731), done.
On branch issue/7358
Untracked files:
(use "git add <file>..." to include in what will be committed)
script/cleanup_readme_only
nothing added to commit but untracked files present (use "git add" to track)
real 0m3.350s
user 0m1.562s
sys 0m2.123s
Running script/cleanup_readme_only
This is how long the shell script took to run. However, this may just be a one-time cost, since it deleted the entirety of the repo's historic (README-only) directories.
╔ eax@nix:test_perlweeklychallenge-club(issue/7358)
╚ λ time bash -c script/cleanup_readme_only
real 2m1.530s
user 4m33.764s
sys 3m12.803s
It got rid of 39k files (see below), but we could do better.
╔ eax@nix:test_perlweeklychallenge-club(issue/7358)
╚ λ git diff --name-only HEAD~ | wc -l
39066
Doing git status after running script/cleanup_readme_only:
Untracked files:
(use "git add <file>..." to include in what will be committed)
script/cleanup_readme_only
no changes added to commit (use "git add" and/or "git commit -a")
real 0m1.082s
user 0m0.602s
sys 0m0.817s
Great improvement, but the script is too slow (even with xargs). So I rewrote it in Go! See the speed improvement below.
╔ eax@nix:test_perlweeklychallenge-club(issue/7358)
╚ λ time bin/cleanup
real 0m2.658s
user 0m2.780s
sys 0m5.675s
Night and day!!!
@manwar: let me know if this is a desirable action and I'll submit the PR (all the code and local tests are complete). See the GH Action workflow below:
name: Cleanup Readmes From Repository
on:
  schedule:
    - cron: '0 0 * * 0'  # Run at midnight every Sunday
jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Setup Go
        uses: actions/setup-go@v2
        with:
          go-version: 1.17
      - name: Build Go Script
        run: go build -o bin/cleanup bin/main.go
      - name: Execute Cleanup
        run: ./bin/cleanup
> I am assuming that you've done a shallow clone?!
> git repack && git prune-packed && git reflog expire --expire=1.month.ago && git gc --aggressive
IIRC, even after a shallow clone and running this ^, it was slow. @ealvar3z, can you share the script? I'll try running it and report back.
@andinus
Please be advised that I ran this on a separate repo: cp -r perlweeklychallenge-club/ test_perlweeklychallenge-club
Here's main.go:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// isReadmeOnly reports whether dir contains exactly one entry named
// README or README.md.
func isReadmeOnly(dir string) bool {
	files, err := os.ReadDir(dir)
	if err != nil {
		return false // e.g. the directory was already removed by another worker
	}
	return len(files) == 1 && (files[0].Name() == "README" || files[0].Name() == "README.md")
}

// cleanupReadmeOnly removes every README-only directory received on pathChan.
func cleanupReadmeOnly(wg *sync.WaitGroup, pathChan <-chan string) {
	defer wg.Done()
	for path := range pathChan {
		if isReadmeOnly(path) {
			os.RemoveAll(path)
		}
	}
}

func main() {
	var wg sync.WaitGroup
	ncores := runtime.NumCPU()
	pathChan := make(chan string)

	// One worker per core, all draining the same channel of directory paths.
	for i := 0; i < ncores; i++ {
		wg.Add(1)
		go cleanupReadmeOnly(&wg, pathChan)
	}

	err := filepath.WalkDir(".", func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return filepath.SkipDir // directory may have been removed concurrently
		}
		if d.IsDir() {
			if d.Name() == ".git" {
				return filepath.SkipDir // never descend into git metadata
			}
			pathChan <- path
		}
		return nil
	})
	if err != nil {
		fmt.Println("Error:", err)
	}

	close(pathChan)
	wg.Wait()
}
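To try it locally the same way the workflow above does (assuming the file is saved as bin/main.go and you run from the repository root):
go build -o bin/cleanup bin/main.go
time ./bin/cleanup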
And the bash script:
#!/bin/bash

cleanup_readme_only() {
    num_cores=$(nproc)
    find . -type d -print0 | xargs -0 -I {} -P "$num_cores" bash -c \
        'if [ "$(ls -A {})" = "README" ] || [ "$(ls -A {})" = "README.md" ]; then rm -rf {}; fi'
}

cleanup_readme_only
It does improve performance: these (git status and git status -uno, shown below) previously took 71 and 16 seconds; they take about 8 and 4 seconds now.
andinus@~/d/o/C/perlweeklychallenge-club (master)> time git status > /dev/null
Refresh index: 100% (93480/93480), done.
________________________________________________________
Executed in 8.44 secs fish external
usr time 1.65 secs 0.00 micros 1.65 secs
sys time 14.08 secs 0.00 micros 14.08 secs
andinus@~/d/o/C/perlweeklychallenge-club (master)> time git status -uno > /dev/null
Refresh index: 100% (93480/93480), done.
________________________________________________________
Executed in 4.34 secs fish external
usr time 1.01 secs 0.00 micros 1.01 secs
sys time 10.64 secs 0.00 micros 10.64 secs
Maybe this issue depends on the workflow in use. In my setup I don't experience such performance issues.
I'm operating on three branches in my fork of perlweeklychallenge-club:
- master is pull-only; for safety I use --ff-only for pull.
- contrib is push-only, using --ff-only for merge to synchronize with my own master mirror.
- ch-xxx is a local working branch created from contrib.
Synchronize master and contrib from upstream, then create a new branch ch-xxx from contrib, build the solution therein, merge ch-xxx into contrib, push to GitHub and create a pull request from the contrib branch. Delete ch-xxx after it has been merged into master (and is finalized).
Updates are always fast-forward / incremental this way.
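A sketch of that cycle in git commands, assuming the fork is the remote origin, the upstream repo is the remote upstream, and ch-250 stands in for the current challenge branch:
git checkout master  && git pull --ff-only upstream master
git checkout contrib && git merge --ff-only master
git checkout -b ch-250 contrib
# ...build the solution...
git checkout contrib && git merge ch-250
git push origin contrib      # then open the pull request from contrib
git branch -d ch-250         # once it has been merged and finalized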
Currently there are over 70,000 files in this repository and every week we're adding hundreds more (every week a directory is created for every user and the previous "README" is copied).
I started participating with challenge-076. According to my records I've submitted solutions for 25 challenges, so there are ~100 useless directories with my name and a README file. With around 300 users, I believe this adds up.
My primary machine is not very fast and it takes 70 seconds to run git status.