newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Error while writing results in 1GB repository size #499

Closed dimaslanjaka closed 6 hours ago

dimaslanjaka commented 1 year ago

I just ran

git filter-repo --analyze

and got this error:

Writing reports to .git\filter-repo\analysis...Traceback (most recent call last):
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\Scripts\git-filter-repo.exe\__main__.py", line 7, in <module>
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\site-packages\git_filter_repo.py", line 3999, in main
    RepoAnalyze.run(args)
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\site-packages\git_filter_repo.py", line 2689, in run
    RepoAnalyze.write_report(reportdir, stats)
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\site-packages\git_filter_repo.py", line 2431, in write_report
    size = {'packed': stats['packed_size'][sha],
KeyError: b'2e9038bce62c4fdfcd855549ce5bd068802a36b2'
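The last frame is a plain dictionary lookup: `stats['packed_size']` has no entry for that object ID. The traceback does not say why the entry is missing. As a minimal illustration only (this is not git-filter-repo's code; the `stats` contents and the `packed_size_or_none` helper are hypothetical), the failure mode and a tolerant lookup look like this:

```python
# Hypothetical stand-in for the stats collected during analysis. The first
# object ID is made up; the second is the one from the traceback and is
# deliberately absent from 'packed_size'.
known = b'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
missing = b'2e9038bce62c4fdfcd855549ce5bd068802a36b2'
stats = {'packed_size': {known: 120}}

def packed_size_or_none(stats, sha):
    """Return the recorded packed size for sha, or None if none was recorded."""
    return stats['packed_size'].get(sha)

try:
    stats['packed_size'][missing]  # direct indexing, as in write_report
except KeyError as err:
    print("KeyError:", err)

print(packed_size_or_none(stats, known))
print(packed_size_or_none(stats, missing))
```

A tolerant lookup like this would only hide the symptom; whether the object should have had a packed size at all is the question a reproduction would need to answer.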

newren commented 1 year ago

Is the repository you ran this on available for me to clone? Hard to debug without a way to reproduce...

dimaslanjaka commented 1 year ago

> Is the repository you ran this on available for me to clone? Hard to debug without a way to reproduce...

I tried it on these repos

newren commented 3 months ago

I cannot reproduce this.

I note that the sizes of the repos when I clone them are:

$ du -hs */.git/objects | tac
981M    static-blog-generator-hexo/.git/objects
2.1G    static-blog-generator/.git/objects
291M    dimaslanjaka.github.io/.git/objects

So one is nearly 1GB, and another is more than double that.

I instrumented git-filter-repo with the following changes to get an idea of the memory usage:

diff --git a/git-filter-repo b/git-filter-repo
index 9cce52a..a5bd003 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -38,6 +38,7 @@ import io
 import os
 import platform
 import re
+import resource
 import shutil
 import subprocess
 import sys
@@ -342,7 +343,10 @@ class ProgressWriter(object):
     now = time.time()
     if now - self._last_progress_update > .1:
       self._last_progress_update = now
-      sys.stdout.write("\r{}".format(msg))
+      mem = [resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
+             resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss]
+      sys.stdout.write("Self: %d Kb, Children: %d Kb\n" % tuple(mem))
+      #sys.stdout.write("\r{}".format(msg))
       sys.stdout.flush()

   def finish(self):
@@ -4135,6 +4139,11 @@ def main():
   else:
     filter = RepoFilter(args)
     filter.run()
+  sys.stdout.write("Final:\n")
+  mem = [resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
+         resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss]
+  sys.stdout.write("Self: %d Kb, Children: %d Kb\n" % tuple(mem))
+  sys.stdout.flush()

 if __name__ == '__main__':
   main()
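As an aside, the same measurement can be tried outside git-filter-repo. The following is a minimal standalone sketch, assuming a Unix-like system (the `resource` module is not available on Windows) and Linux's convention of reporting `ru_maxrss` in kilobytes; the `peak_rss_kb` helper name is made up for illustration.

```python
import resource
import subprocess
import sys

def peak_rss_kb():
    """Return (self, children) peak resident set sizes from getrusage.

    On Linux ru_maxrss is in kilobytes; other platforms may use other units.
    """
    return (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
            resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss)

# Run a short child process that allocates about 10 MB, so that the
# children figure has something to report once the child has exited.
subprocess.run([sys.executable, "-c", "x = bytearray(10 * 1024 * 1024)"],
               check=True)

self_kb, children_kb = peak_rss_kb()
print("Self: %d Kb, Children: %d Kb" % (self_kb, children_kb))
```

`RUSAGE_CHILDREN` only covers child processes that have terminated and been waited for, which is probably why a "Children" figure measured this way changes in large steps rather than gradually.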

When I then ran the instrumented git-filter-repo, I saw:

$ cd static-blog-generator-hexo/
$ /usr/bin/time -f "External monitoring: Memory: %M Kbytes, Time: %e seconds" git filter-repo --analyze
Self: 21448 Kb, Children: 21448 Kb
Self: 21448 Kb, Children: 21448 Kb
Self: 21448 Kb, Children: 21448 Kb
Self: 21576 Kb, Children: 21448 Kb
Self: 21576 Kb, Children: 21448 Kb
Self: 21576 Kb, Children: 21448 Kb
Self: 21832 Kb, Children: 21448 Kb
Self: 21832 Kb, Children: 21448 Kb
Self: 21960 Kb, Children: 21448 Kb
Self: 21960 Kb, Children: 21448 Kb
Self: 22344 Kb, Children: 21448 Kb
Self: 22344 Kb, Children: 21448 Kb
Self: 23488 Kb, Children: 21448 Kb
Self: 24000 Kb, Children: 171188 Kb

Self: 24512 Kb, Children: 171188 Kb
Self: 25792 Kb, Children: 171188 Kb
Self: 27712 Kb, Children: 171188 Kb
Self: 27968 Kb, Children: 171188 Kb
Self: 28992 Kb, Children: 171188 Kb
Self: 33472 Kb, Children: 171188 Kb
Self: 36288 Kb, Children: 171188 Kb
Self: 36288 Kb, Children: 171188 Kb
Self: 36288 Kb, Children: 171188 Kb
Self: 36416 Kb, Children: 171188 Kb
Self: 36416 Kb, Children: 171188 Kb
Self: 36416 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb

Writing reports to .git/filter-repo/analysis...done.
Final:
Self: 40512 Kb, Children: 171188 Kb
External monitoring: Memory: 171188 Kbytes, Time: 6.57 seconds

$ cd ../static-blog-generator
$ /usr/bin/time -f "External monitoring: Memory: %M Kbytes, Time: %e seconds" git filter-repo --analyze
Self: 22328 Kb, Children: 21432 Kb
Self: 24352 Kb, Children: 21432 Kb
Self: 26936 Kb, Children: 21432 Kb
Self: 32844 Kb, Children: 21432 Kb
Self: 33868 Kb, Children: 21432 Kb
Self: 35660 Kb, Children: 250684 Kb

Self: 35788 Kb, Children: 250684 Kb
Self: 36940 Kb, Children: 250684 Kb
Self: 37068 Kb, Children: 250684 Kb
Self: 45644 Kb, Children: 250684 Kb
Self: 47820 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 49100 Kb, Children: 250684 Kb
Self: 49100 Kb, Children: 250684 Kb
Self: 55244 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 61852 Kb, Children: 250684 Kb
Self: 62108 Kb, Children: 250684 Kb
Self: 62236 Kb, Children: 250684 Kb
Self: 62236 Kb, Children: 250684 Kb
Self: 62236 Kb, Children: 250684 Kb
Self: 63516 Kb, Children: 250684 Kb
Self: 63516 Kb, Children: 250684 Kb
Self: 63516 Kb, Children: 250684 Kb
Self: 67868 Kb, Children: 250684 Kb
Self: 67868 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73600 Kb, Children: 250684 Kb
Self: 73856 Kb, Children: 250684 Kb
Self: 74112 Kb, Children: 250684 Kb
Self: 75648 Kb, Children: 250684 Kb
Self: 76032 Kb, Children: 250684 Kb
Self: 76160 Kb, Children: 250684 Kb
Self: 76544 Kb, Children: 250684 Kb
Self: 76544 Kb, Children: 250684 Kb
Self: 76544 Kb, Children: 250684 Kb
Self: 76800 Kb, Children: 250684 Kb
Self: 77312 Kb, Children: 250684 Kb
Self: 77312 Kb, Children: 250684 Kb
Self: 77568 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 78592 Kb, Children: 250684 Kb
Self: 79232 Kb, Children: 250684 Kb
Self: 79232 Kb, Children: 250684 Kb
Self: 79360 Kb, Children: 250684 Kb
Self: 80256 Kb, Children: 250684 Kb
Self: 80256 Kb, Children: 250684 Kb
Self: 80256 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81536 Kb, Children: 250684 Kb
Self: 94208 Kb, Children: 250684 Kb
Self: 105728 Kb, Children: 250684 Kb
Self: 105728 Kb, Children: 250684 Kb
Self: 105728 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb

Writing reports to .git/filter-repo/analysis...done.
Final:
Self: 119220 Kb, Children: 734388 Kb
External monitoring: Memory: 734388 Kbytes, Time: 26.30 seconds

$ cd ../dimaslanjaka.github.io/
$ /usr/bin/time -f "External monitoring: Memory: %M Kbytes, Time: %e seconds" git filter-repo --analyze
Self: 22304 Kb, Children: 21280 Kb
Self: 23880 Kb, Children: 21280 Kb
Self: 26852 Kb, Children: 21280 Kb
Self: 27888 Kb, Children: 21280 Kb
Self: 32972 Kb, Children: 21280 Kb
Self: 33100 Kb, Children: 21280 Kb
Self: 34508 Kb, Children: 21280 Kb
Self: 35660 Kb, Children: 21280 Kb
Self: 36940 Kb, Children: 21280 Kb
Self: 45064 Kb, Children: 21280 Kb
Self: 45064 Kb, Children: 21280 Kb
Self: 45704 Kb, Children: 21280 Kb
Self: 46984 Kb, Children: 21280 Kb
Self: 48264 Kb, Children: 21280 Kb
Self: 49544 Kb, Children: 21280 Kb
Self: 51208 Kb, Children: 21280 Kb
Self: 51208 Kb, Children: 244668 Kb

Self: 57352 Kb, Children: 244668 Kb
Self: 58760 Kb, Children: 244668 Kb
Self: 61064 Kb, Children: 244668 Kb
Self: 64904 Kb, Children: 244668 Kb
Self: 66952 Kb, Children: 244668 Kb
Self: 69384 Kb, Children: 244668 Kb
Self: 75528 Kb, Children: 244668 Kb
Self: 75912 Kb, Children: 244668 Kb
Self: 76808 Kb, Children: 244668 Kb
Self: 78472 Kb, Children: 244668 Kb
Self: 80648 Kb, Children: 244668 Kb
Self: 82824 Kb, Children: 244668 Kb
Self: 84232 Kb, Children: 244668 Kb
Self: 86664 Kb, Children: 244668 Kb
Self: 88456 Kb, Children: 244668 Kb
Self: 90888 Kb, Children: 244668 Kb
Self: 99080 Kb, Children: 244668 Kb
Self: 100488 Kb, Children: 244668 Kb
Self: 101768 Kb, Children: 244668 Kb
Self: 102536 Kb, Children: 244668 Kb
Self: 104072 Kb, Children: 244668 Kb
Self: 104840 Kb, Children: 244668 Kb
Self: 105608 Kb, Children: 244668 Kb
Self: 106504 Kb, Children: 244668 Kb
Self: 107400 Kb, Children: 244668 Kb
Self: 107784 Kb, Children: 244668 Kb
Self: 108040 Kb, Children: 244668 Kb
Self: 108680 Kb, Children: 244668 Kb
Self: 110344 Kb, Children: 244668 Kb
Self: 112264 Kb, Children: 244668 Kb
Self: 113544 Kb, Children: 244668 Kb
Self: 114312 Kb, Children: 244668 Kb
Self: 114952 Kb, Children: 244668 Kb
Self: 115336 Kb, Children: 244668 Kb
Self: 115592 Kb, Children: 244668 Kb
Self: 116488 Kb, Children: 244668 Kb
Self: 117768 Kb, Children: 244668 Kb
Self: 119688 Kb, Children: 244668 Kb
Self: 120456 Kb, Children: 244668 Kb
Self: 121352 Kb, Children: 244668 Kb
Self: 122376 Kb, Children: 244668 Kb
Self: 122376 Kb, Children: 244668 Kb

Writing reports to .git/filter-repo/analysis...done.
Final:
Self: 141832 Kb, Children: 244668 Kb
External monitoring: Memory: 244668 Kbytes, Time: 10.73 seconds

So, all the analyses finish in less than half a minute, and in all cases the processes use less than 1GB of RAM. Do you still get the bug today with these repositories? Can you try on a system with more memory and/or on a different operating system and see if you can still reproduce the error?
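The memory figures can be checked with a little arithmetic on the three "Final" lines above. This is a rough sketch, assuming the "Kb" values are kilobytes of 1024 bytes as reported on Linux; the `peak_mib` helper is hypothetical.

```python
import re

# "Final" lines copied from the three runs above.
final_lines = {
    "static-blog-generator-hexo": "Self: 40512 Kb, Children: 171188 Kb",
    "static-blog-generator": "Self: 119220 Kb, Children: 734388 Kb",
    "dimaslanjaka.github.io": "Self: 141832 Kb, Children: 244668 Kb",
}

pattern = re.compile(r"Self: (\d+) Kb, Children: (\d+) Kb")

def peak_mib(line):
    """Return the larger of the self/children peaks on this line, in MiB."""
    self_kb, children_kb = map(int, pattern.fullmatch(line).groups())
    return max(self_kb, children_kb) / 1024

peaks = {repo: peak_mib(line) for repo, line in final_lines.items()}
for repo, mib in peaks.items():
    print(f"{repo}: {mib:.0f} MiB peak for the largest single process")
```

The largest figure, 734388 Kb for static-blog-generator, comes to roughly 717 MiB, which is under 1 GiB.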

newren commented 6 hours ago

I'll go ahead and close out; sorry for taking so long to find time to investigate. If you can reproduce, feel free to reopen and point out what is required to reproduce the bug.