newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

filter-repo: skip over unexpected git-catfile output #602

Closed larsxschneider closed 1 month ago

larsxschneider commented 1 month ago

We have encountered Git repositories, possibly corrupted, that return unexpected git cat-file output lines:

  File "/git-filter-repo", line 4149, in <module>
    main()
  File "/git-filter-repo", line 4143, in main
    RepoAnalyze.run(args)
  File "/git-filter-repo", line 2723, in run
    stats = RepoAnalyze.gather_data(args)
  File "/git-filter-repo", line 2366, in gather_data
    unpacked_size, packed_size = GitUtils.get_blob_sizes()
  File "/git-filter-repo", line 1580, in get_blob_sizes
    sha, objtype, objsize, objdisksize = line.split()
ValueError: not enough values to unpack (expected 4, got 2)

Let's skip over those lines and print them to stderr for further analysis.


The script currently doesn't use much Python exception handling, and I think we don't write to stderr at all yet. Please let me know if there is a better or more fitting way to make this change. I considered letting the script die when we encounter such unexpected content, but decided against it for now (mainly to see whether there are one or many bad output lines).
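
For illustration, here is a minimal sketch of the skip-and-report idea, not the merged patch. The helper name `parse_blob_sizes` and its `lines` parameter are hypothetical. The four-field layout is taken from the unpacking line in the traceback, and the message text mirrors the error output quoted in this thread.

```python
import sys


def parse_blob_sizes(lines):
    """Hypothetical helper: collect blob sizes from cat-file style byte lines.

    Each well-formed line is assumed to hold four whitespace-separated
    fields: sha, object type, object size, on-disk size.
    """
    unpacked_size = {}
    packed_size = {}
    for line in lines:
        try:
            sha, objtype, objsize, objdisksize = line.split()
            objsize, objdisksize = int(objsize), int(objdisksize)
        except ValueError:
            # Wrong number of fields or a non-numeric size:
            # report the raw line on stderr and move on to the next one.
            sys.stderr.write(
                'Error: unexpected `git cat-file` output: "%s"\n' % (line,))
            continue
        if objtype == b'blob':
            unpacked_size[sha] = objsize
            packed_size[sha] = objdisksize
    return unpacked_size, packed_size
```

A line with only two fields, such as an object name followed by `missing`, raises `ValueError` on unpacking and is skipped. Continuing instead of exiting means every bad line is reported in a single run, which fits the goal of finding out whether there is one bad line or many.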

larsxschneider commented 1 month ago

FYI: The output we got from the corrupt repo was:

Error: unexpected `git cat-file` output: "b'aabbccddd18bbdb5ff6e2b4cbce3299e4f389ee07f missing\n'"

newren commented 1 month ago

I merged this after making a small tweak (so that code coverage wouldn't yell about no test hitting this exception that we don't know how to trigger in practice) as 71a50a1b6153 (filter-repo: skip over unexpected git-catfile output, 2024-09-30).

Sorry for the long delay, but thanks for sending this in!