newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

fast_import_crash #342

Closed streetvibration closed 2 years ago

streetvibration commented 2 years ago

We are filtering lots of repos, and that generally works well.

But with some repos we are running into this message:

Parsed 5696 commits
fatal: cannot truncate pack to skip duplicate: Keine Berechtigung
fast-import: dumping crash report to .git/fast_import_crash_249381
Traceback (most recent call last):
  File "/mig/git-filter-repo", line 4004, in
    main()
  File "/mig/git-filter-repo", line 4001, in main
    filter.run()
  File "/mig/git-filter-repo", line 3936, in run
    self._parser.run(self._input, self._output)
  File "/mig/git-filter-repo", line 1405, in run
    self._parse_blob()
  File "/mig/git-filter-repo", line 1131, in _parse_blob
    blob.dump(self.output)
  File "/mig/git-filter-repo", line 531, in dump
    file.write(b'data %d\n%s' % (len(self.data), self.data))
BrokenPipeError: [Errno 32] Broken pipe

Please advise what's going wrong here. Many thanks in advance, Joe

newren commented 2 years ago

This appears to me to be a bug in git-fast-import, and would be something that would be nice to report to the Git mailing list. However, it'd be nice to gather some more data first. First, is Google Translate correct to change "Keine Berechtigung" to "No Authorization"? That message seems a bit odd to me -- are you on a special filesystem where truncation is disallowed or something? Is there any chance you could post the contents of .git/fast_import_crash_249381 ? (Check it for sensitive data first, just in case.)

Also, is there any chance the repository in question could be shared? Can you list the filter-repo command you used when you saw this?

Clearly from the fast-import error, you have a repository where you either have different branches that introduce the exact same file contents or else you have some kind of undoing of changes to a file so that you get one with the exact same contents as before. That should be fine, but it looks like the code has some optimization to jettison the second copy from the packfile it is trying to create, and that jettison operation is failing for some reason. That's not enough info for me to reproduce, but any extra detail you might be able to gather to make it easier to reproduce or hint at how it gets triggered would be helpful.
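To illustrate why identical file contents end up as a "duplicate", here is a minimal sketch of how Git derives a blob's object id from its bytes. It assumes a repository using the default SHA-1 object format; the helper name `git_blob_id` is made up for this example and is not part of git-filter-repo or Git.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The same bytes introduced on two branches, or re-added after a revert,
# map to the same object id, so the second copy is redundant in the pack.
first = git_blob_id(b"hello\n")
second = git_blob_id(b"hello\n")
print(first == second)
```

Because the id depends only on the content, fast-import can recognize the second blob as one it already has, which is presumably the point at which it tries to drop the redundant copy from the pack.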

streetvibration commented 2 years ago

Hey newren, thanks a lot for your superfast reply.

Here is some more information:

You are right, "Keine Berechtigung" means "No permission" or "No authorization".

Information about truncate:

truncate (GNU coreutils) 8.25 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

Written by Pádraig Brady.

So, I think it is not disallowed. Also, we filtered a lot of other repos during a migration from SVN to Git with no problems. Does every run of filter-repo perform this truncate?

This is how we call git-filter-repo for all repos from a main script. The variables are checked and correct:

git filter-repo --invert-paths \
    --paths-from-file ${PATH_TRAVELLER_REPO_FAREDODGERS_RULES} \
    --source ${PATH_LOCAL_BASE_GIT} \
    --target ${PATH_TRAVELLER_REPO_LOCAL_FILTERED_GIT} \
    --force

Rules in the file ${PATH_TRAVELLER_REPO_FAREDODGERS_RULES}:

glob:*.jar
glob:*.tar
glob:*.zip
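As a rough illustration of what these rules select: with --invert-paths, paths matching a rule are removed from history and everything else is kept. The sketch below approximates the glob matching with Python's `fnmatch` module, which is an assumption about the matching semantics rather than a copy of filter-repo's code; the example paths and the helper `is_removed` are hypothetical.

```python
from fnmatch import fnmatchcase

# The three glob rules from the paths file, without the "glob:" prefix.
RULES = ["*.jar", "*.tar", "*.zip"]


def is_removed(path: str, globs=RULES) -> bool:
    """True if the path matches any rule and would be dropped by --invert-paths."""
    # In fnmatch, "*" also matches "/", so "*.jar" matches files in any directory.
    return any(fnmatchcase(path, pattern) for pattern in globs)


# Hypothetical repository paths, only for demonstration.
paths = ["lib/tool.jar", "src/Main.java", "dist/release.zip", "README"]
kept = [p for p in paths if not is_removed(p)]
print(kept)
```

Running a check like this over `git ls-files` output before the real rewrite is one way to sanity-check a rules file.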

Unfortunately I cannot share the repo because of sensitive data.

Here comes the beginning of the report - I x-ed out sensitive data and replaced paths with

fast-import crash report:
    fast-import process: 162327
    parent process     : 162326
    at 2022-02-15 13:35:46 +0000

fatal: cannot truncate pack to skip duplicate: Keine Berechtigung

Most Recent Commands Before Crash
---------------------------------
  get-mark :68007
  blob
  mark :68008
  data 16967
  ls :68007 /Kartenauswahl.js
  get-mark :68008
  commit refs/heads/master
  mark :68009
  author xxx xxx@xxx.de 1583317764 +0000
  committer xxx xxx@xxx.de 1583317764 +0000
  data 56
  from :68007
  M 100644 :68008 /Kartenauswahl.js
  get-mark :68009
  blob
  mark :68010
  data 40230
  ls :68009 VertragAnlegenService.java
  get-mark :68010
  commit refs/heads/master
  mark :68011
  author xxx xxx@xxx.de 1583326794 +0000
  committer xxx xxx@xxx.de 1583326794 +0000
  data 42
  from :68009
  M 100644 :68010 ertragAnlegenService.java
  get-mark :68011
  blob
  mark :68012
  data 27030
  blob
  mark :68013
  data 17941
  blob
  mark :68014
  data 143828
  ls :68011 _Fragenkatalog_Control.js
  get-mark :68012
  commit refs/heads/master
  mark :68015
  author xxx xxx@xxx.de 1583330926 +0000
  committer xxx xxx@xxx.de 1583330926 +0000
  data 72
  from :68011
  M 100644 :68012 Fragenkatalog_Control.js
  M 100644 :68013 Fragenkatalog_Texte.js
  M 100644 :68014 Fragenkatalog_View.js
  get-mark :68015
  blob
  mark :68016
  data 11498
  ls :68015 Entsperrkartenauswahl_View.js
  get-mark :68016
  commit refs/heads/master
  mark :68017
  author xxx xxx@xxx.de 1583331518 +0000
  committer xxx xxx@xxx.de 1583331518 +0000
  data 58
  from :68015
  M 100644 :68016 Entsperrkartenauswahl_View.js
  get-mark :68017
  blob
  mark :68018
  data 10975
  ls :68017 crossselling.json
  get-mark :68018
  commit refs/heads/master
  mark :68019
  author xxx xxx@xxx.de 1583331564 +0000
  committer xxx xxx@xxx.de 1583331564 +0000
  data 36
  from :68017
  M 100644 :68018 crossselling.json
  get-mark :68019
  blob
  mark :68020
  data 16891
  blob
  mark :68021
  data 4336
  blob
  mark :68022
  data 165060
  blob
  mark :68023
  data 165088
  blob
  mark :68024
  data 5764
  blob
  mark :68025
  data 8544
  blob
  mark :68026

Active Branch LRU
-----------------
    active_branches = 1 cur, 5 max

  pos  clock name
   1)   5357 refs/heads/master

Inactive Branches
-----------------
refs/heads/master:
  status      : active loaded
  tip commit  : 528b9468969b055171a61ca723311e63e45613a3
  old tree    : 759c39fb35fb16fc7905160fca0149f8c9274c10
  cur tree    : 759c39fb35fb16fc7905160fca0149f8c9274c10
  commit clock: 5357
  last pack   : 0

Marks
-----
:1 b04b3501f5efd94313942eb7439457bc82f5a2f5
:2 b19e7b7a63e8e90cdb49c43f02035646c4a76e0a

...

:68020 c18e572077961dd5735461b8f7cbd0125b01345e
:68021 ffb4f7086a5a3de6554d4d1b90088064ac71b91f
:68022 a686119db88b1f7eaf9881332af7e8291a3d6ef1
:68023 526ac24a16db0157dbf93369b251139344412f98
:68024 741540dc990eb8571efa40f03a1d0a634845eb5f
:68025 d341427feef3f5cbba0cab2c119a6840901b6ad3
:68026 cec0386a4ee43eb181fd3abbdb1c963230414da9

-------------------
END OF CRASH REPORT

I hope this additional Info can help a bit further.
Many thanks again,
Joe

streetvibration commented 2 years ago

The crash was triggered by a very old filesystem on the server. Thanks again for your effort. I am closing the issue.

newren commented 2 years ago

Thanks for following up and reporting back. I admit, I wasn't sure how to take the next steps in debugging, so learning that it might be a filesystem issue is helpful. So I take it that you were able to do the filtering by cloning the repository to a different filesystem and doing the filtering there?

azmodude commented 2 years ago

Hi Elijah,

let me chime in on this as I was mainly involved in solving the issue.

The 'Permission denied' error was triggered by running git-filter-repo on a Veritas VxFS filesystem. It looks like VxFS does not like ftruncate() being called on a pack file in Git's hashfile_truncate() when the file's mode is 0444 (which, as I understand it, is the usual mode for these files). For whatever reason, the same call works just fine on 'normal' filesystems like ext4 and ZFS, which I tested locally.
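For anyone who wants to check a filesystem for this behavior without running a full import, here is a minimal probe, assuming a POSIX system. It only mimics the pattern described above (create a file with mode 0444 while keeping a read-write descriptor, then truncate through that descriptor); it is not Git's code, and the function and file names are hypothetical.

```python
import os
import tempfile


def can_truncate_readonly_file(directory: str) -> bool:
    """Create a mode-0444 file in `directory` and try to ftruncate() it."""
    path = os.path.join(directory, "tmp_pack_probe")  # hypothetical file name
    # The descriptor is read-write even though the file mode is read-only.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o444)
    try:
        os.write(fd, b"x" * 1024)
        try:
            os.ftruncate(fd, 512)
        except OSError:
            # An affected filesystem is expected to fail here.
            return False
        return os.fstat(fd).st_size == 512
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Replace the temporary directory with a path on the filesystem under test.
    with tempfile.TemporaryDirectory() as probe_dir:
        print(can_truncate_readonly_file(probe_dir))
```

On filesystems such as ext4 the truncate is expected to succeed, because the permission check applies to how the descriptor was opened rather than to the file's current mode.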

We solved this by changing the default permissions to 0644 in odb_mkstemp() and recompiling Git, so that the to-be-truncated pack files are at least writable by the current user. As the whole filtering process is just an intermediate step in our migration from Subversion to Git, I deemed this an okay step to take. :grinning:

BTW: Thanks for your great work on git-filter-repo, it's helping us a lot in migrating about 1k repositories and filtering an epic ton of binary files in the process.

newren commented 2 years ago

Awesome, thanks for the extra details. Filed a report over here: https://lore.kernel.org/git/CABPp-BERVCynOVvBq0QL49Ah+gy3W2snUVWBHfzXaVpXX3Dpyg@mail.gmail.com/