newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Fix "Passed but got" error on CJK file names #438

Closed. louy2 closed this 4 months ago.

louy2 commented 1 year ago

The filter-repo callback passes a Unicode filename as UTF-8 bytes, but git check-ignore prints it quoted, with each non-ASCII UTF-8 byte written as an octal escape, so the name != pathname check fails on CJK filenames. .decode('unicode_escape') reads the bytes as Latin-1 and expands the escapes, producing a str with one code point per original byte. .encode('latin_1') then recovers those original bytes, which are UTF-8 and compare equal to the filename passed by the filter-repo callback.
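A minimal sketch of the decoding trick described above, assuming each line read from git check-ignore (run without -z) is either a plain path or a C-style quoted path such as "\346\226\207.txt". The helper name unquote_git_path is hypothetical and is not part of filter-repo.

```python
def unquote_git_path(raw: bytes) -> bytes:
    """Hypothetical helper: turn one path printed by `git check-ignore`
    (without -z) back into the raw bytes git stored."""
    # git only quotes "unusual" paths; plain ASCII names come back verbatim.
    if len(raw) >= 2 and raw.startswith(b'"') and raw.endswith(b'"'):
        inner = raw[1:-1]
        # 'unicode_escape' reads unescaped bytes as Latin-1 and turns each
        # \NNN octal escape into one code point in the 0-255 range, so the
        # resulting str holds one character per original byte.
        as_latin1 = inner.decode('unicode_escape')
        # Encoding that str as Latin-1 maps each code point back to the
        # same byte value, recovering the original UTF-8 bytes.
        return as_latin1.encode('latin_1')
    return raw


# A file named 文.txt: its UTF-8 bytes e6 96 87 are printed as octal escapes.
quoted = b'"\\346\\226\\207.txt"'
assert unquote_git_path(quoted) == '文.txt'.encode('utf-8')
```

The round trip works because Latin-1 is the one codec that maps code points 0-255 to the identical byte values, so no information is lost between the two steps.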

newren commented 1 year ago

Thanks, but a better way to avoid trouble parsing special filenames would probably be to pass the "-z" option to check-ignore. If we do that, we would also need to separate input paths with NUL characters rather than newlines, and split the output on NUL characters rather than newlines. Do you want to give that a shot?
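A minimal sketch of the -z approach suggested above, assuming git is on PATH and the paths are given relative to the repository root. The helper names (encode_stdin_paths, parse_z_output, ignored_paths) are hypothetical; this is not filter-repo's actual implementation.

```python
import subprocess
from typing import List


def encode_stdin_paths(paths: List[bytes]) -> bytes:
    # With --stdin -z, each input path is terminated by a NUL byte
    # instead of a newline, so any filename bytes pass through unchanged.
    return b''.join(p + b'\0' for p in paths)


def parse_z_output(out: bytes) -> List[bytes]:
    # With -z, each ignored path is followed by a NUL byte; dropping
    # empty pieces removes the one after the trailing terminator.
    return [p for p in out.split(b'\0') if p]


def ignored_paths(paths: List[bytes], repo_dir: str = '.') -> List[bytes]:
    """Hypothetical helper: return the subset of `paths` git would ignore."""
    proc = subprocess.run(
        ['git', 'check-ignore', '--stdin', '-z'],
        input=encode_stdin_paths(paths),
        stdout=subprocess.PIPE,
        cwd=repo_dir,
    )
    # check-ignore exits 0 if some path is ignored, 1 if none is,
    # and 128 on a fatal error, so only other codes are failures here.
    if proc.returncode not in (0, 1):
        raise RuntimeError('git check-ignore failed')
    return parse_z_output(proc.stdout)
```

Since -z output is meant to be machine-parsable, the returned bytes should be directly comparable with the raw filename bytes from the callback, with no unicode_escape round trip needed.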

newren commented 4 months ago

I implemented the alternative using the -z flag to check-ignore in commit 2800bcc1007e (clean-ignore: support utf-8 filenames found in .gitignore, 2024-07-02)