newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)
Other
8.1k stars 691 forks source link

`--dry-run` removes quotation marks from paths in `fast-export.filtered` #585

Open hisaac opened 1 month ago

hisaac commented 1 month ago

I'm using the --dry-run flag right now to get a sense of what changes git-filter-repo will make to my repository on certain commands. When run a git diff of the resulting fast-export.original and fast-export.filtered files, I noticed that quotation marks are being removed from paths that have spaces in them.

For example, running

git diff --no-index filter-repo/fast-export.original filter-repo/fast-export.filtered

results in the following output in the shell:

CleanShot 2024-07-16 at 15 18 01@2x

Is that expected behavior? Is there some way I can prevent these changes so that the diff is more useful?

hisaac commented 1 month ago

Oh and in case it matters, this is the filter-repo command I'm running (actual paths modified):

git-filter-repo --dry-run \
    --invert-paths \
    --path Path/To/A/File/CodeToRemove.swift \
    --path Path/To/A/File/CodeToRemoveTests.swift \
    --path Path/To/Another/File/MoreCodeToRemove.swift
newren commented 1 month ago

Does applying this diff do what you want?

diff --git a/git-filter-repo b/git-filter-repo
index 9cce52a..6c02d31 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -192,10 +192,10 @@ class PathQuoting:
   @staticmethod
   def enquote(unquoted_string):
     # Option 1: Quoting when fast-export would:
-    #    pqsc = PathQuoting._special_chars
-    #    if any(pqsc[x] for x in set(unquoted_string)):
+    pqsc = PathQuoting._special_chars
+    if any(pqsc[x] for x in set(unquoted_string)):
     # Option 2, perf hack: do minimal amount of quoting required by fast-import
-    if unquoted_string.startswith(b'"') or b'\n' in unquoted_string:
+    #if unquoted_string.startswith(b'"') or b'\n' in unquoted_string:
       pqe = PathQuoting._escape
       return b'"' + b''.join(pqe[x] for x in unquoted_string) + b'"'
     return unquoted_string
hisaac commented 1 month ago

Hmm, nope, no change to the diff when I use that patch.

Using this patch seems to fix it though:

diff --git a/git-filter-repo b/git-filter-repo
index 9cce52a..be3885c 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -195,7 +195,7 @@ class PathQuoting:
     #    pqsc = PathQuoting._special_chars
     #    if any(pqsc[x] for x in set(unquoted_string)):
     # Option 2, perf hack: do minimal amount of quoting required by fast-import
-    if unquoted_string.startswith(b'"') or b'\n' in unquoted_string:
+    if unquoted_string.startswith(b'"') or b'\n' in unquoted_string or b' ' in unquoted_string:
       pqe = PathQuoting._escape
       return b'"' + b''.join(pqe[x] for x in unquoted_string) + b'"'
     return unquoted_string

I don't know what the other implications of adding that would be though. What do you think?

newren commented 1 month ago

Oh, cool, glad you found the needed change. I'm reticent to apply it as-is, though, because it'll likely slow things down for the general case (this function is called quite a lot), and make the comment on the line above it a lie. A few possible paths forward:

Thoughts?

hisaac commented 3 weeks ago

The last option seems the best to me.

I unfortunately don't have the time to do the implementation myself. I'm happy to test things with my specific use-case though if you or someone else takes up the task.