rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox
Other
677 stars 124 forks source link

Correct handling of diff output in text mode #348

Closed yindian closed 1 year ago

yindian commented 1 year ago

It seems that since the introduction of --binary flag to diff applet, the diff output has been incorrect for files with CRLF line endings, unless --binary is specified. For example, say there are two files abcd and abde with 4 lines of single character, the current output of busybox diff abcd abde is:

--- abcd
+++ abde
@@ -1,4 +1,4 @@
 a
 b
-c
-d
+
d+
e

while the supposed correct output is:

--- abcd
+++ abde
@@ -1,4 +1,4 @@
 a
 b
-c
 d
+e

It seems to be related to improper ft_pos value in text mode. I have drafted a patch which seems to work, listed below for your reference.

diff --git a/editors/diff.c b/editors/diff.c
index b6716455f..bb85d5bb7 100644
--- a/editors/diff.c
+++ b/editors/diff.c
@@ -231,7 +231,13 @@ static int read_token(FILE_and_pos_t *ft, token_t tok)
                tok |= (t & (TOK_EOF + TOK_EOL));
                /* Only EOL? */
                if (t == '\n')
+               {
+#if ENABLE_PLATFORM_MINGW32
+                       if (!(option_mask32 & FLAG(binary)))
+                               ft->ft_pos = ftello(ft->ft_fp);
+#endif
                        tok |= TOK_EOL;
+               }

                if (option_mask32 & FLAG(i)) /* Handcoded tolower() */
                        t = (t >= 'A' && t <= 'Z') ? t - ('A' - 'a') : t;
@@ -429,6 +435,14 @@ static void fetch(FILE_and_pos_t *ft, const off_t *ix, int a, int b, int ch)
                                return;
                        }
                        ft->ft_pos++;
+#if ENABLE_PLATFORM_MINGW32
+                       if (c == '\n' && !(option_mask32 & FLAG(binary)))
+                       {
+                               int d = (int) (ftello(ft->ft_fp) - ft->ft_pos);
+                               ft->ft_pos += d;
+                               j += d;
+                       }
+#endif
                        if (c == '\t' && (option_mask32 & FLAG(t)))
                                do putchar(' '); while (++col & 7);
                        else {
rmyorston commented 1 year ago

Unfortunately the suggested patch breaks diff without --binary for files with LF line endings:

~ $ diff abcd.lf  abde.lf
--- abcd.lf
+++ abde.lf
@@ -1,4 +1,4 @@
  a
b
-c
-d
+b
d
+e

Keeping track of the position in the file for both text mode and binary mode looks hard. I've gone back to using the default binary mode along with skipping the CR of CRLF line endings. It's not perfect but it's simple.

Try the most recent prerelease binary.

yindian commented 1 year ago

Thanks for your update. I have tried the latest prerelease binary and it works for both CRLF and LF input now.

It is true that my original patch does not work with LF line ending correctly, and I did not perform enough test before posting. It seems that ftello() would return unexpected result for LF-only text files, e.g. return -1 after 'a' and '\n' are read. Your solution is much better and could get around the crappy MS CRT issue with less footprint.