vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0

Wrong newline characters (CR+CR+LF) on Windows #418

Open tueda opened 1 year ago

tueda commented 1 year ago

I don't fully understand it yet, but it seems that on Windows (without POSIX, i.e., UNIX is undefined and WINDOWS is defined, which means WITHRETURN is defined), FORM explicitly writes CR before LF for newline characters. For example: https://github.com/vermaseren/form/blob/e15da79ddec5dcb9432614e96854c7780496537c/sources/sch.c#L94-L97

However, in text mode, each LF written to output is converted to CR+LF, and each CR+LF read from input is converted to LF. In fact, stdout is opened in text mode, while other files in FORM are always opened in binary mode:

$ rg Uopen
sources/form3.h
485:extern FILES *Uopen(char *,char *);
512:#define Uopen(x,y) fopen(x,y)

sources/tools.c
988:    if ( ( f = Uopen(name,"rb") ) == 0 ) return(-1);
1007:   if ( ( f = Uopen(name,"a+b") ) == 0 ) return(-1);
1028:   if ( ( f = Uopen(name,"r+b") ) == 0 ) return(-1);
1047:   if ( ( f = Uopen(name,"w+b") ) == 0 ) return(-1);
1064:   if ( ( f = Uopen(name,"w+b") ) == 0 ) return(-1);

sources/unixfile.c
64:     #[ Uopen :
67:FILES *Uopen(char *filename, char *mode)
69:     FILES *f = (FILES *)Malloc1(sizeof(FILES),"Uopen");
86:     M_free(f,"Uopen");
91:     #] Uopen :

Consequently, on Windows, FORM's output to stdout contains the wrong newline sequence CR+CR+LF: FORM writes CR+LF itself, and text mode then expands the LF into CR+LF once more. This breaks some test cases where newline characters matter.
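The mechanism can be reproduced without a Windows machine. The following is a minimal sketch, not FORM code: the hypothetical helper text_mode_write only imitates the LF to CR+LF expansion that a text-mode stream performs on output, so that the effect of feeding it an already terminated "\r\n" line becomes visible.

```c
#include <stddef.h>

/* Hypothetical helper (not part of FORM): imitates the output side of a
   Windows text-mode stream by expanding every LF byte to CR+LF.
   Returns the number of bytes stored in out (excluding the terminating
   NUL), or (size_t)-1 if out is too small. */
size_t text_mode_write(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0) return (size_t)-1;
    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            /* keep one byte in reserve for the terminating NUL */
            if (n + 1 >= cap) return (size_t)-1;
            out[n++] = '\r';
        }
        if (n + 1 >= cap) return (size_t)-1;
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Passing "x\n" yields "x\r\n", which is the intended result of text mode. Passing "x\r\n", as a program does when it adds the CR on its own, yields "x\r\r\n".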

tueda commented 2 months ago

I built Windows binaries with both MSVC and MSYS2; both give the same output. In the following, newline characters are shown explicitly: \r as <CR>, and \n as <LF> followed by an actual line break.

Input:

S x,y;<CR><LF>
L F = (x+y)^10;<CR><LF>
P;<CR><LF>
.end

FORM output to log file (-l):

FORM 5.0.0-beta.1 (Apr  2 2024, v5.0.0-beta.1-58-g88f6930)  Run: Thu Apr 25 19:54:50 2024<LF>
    S x,y;<LF>
    L F = (x+y)^10;<LF>
    P;<LF>
    .end<LF>
<CR><LF>
Time =       0.00 sec    Generated terms =         11<CR><LF>
               F         Terms in output =         11<CR><LF>
                         Bytes used      =        364<CR><LF>
<CR><LF>
   F =<CR><LF>
      y^10 + 10*x*y^9 + 45*x^2*y^8 + 120*x^3*y^7 + 210*x^4*y^6 + 252*x^5*y^5<CR><LF>
       + 210*x^6*y^4 + 120*x^7*y^3 + 45*x^8*y^2 + 10*x^9*y + x^10;<CR><LF>
<CR><LF>
  0.00 sec out of 0.00 sec<CR><LF>

FORM output to stdout:

FORM 5.0.0-beta.1 (Apr  2 2024, v5.0.0-beta.1-58-g88f6930)  Run: Thu Apr 25 19:54:50 2024<CR><LF>
    S x,y;<CR><LF>
    L F = (x+y)^10;<CR><LF>
    P;<CR><LF>
    .end<CR><LF>
<CR><CR><LF>
Time =       0.00 sec    Generated terms =         11<CR><CR><LF>
               F         Terms in output =         11<CR><CR><LF>
                         Bytes used      =        364<CR><CR><LF>
<CR><LF>
   F =<CR><LF>
      y^10 + 10*x*y^9 + 45*x^2*y^8 + 120*x^3*y^7 + 210*x^4*y^6 + 252*x^5*y^5<CR><LF>
       + 210*x^6*y^4 + 120*x^7*y^3 + 45*x^8*y^2 + 10*x^9*y + x^10;<CR><LF>
<CR><LF>
  0.00 sec out of 0.00 sec<CR><CR><LF>

So, there are three combinations of new-line characters:

log file (binary mode)    stdout (text mode)
LF                        CR LF
CR LF                     CR CR LF
CR LF                     CR LF

where the last row is the correct pair.
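To check captured output for these combinations automatically, one could classify every LF by the number of CR bytes directly in front of it. This is only a sketch; the struct eol_counts and the function count_eols are names made up for illustration and do not exist in FORM or its test suite.

```c
#include <stddef.h>

/* Hypothetical tally of the three line endings seen in the outputs above. */
struct eol_counts {
    size_t lf;      /* LF with no CR before it   */
    size_t crlf;    /* exactly one CR before LF  */
    size_t crcrlf;  /* two (or more) CRs before LF */
};

/* Scans len bytes of buf and classifies each LF by its preceding CRs. */
struct eol_counts count_eols(const char *buf, size_t len)
{
    struct eol_counts c = {0, 0, 0};
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != '\n') continue;
        if (i >= 2 && buf[i - 1] == '\r' && buf[i - 2] == '\r')
            c.crcrlf++;
        else if (i >= 1 && buf[i - 1] == '\r')
            c.crlf++;
        else
            c.lf++;
    }
    return c;
}
```

The input must be read in binary mode (for example with fopen(name, "rb")), otherwise the runtime would already have altered the bytes being counted. For a correct Windows text file, both lf and crcrlf should come out as zero.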

tueda commented 1 month ago

If we set the standard output to binary mode with the _setmode function, then CR+CR+LF disappears, but LF without a preceding CR remains.
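A minimal sketch of such a call is shown below. The wrapper name set_stdout_binary is an assumption for illustration, as is the idea of calling it once at startup before anything is printed; where exactly FORM would do this is not settled in this thread. _setmode, _fileno and _O_BINARY are the Microsoft C runtime names declared in <io.h>, <stdio.h> and <fcntl.h>.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/* Hypothetical wrapper: switches stdout to binary mode on Windows so the
   C runtime stops expanding LF to CR+LF. Returns 0 on success and -1 on
   failure. On other platforms there is no text mode, so it does nothing. */
int set_stdout_binary(void)
{
#ifdef _WIN32
    fflush(stdout);  /* flush data buffered under the old mode first */
    return _setmode(_fileno(stdout), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;
#endif
}
```

With stdout in binary mode, every byte is passed through unchanged, so code paths that emit CR+LF themselves become correct, while code paths that emit a bare "\n" now produce LF only. That matches the remaining LF-only lines described above.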