vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
982 stars 118 forks source link

PolyRatFun Division by zero during normalization #336

Closed jodavies closed 3 months ago

jodavies commented 4 years ago

Related: #178 , possibly #270.

There are still division by zero problems when using PolyRatFun, in both form and tform (#270 seems to be a tform-only problem).

I don't have a minimal example currently, terms which cause the problem (found using `print "%t";) do not cause problems when run alone in a simple script.

I have bisected the problem to a0b635c6ba559227abc4bd7cdbbfe272af4c2941.

The polynomials here are not large, a crash is obtained for example when the product

prf(1,1 + t)*prf( - 40 - 40*t - 136*t^2 - 40*t^3,9*t^2 + 18*t^3 + 9*t^4)

is produced, with output

prf( - 40 - 40*t - 136*t^2 - 40*t^3,0)

which causes Division by zero during normalization at the following sort.

There are no valgrind errors.

The DEBUG spam for this term is as follows:

*** [57.578]  CALL: gcd(0,9*a^3)                                   
*** [57.5781]  CALL : poly_ratfun_normalize                        
*** [57.5781]  CALL : poly_ratfun_read                             
*** [57.5781]  CALL: gcd(1,a+1)                                    
*** [57.5781]  CALL: integer_content(a+1)                          
*** [57.5781]  RES : integer_content(a+1) = 1                      
*** [57.5781]  CALL: integer_content(1)                            
*** [57.5781]  RES : integer_content(1) = 1                        
*** [57.5781]  CALL: integer_gcd(1,1)                              
*** [57.5781]  RES : integer_gcd(1,1) = 1                          
*** [57.5781]  CALL: gcd(1,a+1)                                    
*** [57.5781]  CALL: integer_content(a+1)                          
*** [57.5781]  RES : integer_content(a+1) = 1                      
*** [57.5781]  CALL: integer_content(1)                            
*** [57.5781]  RES : integer_content(1) = 1                        
*** [57.5781]  CALL: integer_gcd(1,1)                              
*** [57.5781]  RES : integer_gcd(1,1) = 1                          
*** [57.5781]  CALL: gcd(1,1)                                      
*** [57.5781]  CALL : poly_ratfun_read                             
*** [57.5781]  CALL: gcd(1,9*a^2+18*a^3+9*a^4)                     
*** [57.5781]  CALL: integer_content(9*a^2+18*a^3+9*a^4)           
*** [57.5781]  RES : integer_content(9*a^2+18*a^3+9*a^4) = 9       
*** [57.5781]  CALL: integer_content(1)                            
*** [57.5781]  RES : integer_content(1) = 1                        
*** [57.5781]  CALL: integer_gcd(1,9)                              
*** [57.5782]  RES : integer_gcd(1,9) = 1                          
*** [57.5782]  CALL: gcd(-40-40*a-136*a^2-40*a^3,a+1)              
*** [57.5782]  CALL: integer_content(-40-40*a-136*a^2-40*a^3)      
*** [57.5782]  RES : integer_content(-40-40*a-136*a^2-40*a^3) = -8 
*** [57.5782]  CALL: integer_content(a+1)                          
*** [57.5782]  RES : integer_content(a+1) = 1                      
*** [57.5782]  CALL: integer_gcd(-8,1)                             
*** [57.5782]  RES : integer_gcd(-8,1) = 1                         
*** [57.5782]  CALL: gcd_heuristic(5+5*a+17*a^2+5*a^3,a+1,{0})     
*** [57.5782]  CALL: gcd_heuristic(5,37,{})                        
*** [57.5782]  CALL: integer_content(37)                           
*** [57.5782]  RES : integer_content(37) = 37                      
*** [57.5782]  CALL: integer_gcd(5,37)                             
*** [57.5782]  RES : integer_gcd(5,37) = 1                         
*** [57.5782]  CALL: integer_content(1)                            
*** [57.5782]  RES : integer_content(1) = 1                        
*** [57.5782]  RES : gcd_heuristic(5+5*a+17*a^2+5*a^3,a+1,{0}) = 1 
*** [57.5782]  CALL: integer_content(1)                            
*** [57.5782]  RES : integer_content(1) = 1                        
*** [57.5782]  RES : gcd(-40-40*a-136*a^2-40*a^3,a+1) = 1          

Thanks, Josh.

jodavies commented 4 years ago

Since I have a (non-trivial) crashing example, I was able to investigate further. I cout the intermediate steps in poly_ratfun_normalize using the to_string() method of poly.

After https://github.com/vermaseren/form/blob/master/sources/polywrap.cc#L791 I can see that gcd1 and gcd2 have correctly been set to 1.

If I print the four individual factors on the following lines they are as expected: "1" is divided out correctly. However, if I print den1 after https://github.com/vermaseren/form/blob/master/sources/polywrap.cc#L794 it is zero. The later error Division by zero during normalization is of course what one now expects.

This product is ultimately a call to mul_univar in poly.cc. On entry to mul_univar the input polys a,b have the correct values. On exit c is 0. The problem here is that minpow and maxpow are computed incorrectly because in fact, one of the input polynomials is sorted in the wrong order; it is "lowfirst". Thus the following loop never runs.

The issue is further up the chain then: somewhere one of the polynomials has not been sorted, or sorted in the wrong order. I will continue investigating...

jodavies commented 4 years ago

I attach a relatively minimal example. It requires a procedure which I also include. minimal-crash.tar.gz

In the tarball are log files for two of the terms in minimal.h: Term "L3" demonstrates the division by zero problem. Term "L2" silently produces incorrect output (!!).

In both files you can see the term printed at various places in the procedure in denexpand.prc, and additionally spam from poly_ratfun_normallize, I include a patch file to duplicate this. polywrap.patch.gz

tueda commented 4 years ago

It seems that the problem comes from enabling PolyRatFun together with Collect, which perhaps causes some incompatibility of the term ordering inside the rational function (so, workaround: inserting .sort). A more minimized example:

CFunction rat,rat1;
Symbol x;
Local F = rat(1,1+x^2)*rat1(1,1+x);

AntiBracket rat;
.sort

CFunction collect;

PolyRatFun rat;
Collect collect;

Print "0: %t";
  Identify collect(x?) = x;
Print "1: %t";
  Identify rat1(?a) = rat(?a);
Print "2: %t";  * <-- Looks wrong!!
.end
FORM 4.2.1 (Aug 28 2019, v4.2.1-5-g4057c65) 64-bits  Run: Wed Nov 27 16:58:44 2019
    CFunction rat,rat1;
    Symbol x;
    Local F = rat(1,1+x^2)*rat1(1,1+x);

    AntiBracket rat;
    .sort

Time =       0.00 sec    Generated terms =          1
               F         Terms in output =          1
                         Bytes used      =        184

    CFunction collect;

    PolyRatFun rat;
    Collect collect;

    Print "0: %t";
      Identify collect(x?) = x;
    Print "1: %t";
      Identify rat1(?a) = rat(?a);
    Print "2: %t";  * <-- Looks wrong!!
    .end
0:  + rat1(1,1 + x)*collect(rat(1,1 + x^2))
1:  + rat(1,1 + x^2)*rat1(1,1 + x)
2:  + rat(1,0)

Time =       0.00 sec    Generated terms =          1
               F         Terms in output =          1
                         Bytes used      =         48
  0.00 sec out of 0.00 sec
jodavies commented 4 years ago

It does seem to work around the problem, but extra sorting does produce many calls of prf_ratfun_normalize in which a lowfirst ordered poly is multiplied by (1)/(1) which doesn't produce incorrect results, and the poly is ordered properly for future operations.

Edit: Of course the other way to work around this is to use On highfirst;...

vermaseren commented 3 years ago

The problem here is that when you declare rat to be a polyratfun, your occurrence of rat is hidden inside the function collect. Inside function arguments polyfun's and polyratfun's are not recognised as such. Hence rat is still seen as a regular function and its arguments are 'clean'. The statement id collect(x?) = x; does not touch the contents of rat and hence rat is still seen as clean. It would be quite messy to try to change that and make the program a bit slower in general. For constructs like this you can insert an extra statement id rat(?a) = rat(?a); after the id collect(x?) = x; That solves the problem by forcing Form to put a dirty flag there and redoing the rat function.

tueda commented 4 months ago

For constructs like this you can insert an extra statement id rat(?a) = rat(?a); after the id collect(x?) = x; That solves the problem by forcing Form to put a dirty flag there and redoing the rat function.

Inserting the following code before id collect(x?) = x; may be another option:

argument collect;
  argument rat;
  endargument;
endargument;

in which rat functions at the ground level are not manipulated.