Closed jodavies closed 3 months ago
Since I have a (non-trivial) crashing example, I was able to investigate further. I cout the intermediate steps in poly_ratfun_normalize
using the to_string()
method of poly.
After https://github.com/vermaseren/form/blob/master/sources/polywrap.cc#L791 I can see that gcd1 and gcd2 have correctly been set to 1.
If I print the four individual factors on the following lines they are as expected: "1" is divided out correctly. However, if I print den1 after https://github.com/vermaseren/form/blob/master/sources/polywrap.cc#L794 it is zero. The later error Division by zero during normalization
is of course what one now expects.
This product is ultimately a call to mul_univar in poly.cc. On entry to mul_univar the input polys a,b have the correct values. On exit c is 0. The problem here is that minpow and maxpow are computed incorrectly because in fact, one of the input polynomials is sorted in the wrong order; it is "lowfirst". Thus the following loop never runs.
The issue is further up the chain then: somewhere one of the polynomials has not been sorted, or sorted in the wrong order. I will continue investigating...
I attach a relatively minimal example. It requires a procedure which I also include. minimal-crash.tar.gz
In the tarball are log files for two of the terms in minimal.h: Term "L3" demonstrates the division by zero problem. Term "L2" silently produces incorrect output (!!).
In both files you can see the term printed at various places in the procedure in denexpand.prc, and additionally spam from poly_ratfun_normallize, I include a patch file to duplicate this. polywrap.patch.gz
It seems that the problem comes from enabling PolyRatFun
together with Collect
, which perhaps causes some incompatibility of the term ordering inside the rational function (so, workaround: inserting .sort
). A more minimized example:
CFunction rat,rat1;
Symbol x;
Local F = rat(1,1+x^2)*rat1(1,1+x);
AntiBracket rat;
.sort
CFunction collect;
PolyRatFun rat;
Collect collect;
Print "0: %t";
Identify collect(x?) = x;
Print "1: %t";
Identify rat1(?a) = rat(?a);
Print "2: %t"; * <-- Looks wrong!!
.end
FORM 4.2.1 (Aug 28 2019, v4.2.1-5-g4057c65) 64-bits Run: Wed Nov 27 16:58:44 2019
CFunction rat,rat1;
Symbol x;
Local F = rat(1,1+x^2)*rat1(1,1+x);
AntiBracket rat;
.sort
Time = 0.00 sec Generated terms = 1
F Terms in output = 1
Bytes used = 184
CFunction collect;
PolyRatFun rat;
Collect collect;
Print "0: %t";
Identify collect(x?) = x;
Print "1: %t";
Identify rat1(?a) = rat(?a);
Print "2: %t"; * <-- Looks wrong!!
.end
0: + rat1(1,1 + x)*collect(rat(1,1 + x^2))
1: + rat(1,1 + x^2)*rat1(1,1 + x)
2: + rat(1,0)
Time = 0.00 sec Generated terms = 1
F Terms in output = 1
Bytes used = 48
0.00 sec out of 0.00 sec
It does seem to work around the problem, but extra sorting does produce many calls of prf_ratfun_normalize in which a lowfirst ordered poly is multiplied by (1)/(1) which doesn't produce incorrect results, and the poly is ordered properly for future operations.
Edit: Of course the other way to work around this is to use On highfirst;
...
The problem here is that when you declare rat to be a polyratfun, your occurrence of rat is hidden inside the function collect. Inside function arguments polyfun's and polyratfun's are not recognised as such. Hence rat is still seen as a regular function and its arguments are 'clean'. The statement id collect(x?) = x; does not touch the contents of rat and hence rat is still seen as clean. It would be quite messy to try to change that and make the program a bit slower in general. For constructs like this you can insert an extra statement id rat(?a) = rat(?a); after the id collect(x?) = x; That solves the problem by forcing Form to put a dirty flag there and redoing the rat function.
For constructs like this you can insert an extra statement
id rat(?a) = rat(?a);
after theid collect(x?) = x;
That solves the problem by forcing Form to put a dirty flag there and redoing the rat function.
Inserting the following code before id collect(x?) = x;
may be another option:
argument collect;
argument rat;
endargument;
endargument;
in which rat
functions at the ground level are not manipulated.
Related: #178 , possibly #270.
There are still division by zero problems when using PolyRatFun, in both form and tform (#270 seems to be a tform-only problem).
I don't have a minimal example currently, terms which cause the problem (found using `print "%t";) do not cause problems when run alone in a simple script.
I have bisected the problem to a0b635c6ba559227abc4bd7cdbbfe272af4c2941.
The polynomials here are not large, a crash is obtained for example when the product
is produced, with output
which causes
Division by zero during normalization
at the following sort.There are no valgrind errors.
The DEBUG spam for this term is as follows:
Thanks, Josh.