Problems with tform and polyratfun

jodavies commented 6 years ago

Hello,

I have not (yet?) found a particularly minimal example for this issue, but I will post what I have so far.

With the following code, I am finding that in some* runs, the final result (which should be 0) is

       + ep^4*prf(m4^7 - m4^7,m4^14 + 2*m4^7 + 1)

or

       + ep^4*prf(m4^7 - m4^7,m4^7 + m4^14 + 1 + m4^7)

The problem occurs in both tform and tvorm, 4,1, 4.2.0 and the latest git, and tvorm inside valgrind (but valgrind does not report any errors).

: problems start at -w3. For 3,5 threads there are problems ~ 2.4%,1.4% of the time. -w3 seems to produce only* the first output above. All other cases produce both. For even numbers of threads the problem is less frequent, more like 0.13% of runs.

The code is as follows. I have attached the necessary includes. They need renaming to .h.

#-
Off Statistics;

CFunction prf,den;
AutoDeclare Symbol n,m,ep;

#include- test-expanded-tiny.h
.sort
PolyRatFun prf;

#message Loading mma results...
#include- test-result-tiny.h
.sort

Drop;
Local diff7 = test7 - result7;
.sort

Denominators den;
FactArg den;
Repeat Identify den(?a,ep,?b) = 1/ep*den(?a,?b);
Repeat Identify den(n?,m?,?a) = den(n*m,?a);
Identify den(n?) = prf(1,n);

#do i = 1,5
    Identify m`i'^n? = prf(m`i'^n,1);
    Identify n`i'^m? = prf(n`i'^m,1);
#enddo
.sort

Print +s diff7;

.end

test-expanded-tiny.txt test-result-tiny.txt

I will attempt to find a more trivial example later...

Thanks, Josh.

tueda commented 6 years ago

An experiment (though I don't know the meaning of this probablity distribution and also fluctuation uncertainty):

-w3  count=10000 good=9543 bad=457
-w4  count=10000 good=9869 bad=131
-w5  count=10000 good=9750 bad=250
-w6  count=10000 good=9578 bad=422
-w7  count=10000 good=9761 bad=239
-w8  count=10000 good=9815 bad=185
-w9  count=10000 good=9876 bad=124
-w10 count=10000 good=9885 bad=115
-w11 count=10000 good=9920 bad=80
-w12 count=10000 good=9932 bad=68
-w13 count=10000 good=9968 bad=32
-w14 count=10000 good=9957 bad=43
-w15 count=10000 good=9965 bad=35
-w16 count=10000 good=9989 bad=11

tueda commented 6 years ago

A slightly reduced version (test.frm):

#-
Off stats;

CF prf;
S ep,n1,m4,m1,m2,m3,n2,n3,n4;

#include test.txt
*repeat id prf(m1?,m2?)*prf(n1?,n2?) = prf(m1*n1,m2*n2);  * (1) doesn't change
ModuleOption noparallel;
.sort

PolyRatFun prf;
*ModuleOption noparallel;  * (2) workaround
.sort

P;
ModuleOption noparallel;
.sort

#if termsin(diff7)
  #terminate
#endif

ModuleOption noparallel;
.end

with test.txt test.sh

Usage: ./test.sh -w3

(NOTE: because FORM doesn't handle Ctrl+C in the normal way, the above shell script is hard to stop by Ctrl+C. You need to use Ctrl+Z and then kill the job.)

In this version, the input contains just terms with ordinary symbols and functions. The problem occurs at the use of PolyRatFun. So a workaround for now would be unfortunately Don't use PolyRatFun with three or more threads.

Somehow #:threadbucketsize 1 decreases the error rate, but doesn't remove the problem completely.

jodavies commented 6 years ago

In fact, tform 4.1 at the v4.1 tag seems not to have any problems. I was using a network-wide 4.1 binary which has the tag "my_buggy-244-g6b38064" -- I cannot find this hash in the commit history? (Edit: dated 20161103)

I will try to bisect.

EDIT: I hit some commits that won't compile, and I need to finish some other stuff. I found, from good v4.1 and bad v4.2.0,

f5a5e82 : div by zero during normalization
e55a108 : div by zero during normalization

if you take those as good, one can find

258b491 : returns ep^4 * prf(0,1), but only sometimes.

Presumably the issue started around there somewhere...

EDIT2: I attached the log if you want to replay. bisect.log

tueda commented 6 years ago

Well, I would say a795a9c315dc75d1f5ca0f380becba8e2259d774 is the first bad commit in the sense that other bad commits (which fail the test, so both f5a5e82eec1798c15e1d5b7c3ef2787b9719125e and e55a108b2388b4d328dfcc384da700c8e044bd85 are bad) are descendants.

Around the bad commit:

BAD  * e55a108 [2015-06-29] Improved memory usage of polygcd bracketing: brackets are now only generated when required. @Ben Ruijl 
BAD  *   56714e0 [2015-06-26] Merge branch 'master' of github.com:vermaseren/form @vermaseren 
     |\  
BAD  | *   f5a5e82 [2015-06-26] Merge branch 'master' of github.com:vermaseren/form @Ben Ruijl 
     | |\  
GOOD | * | 03debb8 [2015-06-26] - Fixed the bounds check for the gcd: now check each variable individually - The shape is now reset when we have found a gcd that is too large - Only if extra variables can be bracketed out, the gcd is called recursively @Ben Ruijl 
BAD  * | | 4f6cb55 [2015-06-26] 4 bugfixes (already!) @vermaseren 
     | |/  
     |/|   
BAD  * | 1a1db23 [2015-06-26] Added extra polyratfun options to manual @vermaseren 
BAD  * | a795a9c [2015-06-26] Added PolyRatFun divergence and expand. +bugfixes @vermaseren 
GOOD * | 495aed3 [2015-06-24] - Reverted the end condition of the dense gcd routine. Seemingly the   final condition in the Algorithms for Computer Algebra book is not   good enough. @Ben Ruijl 
     |/  
GOOD * f766b04 [2015-06-23] - Remove content in last variable before doing sparse gcd calculation. This improves   performance and fixes a bug where a smaller divider is returned as the gcd - Refined the criteria for identifying bad gcd suggestions @Ben Ruijl

bisect.log:

# bad: [7599180782adfa9ca63b9d9cf1de0fd4ff590f21] Put in the fix and undid another
# good: [480ce79429a0c92a7da6c694dfe82d0b4afe48d0] Preparing for release 4.1
git bisect start 'HEAD' 'v4.1-20131025'
# bad: [24f17729ab5e4e9450ec650b57bf3feeca6edafd] Quick fix for crash by replace_(x,0)
git bisect bad 24f17729ab5e4e9450ec650b57bf3feeca6edafd
# bad: [cf7e7ec0c78a16dd2833c57f67b1496f9f6ed96b] [doc] Primitive support for LaTeX2HTML
git bisect bad cf7e7ec0c78a16dd2833c57f67b1496f9f6ed96b
# good: [928e8c9c221aecf1a04a8dde6f8fb2935d6e0710] [parform] Fix a bug in disabling parallelization due to $-vars
git bisect good 928e8c9c221aecf1a04a8dde6f8fb2935d6e0710
# good: [495aed3c300a3912ea94a87c3e2542f4a5ae856f] - Reverted the end condition of the dense gcd routine. Seemingly the   final cond
git bisect good 495aed3c300a3912ea94a87c3e2542f4a5ae856f
# bad: [d40483a3a3ea215a8b2b9eec0287f57a2540aadd] - Zippel's algorithm only works for monic gcds. Multiplying by the lcoeff should f
git bisect bad d40483a3a3ea215a8b2b9eec0287f57a2540aadd
# bad: [56714e0ca76b019bd1f2ad6d17c8b07569382846] Merge branch 'master' of github.com:vermaseren/form
git bisect bad 56714e0ca76b019bd1f2ad6d17c8b07569382846
# bad: [4f6cb5513c4e508f87e1e134b711f28f839e9c64] 4 bugfixes (already!)
git bisect bad 4f6cb5513c4e508f87e1e134b711f28f839e9c64
# bad: [1a1db23f6ba869081d307f3a522add5961bc99b4] Added extra polyratfun options to manual
git bisect bad 1a1db23f6ba869081d307f3a522add5961bc99b4
# bad: [a795a9c315dc75d1f5ca0f380becba8e2259d774] Added PolyRatFun divergence and expand. +bugfixes
git bisect bad a795a9c315dc75d1f5ca0f380becba8e2259d774
# first bad commit: [a795a9c315dc75d1f5ca0f380becba8e2259d774] Added PolyRatFun divergence and expand. +bugfixes

vermaseren / form

Problems with tform and polyratfun #270