sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Random freezes with @parallel decorator #14154

Open miguelmarco opened 11 years ago

miguelmarco commented 11 years ago

I get random freezes of a parallel process with the following code:

def trenza(f,points,exact=True,step=0.1,precision=53):
    if len(points)>2:
        return trenza(f,points[:2],exact,step,precision),trenza(f,points[1:],exact,step,precision)
    F=ComplexField(precision)
    x0=F(points[0])
    x1=F(points[1])
    d=abs(F(x0)-F(x1))
    (x,y)=f.parent().gens()
    y0s=f(x0,QQbar[y].gen()).roots(multiplicities=False)
    dfx=f.derivative(x)
    dfy=f.derivative(y)
    RX=PolynomialRing(F,'x')
    RY=PolynomialRing(F,'y')
    R=PolynomialRing(F,'x,y')
    Rext=PolynomialRing(F,'X0,Y0,x,y,D')
    diffs=filter(lambda a:a!=0,[f.derivative(y,k) for k in range(f.degree()+1)])
    Ak=[Rext(g(Rext('x'),Rext('Y0')+Rext('D')*(x-Rext('X0')))) for g in diffs]
    args=[(f,x0,x1,y0,d,Ak,R,F,RX,x.change_ring(F),y.change_ring(F),RY,dfx,dfy,exact,step) for y0 in y0s]
    l=list(siguehilo(args))

@parallel
def siguehilo(f,x0,x1,y0a,d,Ak,R,F,RX,x,y,RY,dfx,dfy,exact,stepx):
    t=F(0)
    y0=F(y0a)
    xi=x0
    puntos=[]
    sigue=True
    uno=False
    pr=F(2)^-(F.precision()-2)
    while t<F(1) or sigue:
        g=f(xi,y).polynomial(y)
        y2=RY(g).newton_raphson(8,y0)
        #while abs(y1-y2)>pr*16:
        #    [y1,y2]=RY(g).newton_raphson(2,y2)
        y0=y2[-1]
        puntos.append([t,y0])
        d0=F(-dfx(xi,y0)/dfy(xi,y0))
        h=1
        if exact:
            pr=2^-(F.precision()-1)
            FR=RealIntervalField(F.precision())
            FC=ComplexIntervalField(F.precision())
            R=PolynomialRing(FC,'x,y')
            RX=PolynomialRing(FC,'x')
            xx0=FC(FR(xi.real()-pr,xi.real()+pr)+FC(I)*FR(xi.imag()-pr,xi.imag()+pr))
            yy0=FC(FR(y0.real()-pr,y0.real()+pr)+FC(I)*FR(y0.imag()-pr,y0.imag()+pr))
            dd=FC(d0)
            Aka=[j.change_ring(FC) for j in Ak]
            akt=[(j(xx0,yy0,x.change_ring(FC)+xx0,0,dd)) for j in Aka]
            akt=[sum([a[0].abs()*RX(a[1]) for a in R(hh)]) for hh in akt]
            a1t=-akt[1]+2*akt[1].coeffs()[0]
            akt[1]=a1t
            L=filter(lambda a: a!=0,akt)
            chequea=False
            h=1
            while not chequea:
                chequea=True
                k=2
                while chequea and k<len(L):
                    L1=(L[k](h)*L[0](h)^(k-1))
                    L2=(QQ(0.157670780786)^k*factorial(k)*L[1](h))
                    if not L2>=L1:
                        chequea=False
                    k+=1
                h=h/2
        else:
            h=F(stepx)
        t+=h/d
        if uno:
            sigue=False
        if t>F(1):
            t=F(1)
            uno=True
        xj=x0*(1-t)+x1*t
        y0+=d0*(xj-xi)
        xi=xj
    return puntos

This is code I am writing to compute the braid monodromy of curves.

If I run it with, for example, this input:

R.<x,y>=QQ[]
f=-y^3+x^2
time trenza(f,[1,I,-1,-I,1],exact=False,step=0.5)

I usually get the answer in a few seconds, but if I repeat the same computation several times, at some point it freezes, as if it were stuck in a very long computation.

When I interrupt the computation I get the following traceback:

^C
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_sage_input_33.py", line 10, in <module>
    exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("dGltZSB0cmVuemEoZixbMSxJLC0xLC1JLDFdLGV4YWN0PUZhbHNlLHN0ZXA9MC41KQ=="),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))
  File "", line 1, in <module>

  File "/tmp/tmpFpew9q/___code___.py", line 3, in <module>
    exec compile(u'__time__=misc.cputime(); __wall__=misc.walltime(); trenza(f,[_sage_const_1 ,I,-_sage_const_1 ,-I,_sage_const_1 ],exact=False,step=_sage_const_0p5 ); print "Time: CPU %.2f s, Wall: %.2f s"%(misc.cputime(__time__), misc.walltime(__wall__))
  File "", line 1, in <module>

  File "/tmp/tmpOfjSpn/___code___.py", line 5, in trenza
    return trenza(f,points[:_sage_const_2 ],exact,step,precision)*trenza(f,points[_sage_const_1 :],exact,step,precision)
  File "/tmp/tmpOfjSpn/___code___.py", line 21, in trenza
    l=list(siguehilo(args))
  File "/home/mmarco/sage-5.7.beta3/local/lib/python2.7/site-packages/sage/parallel/use_fork.py", line 189, in __call__
    os.wait()
  File "c_lib.pyx", line 68, in sage.ext.c_lib.sage_python_check_interrupt (sage/ext/c_lib.c:736)
KeyboardInterrupt
__SAGE__

I really don't know how to track down the bug or how to debug it.
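
One possible starting point for debugging: the traceback above ends in `os.wait()`, which blocks until some child process exits, so a child that never exits leaves the parent waiting forever. The following is a minimal plain-Python sketch (POSIX only) of waiting with a deadline instead, so that the stuck child pids can be identified and inspected. The name `wait_with_deadline` and its parameters are hypothetical; this is not how `sage/parallel/use_fork.py` works, only an illustration of the idea.

```python
import os
import time


def wait_with_deadline(pids, deadline_seconds, poll_interval=0.05):
    """Wait for the given child pids, but give up after deadline_seconds.

    Returns (finished, stuck): finished maps pid -> raw exit status,
    stuck is the set of pids that had not exited by the deadline.
    Hypothetical helper for illustration; POSIX only.
    """
    remaining = set(pids)
    finished = {}
    give_up_at = time.monotonic() + deadline_seconds
    while remaining and time.monotonic() < give_up_at:
        for pid in list(remaining):
            # WNOHANG makes waitpid return (0, 0) instead of blocking
            # when the child is still running.
            done_pid, status = os.waitpid(pid, os.WNOHANG)
            if done_pid == pid:
                finished[pid] = status
                remaining.discard(pid)
        if remaining:
            time.sleep(poll_interval)
    return finished, remaining
```

Any pid returned in the second set could then be examined from outside (for example with a debugger or `strace`) to see what the child is blocked on.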

Depends on #14150

Component: memleak

Keywords: parallel

Issue created by migration from https://trac.sagemath.org/ticket/14154

jdemeyer commented 11 years ago
comment:1

It would be good to provide more minimal code exhibiting the problem.
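
One possible direction for such a reduction is to drop the mathematics entirely and keep only the fork-and-wait pattern that the traceback points at. The sketch below is a plain-Python stand-in (POSIX only): `fork_map` is a hypothetical name, and the code is a simplified illustration of a fork-based parallel map, not a copy of Sage's implementation. If a loop over something this small never freezes, the cause is more likely in what the worker computes than in the forking itself.

```python
import os
import pickle
import tempfile


def fork_map(func, inputs):
    """Run func on each input in its own forked child and collect results.

    Simplified, hypothetical stand-in for a fork-based parallel map;
    it is not the code in sage/parallel/use_fork.py. POSIX only.
    """
    inputs = list(inputs)
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        pid_to_index = {}
        for index, value in enumerate(inputs):
            pid = os.fork()
            if pid == 0:
                # Child: compute, write the pickled result, then exit
                # immediately without running the parent's cleanup code.
                status = 1
                try:
                    path = os.path.join(tmpdir, "%d.pkl" % index)
                    with open(path, "wb") as out:
                        pickle.dump(func(value), out)
                    status = 0
                finally:
                    os._exit(status)
            pid_to_index[pid] = index
        while pid_to_index:
            # Blocks until some child exits; a child that never exits
            # would leave the parent waiting here indefinitely.
            pid, status = os.wait()
            index = pid_to_index.pop(pid, None)
            if index is None:
                continue  # a child that was not started by this call
            path = os.path.join(tmpdir, "%d.pkl" % index)
            if status == 0 and os.path.exists(path):
                with open(path, "rb") as src:
                    results[index] = pickle.load(src)
    # Inputs whose child failed are reported as None.
    return [results.get(i) for i in range(len(inputs))]
```

For example, `fork_map(lambda n: n * n, [1, 2, 3])` can be called in a loop many times to check whether the bare pattern ever stalls on a given machine.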

I'm setting the dependency simply because any patch here might conflict with #14150.

jdemeyer commented 11 years ago

Dependencies: #14150

jdemeyer commented 11 years ago
comment:2

Also, it seems you are running in the notebook. Does running it in the command-line make a difference?

miguelmarco commented 11 years ago

Description changed:

--- 
+++ 
@@ -3,7 +3,7 @@

 def trenza(f,points,exact=True,step=0.1,precision=53):
     if len(points)>2:

miguelmarco commented 11 years ago
comment:4

I have trimmed the code down a little, but it is still very big.

I have tested on two different systems, and the probability of hitting the problem seems to vary a lot between them. I have also experienced the same problem on the command line.

To trigger it I have to try several possibilities for the "exact" and "step" parameters. A combination that seems to trigger it more often is this:

[trenza(f,[1,I,-1,-I,1],exact=True,step=0.5) for i in range(5)]

Each separate call of

trenza(f,[1,I,-1,-I,1],exact=True,step=0.5) 

takes around 3 seconds on my computer. But the list of five iterations doesn't give any answer even after several minutes.
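
Since the freeze is intermittent, it may help to measure how often it happens instead of waiting on a hung session. Below is a minimal plain-Python sketch, assuming a POSIX system: it runs a callable in a fresh process several times and counts the runs that exceed a timeout. The name `count_hangs` and its parameters are hypothetical, and whether this works unchanged inside a Sage session is an assumption.

```python
import multiprocessing


def count_hangs(target, trials, timeout_seconds):
    """Run target() in a fresh process `trials` times; count the timeouts.

    Hypothetical helper for illustration. Uses the "fork" start method,
    so it is POSIX only and target does not need to be picklable.
    """
    ctx = multiprocessing.get_context("fork")
    hangs = 0
    for _ in range(trials):
        proc = ctx.Process(target=target)
        proc.start()
        proc.join(timeout_seconds)
        if proc.is_alive():
            # Still running after the timeout: count it as a freeze
            # and stop the process so the next trial starts clean.
            hangs += 1
            proc.terminate()
            proc.join()
    return hangs
```

With a single call taking about 3 seconds, a timeout of around 30 seconds would separate a normal run from a freeze. Note that `terminate()` only stops the direct child; any worker processes that child had forked itself would need to be cleaned up separately.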

jdemeyer commented 11 years ago
comment:5

It would be really good if you could simplify the code, to make it easier to find out where it goes wrong.

vbraun commented 11 years ago
comment:6

What's the expected output?

sage: [trenza(f,[1,I,-1,-I,1],exact=True,step=0.5) for i in range(5)]
[(None, (None, (None, None))),
 (None, (None, (None, None))),
 (None, (None, (None, None))),
 (None, (None, (None, None))),
 (None, (None, (None, None)))]
sage: trenza(f,[1,I,-1,-I,1],exact=True,step=0.5) 
(None, (None, (None, None)))

Loop or no loop makes no difference here on Fedora 18 x86_64. Which OS are you on?

miguelmarco commented 11 years ago
comment:7

The expected result is basically no output (the trenza function returns nothing; it is only there to trigger the problem). As I said, the problem appears somewhat randomly. Did you try running it several times?

And as Murphy's law dictates, now the problem doesn't show up on my system either ;)

I checked on both a Mageia server and my Gentoo box. Both are x86_64, with sage-5.7 compiled from source.