jrs65 / scalapy

A python wrapper around ScaLAPACK

Problem with pdsygvx #22

Open vincentmr opened 9 years ago

vincentmr commented 9 years ago

I am trying to use the function pdsygvx to find a subset of the eigenvectors of a generalized symmetric eigenvalue problem. For small sizes everything works fine, but when benchmarking the code on gradually larger problems, the high-level interface fails at a modest size (2048x2048): I am told that the workspace is too small, but only once the problem has been solved twice.

I then turned to the low-level routines. For smaller systems I get close to the answer, but it is not exactly the same, and it does not converge in the same number of steps either. Since I am doing non-linear optimization this is not impossible, although I would expect the same answer in the same number of steps. For the failing problem size, the low-level interface produces a negative workspace size, which crashes the program on the first step.

I am wondering why the high- and low-level interfaces behave differently, and whether anyone has experienced similar problems with pdsygvx. In the end I worked around this by calling four routines that together are equivalent to pdsygvx. Any insight would be appreciated. Thank you. Here are my various attempts at solving the problem.
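For what it's worth, one way a workspace query can come back negative (this is only a guess, not something confirmed here) is that the required size no longer fits in the 32-bit integers a Fortran interface typically uses, and wraps around on conversion. A minimal NumPy sketch of that wrap-around:

```python
import numpy as np

# Hypothetical illustration: a workspace requirement larger than 2**31 - 1
# wraps to a negative value when narrowed to a 32-bit integer.
required = np.array([3_000_000_000], dtype=np.int64)  # does not fit in int32
narrowed = int(required.astype(np.int32)[0])          # wrapping cast
print(narrowed < 0)  # True: the solver would see a negative lwork and abort
```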

Works

    N, N1 = H.global_shape; N = int(N)
    scale = np.zeros(1, dtype=np.int32)
    info = np.zeros(1, dtype=np.int32)
    # Factor S = U^T U (upper-triangular Cholesky)
    S = rt.cholesky(S, lower=False, overwrite_a=True, zero_triangle=True)
    # Reduce the generalized problem to standard form: H <- U^{-T} H U^{-1}
    scale, info = ll.pdsygst(1, 'U', N, H, S)
    # Solve the standard symmetric problem for the requested eigenvalues
    evalsd, Z = rt.eigh(H, eigvals=eigvals, overwrite_a=True, lower=False)
    # Back-transform the eigenvectors: solve U X = Z
    info = ll.pdtrtrs('U', 'N', 'N', N, int(Z.global_shape[1]), S, Z)
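The algebra behind this four-step workaround (Cholesky factorization, pdsygst-style reduction, standard eigensolve, triangular back-substitution) can be checked on a single process with plain NumPy. This is a sketch of the mathematics only, not of the scalapy API, and it uses the lower Cholesky factor (numpy's convention) rather than the upper factor used above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = A + A.T                          # symmetric
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)          # symmetric positive definite

# Step 1 (cf. rt.cholesky): factor S = L L^T
L = np.linalg.cholesky(S)
# Step 2 (cf. pdsygst): reduce to standard form C = L^{-1} H L^{-T}
C = np.linalg.solve(L, np.linalg.solve(L, H).T)
# Step 3 (cf. rt.eigh): solve the standard symmetric problem
w, Y = np.linalg.eigh(C)
# Step 4 (cf. pdtrtrs): back-transform the eigenvectors, V = L^{-T} Y
V = np.linalg.solve(L.T, Y)

# V satisfies the generalized problem H V = S V diag(w)
print(np.max(np.abs(H @ V - S @ V * w)))  # close to machine precision
```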

Improper workspace

    evalsd, Z = rt.eigh(H, B=S, eigvals=eigvals, overwrite_a=False, overwrite_b=True, lower=lower)

Low level interface

    N, N1 = H.global_shape; N = int(N)
    if eigvals is not None:
        N1 = np.max(eigvals) + 1
    N1 = int(N1)
    evalsd = np.zeros(N1, dtype=np.float64)
    Z = core.DistributedMatrix((N, N1), block_shape=H.block_shape,
                               context=H.context)

    ORFAC = -1
    IFAIL = np.zeros(N, dtype=np.int32)
    ICLUSTR = np.zeros(2 * size, dtype=np.int32)  # size = number of MPI processes
    GAP = np.zeros(size, dtype=np.float64)

Option 1

    ll.pdsygvx(1, 'V', 'I', 'L', N,
               H, S, 0, 0, int(eigvals[0]) + 1, int(eigvals[1]) + 1,
               1e6 * np.finfo(float).eps,
               evalsd, ORFAC, Z, ll.WorkArray('D', 'I'), IFAIL, ICLUSTR, GAP)

Option 2

    # Workspace query: passing lwork = liwork = -1 makes pdsygvx return the
    # optimal workspace sizes in work[0] and iwork[0]
    work = np.zeros(1, dtype=np.float64)
    iwork = np.zeros(1, dtype=np.int32)
    ll.pdsygvx(1, 'V', 'I', 'L', N,
               H.local_array, 1, 1, H.desc,
               S.local_array, 1, 1, S.desc,
               0, 0, int(eigvals[0]) + 1, int(eigvals[1]) + 1,
               100 * np.finfo(float).eps, evalsd, ORFAC,
               Z.local_array, 1, 1, Z.desc,
               work, -1, iwork, -1, IFAIL, ICLUSTR, GAP)

    # Allocate the workspaces at the queried sizes
    lwork = int(work[0])
    work = np.zeros(lwork, dtype=np.float64)
    liwork = int(iwork[0])
    iwork = np.zeros(liwork, dtype=np.int32)

    # Actual solve
    ll.pdsygvx(1, 'V', 'I', 'L', N,
               H.local_array, 1, 1, H.desc,
               S.local_array, 1, 1, S.desc,
               0, 0, int(eigvals[0]) + 1, int(eigvals[1]) + 1,
               100 * np.finfo(float).eps, evalsd, ORFAC,
               Z.local_array, 1, 1, Z.desc,
               work, lwork, iwork, liwork, IFAIL, ICLUSTR, GAP)
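Given the negative size coming out of the query here, it may be worth validating the returned workspace sizes before allocating rather than crashing inside the solver. The helper below is a purely hypothetical sketch (`checked_size` is not part of scalapy):

```python
import numpy as np

def checked_size(query_out, name):
    """Validate a workspace-query result before allocating.

    ScaLAPACK returns the optimal size as the first element of the query
    array; a non-positive value means the subsequent solver call would
    fail, so raise immediately instead. Hypothetical helper, not part of
    the scalapy API.
    """
    n = int(query_out[0])
    if n <= 0:
        raise ValueError("workspace query returned invalid %s = %d" % (name, n))
    return n

# After the lwork = -1 / liwork = -1 query call above, one would write:
#     lwork = checked_size(work, "lwork")
#     liwork = checked_size(iwork, "liwork")
```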
jrs65 commented 9 years ago

@vincentmr Interesting problem, thanks for sending in the bug report. I'll have a go at reproducing it. I'm really unsure why it would produce a negative workspace size; that seems very odd.