serge-sans-paille / pythran

Ahead of Time compiler for numeric kernels
https://pythran.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2.01k stars 193 forks source link

g++ error (involving sum I believe) #833

Open nbecker opened 6 years ago

nbecker commented 6 years ago

The following code results in a g++ error (and the longest error message in history!) https://gist.github.com/nbecker/37393e16d5ae8044e37e80b040c4e548

serge-sans-paille commented 6 years ago

14147 lines of error :-) mmmmh, tasty!

nbecker commented 6 years ago

Did I set the record? Is there a prize??

serge-sans-paille commented 6 years ago

:-) That's not a metric I'm keeping track of :-)

there's two constructs pythran does not support (yet) in tour code:

Thanks for providing such a delicate test case, food for thoughts :-)

serge-sans-paille commented 6 years ago

The following compiles correctly (but is the result correct?) with an upt-o-date master

import numpy as np

#pythran export BinaryProduct(int[:,:], int[:])
def BinaryProduct(X,Y):

    """ Binary Matrices or Matrix-vector product in Z/2Z. Works with scipy.sparse.csr_matrix matrices X,Y too."""

    A = X.dot(Y)

    return A%2

#pythran export InCode(int[:,:], int[:])
def InCode(H,x):

    """ Computes Binary Product of H and x. If product is null, x is in the code.
        Returns appartenance boolean. 
    """

    return  (BinaryProduct(H,x)==0).all()

#pythran export Decoding_logBP(int[:,:], int list list, int list list, float[:,:], float[:], int)
def Decoding_logBP(H,Bits,Nodes,Lq,Lc,max_iter):
    """ Decoding function using Belief Propagation algorithm (logarithmic version)
    IMPORTANT: if H is large (n>1000), H should be scipy.sparse.csr_matrix object to speed up calculations
    (highly recommanded. )
    -----------------------------------
    Parameters:

    H: 2D-array (OR scipy.sparse.csr_matrix object) Parity check matrix, shape = (m,n) 
    y: n-vector recieved after transmission in the channel. (In general, returned 
    by Coding Function)
    Signal-Noise Ratio: SNR = 10log(1/variance) in decibels of the AWGN used in coding.

    max_iter: (default = 1) max iterations of the main loop. Increase if decoding is not error-free.
     """

    m,n=H.shape

    if not len(Lc)==n:
        raise ValueError('La taille de y doit correspondre au nombre de colonnes de H')

    if m>=n:
        raise ValueError('H doit avoir plus de colonnes que de lignes')

    # var = 10**(-SNR/10)

    # ### ETAPE 0: initialisation 

    # Lc = 2*y/var

    # Lq=np.zeros(shape=(m,n))

    Lr = np.zeros(shape=(m,n))

    count=0

    prod=np.prod
    tanh = np.tanh
    log = np.log

    Lq += Lc

    #Bits,Nodes = BitsAndNodes(H)
    while(True):

        count+=1 #Compteur qui empêche la boucle d'être infinie .. 

        #### ETAPE 1 : Horizontale
        for i in range(m):
            Ni = Bits[i]
            for j in Ni:
                Nij = list(Ni)
                Nij.remove(j)
                X = prod(tanh(0.5*np.array(Lq[i])[Nij]))
                num = 1 + X
                denom = 1 - X
                # if num == 0: 
                #     Lr[i,j] = -1
                # elif denom  == 0:
                #     Lr[i,j] =  1
                # else: 
                #     Lr[i,j] = log(num/denom)
                Lr[i,j] = log(num) - log(denom)
        Lr = np.clip (Lr, -100, 100)

        #### ETAPE 2 : Verticale
        for j in range(n):
            Mj = Nodes[j]

            for i in Mj:
                Mji = list(Mj)
                Mji.remove(i)

                Lq[i,j] = Lc[j]+sum(np.array(Lr[Mji])[:,j])

        #### LLR a posteriori:
        # L_posteriori = np.zeros(n)
        # for j in range(n):
        #     Mj = Nodes[j]

        #     L_posteriori[j] = Lc[j] + sum(Lr[Mj,j])
        extrinsic = np.empty(n)
        for j in range(n):
            Mj = Nodes[j]

            extrinsic[j] = sum(np.array(Lr[Mj])[:,j])

        L_posteriori = extrinsic + Lc
        #x = np.array(L_posteriori <= 0).astype(int)
        x = np.array(extrinsic <= 0).astype(int)
        product = InCode(H,x)
        #print (count, product)

        if product or count >= max_iter:
            break
    # print (count)
    return np.array(L_posteriori <= 0).astype(int), Lq - Lc, extrinsic, product
nbecker commented 6 years ago

Thanks. Compiles using current released version, but output is not correct :( Also compiles with replacing e.g., Lr[Mj,j] -> Lr[Mj][j]. Shouldn't that work?

On Wed, Mar 28, 2018 at 3:55 PM serge-sans-paille notifications@github.com wrote:

The following compiles correctly (but is the result correct?) with an upt-o-date master

import numpy as np

pythran export BinaryProduct(int[:,:], int[:])

def BinaryProduct(X,Y):

""" Binary Matrices or Matrix-vector product in Z/2Z. Works with scipy.sparse.csr_matrix matrices X,Y too."""

A = X.dot(Y)

return A%2

pythran export InCode(int[:,:], int[:])

def InCode(H,x):

""" Computes Binary Product of H and x. If product is null, x is in the code.
    Returns appartenance boolean.
"""

return  (BinaryProduct(H,x)==0).all()

pythran export Decoding_logBP(int[:,:], int list list, int list list, float[:,:], float[:], int)

def Decoding_logBP(H,Bits,Nodes,Lq,Lc,max_iter): """ Decoding function using Belief Propagation algorithm (logarithmic version) IMPORTANT: if H is large (n>1000), H should be scipy.sparse.csr_matrix object to speed up calculations (highly recommanded. )

Parameters:

H: 2D-array (OR scipy.sparse.csr_matrix object) Parity check matrix, shape = (m,n)
y: n-vector recieved after transmission in the channel. (In general, returned
by Coding Function)
Signal-Noise Ratio: SNR = 10log(1/variance) in decibels of the AWGN used in coding.

max_iter: (default = 1) max iterations of the main loop. Increase if decoding is not error-free.
 """

m,n=H.shape

if not len(Lc)==n:
    raise ValueError('La taille de y doit correspondre au nombre de colonnes de H')

if m>=n:
    raise ValueError('H doit avoir plus de colonnes que de lignes')

# var = 10**(-SNR/10)

# ### ETAPE 0: initialisation

# Lc = 2*y/var

# Lq=np.zeros(shape=(m,n))

Lr = np.zeros(shape=(m,n))

count=0

prod=np.prod
tanh = np.tanh
log = np.log

Lq += Lc

#Bits,Nodes = BitsAndNodes(H)
while(True):

    count+=1 #Compteur qui empêche la boucle d'être infinie ..

    #### ETAPE 1 : Horizontale
    for i in range(m):
        Ni = Bits[i]
        for j in Ni:
            Nij = list(Ni)
            Nij.remove(j)
            X = prod(tanh(0.5*np.array(Lq[i])[Nij]))
            num = 1 + X
            denom = 1 - X
            # if num == 0:
            #     Lr[i,j] = -1
            # elif denom  == 0:
            #     Lr[i,j] =  1
            # else:
            #     Lr[i,j] = log(num/denom)
            Lr[i,j] = log(num) - log(denom)
    Lr = np.clip (Lr, -100, 100)

    #### ETAPE 2 : Verticale
    for j in range(n):
        Mj = Nodes[j]

        for i in Mj:
            Mji = list(Mj)
            Mji.remove(i)

            Lq[i,j] = Lc[j]+sum(np.array(Lr[Mji])[:,j])

    #### LLR a posteriori:
    # L_posteriori = np.zeros(n)
    # for j in range(n):
    #     Mj = Nodes[j]

    #     L_posteriori[j] = Lc[j] + sum(Lr[Mj,j])
    extrinsic = np.empty(n)
    for j in range(n):
        Mj = Nodes[j]

        extrinsic[j] = sum(np.array(Lr[Mj])[:,j])

    L_posteriori = extrinsic + Lc
    #x = np.array(L_posteriori <= 0).astype(int)
    x = np.array(extrinsic <= 0).astype(int)
    product = InCode(H,x)
    #print (count, product)

    if product or count >= max_iter:
        break
# print (count)
return np.array(L_posteriori <= 0).astype(int), Lq - Lc, extrinsic, product

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/serge-sans-paille/pythran/issues/833#issuecomment-377015328, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHK0A-9tTtXMA4JqvY4PwsTL118WqeCks5ti-rHgaJpZM4S-32J .

nbecker commented 6 years ago

Good news, I've got it working. The time spent in my test program in the decoder function dropped from 95% in the best python version to 89% in the pythran version. The working code is here https://gist.github.com/nbecker/a1085d485a1e63d6126820a255302d3e

Along the way, I found that if I mistakenly wrote e.g., extrinsic[j] = sum(np.array(Lr[Mj][j])) instead of: extrinsic[j] = sum(np.array(Lr[Mj][:,j])) that the code segfaults at runtime. I don't suppose pythran could catch that?

nbecker commented 6 years ago

I made a little faster by removing common subexpressions for the arrays: https://gist.github.com/nbecker/472edc3bdf1365a7e41db1ec1b50e2ce

I don't know why I need to use e.g., Lrj = np.array(Lr[:,j]) ... Lq[i,j] = Lc[j]+sum(np.array(Lrj[Mji])) Don't know why np.array(...) is needed here in either case, but doesn't compile without

serge-sans-paille commented 6 years ago

@nbecker I improved your code while improving pythran too, check the HEAD of fix/ldpc-decoder branch and the source code in pythran/tests/cases/ldpc-decoder.py and tell me about the speedups :-)

nbecker commented 6 years ago

looking at the test case, I guess we still have to write G(Lq[i][Nij] instead of G(Lq[i,Nij]) correct?

serge-sans-paille commented 6 years ago

correct!

nbecker commented 6 years ago

Here's the latest version https://gist.github.com/nbecker/c00202dc72a343a27923a1ad408c9e90 In 2 places, I factored out common subexpressions from the inner for loops.

The code compiles (yay!) with latest pythran from git (yay!) and speed has improved.

serge-sans-paille commented 6 years ago

Can you provide me with relevant arguments so that I can run benchmarks on my own? I tried that

python -m timeit -s 'import nbb, numpy as np; x = np.arange(200).reshape((10,20)); xp = np.arange(200.).reshape((10,20)); y = [[i/2]*20 for  i in range(20)]; z = np.arange(20.)' 'nbb.Decoding_logBP(x, y, y, xp, z, 2)'
nbecker commented 6 years ago

Hopefully, this has everything needed to try running ldpc_nr.py, which is what I used for benchmarking

ldpc.zip

It should print True True 0 10 times and exit.

No, on review I see more dependencies. Let me try to clean that up.

nbecker commented 6 years ago

OK, try this one. You should only also need to install pyldpc, but that is available via pip (pypi) ldpc2.zip

nbecker commented 6 years ago

I see the function phi0 translates to return (-pythonic::math::functor::log{}(pythonic::math::functor::tanh{}((pythonic::operator_::div(x__, 2L)))));

The use of '2L' seems suspicious here, this is a floating divide, not integer. Does it matter?

serge-sans-paille commented 6 years ago

The use of '2L' seems suspicious here, this is a floating divide, not integer. Does it matter?

No it does not :-)

serge-sans-paille commented 6 years ago

Can you confirm #838 fixes these issues?

nbecker commented 6 years ago

The version I'm testing has tags default/fix/ldpc-decoder and includes "Various view improvements". I'm afraid I'm using mercurial not git (via hg-git plugin), so it's a little hard to compare versions but I think this is the version you're asking about. Still haven't become familiar with using git.

This one is working with the test code I sent. However if I comment out the Nij = list(Ni) and change it to Nij = Ni, and similarly for Mji = Mj, the the python process segfaults. I was checking if the list() was needed, although now thinking about it, it is needed in this version, but it shouldn't segfault.