Array of polynomials - Githubissues

varun19299 commented 3 years ago

Thank you for creating this library!

Is it possible to create an array of polynomials? Right now I can only do this by looping through each set of coefficients.

This could be useful for BCH and Reed Solomon codes, when we want to encode an array of messages in parallel.

mhostetter commented 3 years ago

Hey, thanks for using galois!

Could you provide an example of your current solution/workaround? And an example of how you'd like to do it?

I think there are a couple ways to do this, but seeing your use case would help.

varun19299 commented 3 years ago

Thank you for your reply!

I would like to batch encode and batch decode messages via BCH.

The field used, say GF(2**8), is fixed. Additionally the generator polynomial (g(x)), message length, and code length are fixed. Some more details:

Array of messages: M_ll, shape (N, k).
Array of corresponding code words: C_ll, shape (N, n).
Array of received code words (possibly corrupted): R_ll, shape (N,n).
Array of decoded messages: D_ll, shape (N,k).

To get C_ll, each message in M_ll is first converted to polynomial notation (so a polynomial in GF(2**8)) and then multiplied by the generator polynomial. Systematic encoding can also be performed similarly. To decode, we first calculate syndromes and then apply the Peterson procedure or Berklamp Massey. This is identical to what Wikipedia describes for BCH encoding & decoding.

I've so far implemented the encoding portion by looping over each message in N iterations. However, the entire process can be carried out in parallel. Specifically, if we had support for Polynomial arrays, encoding is simply scalar multiplication (of the generator polynomial with the Message Polynomial array).

I think the decoding portion is vectorizable too.

varun19299 commented 3 years ago

Update: I don't think decoding is vectorizable, but for small message spaces, simply choosing the minimum Hamming weight code is certainly vectorizable.

mhostetter commented 3 years ago

The two approaches I can think of are: 1) Construct the matrix of messages and then compute each codeword by convolving m and g (equivalent to polynomial multiplication), or 2) Construct a dtype=object ndarray of galois.Poly message and generator objects and then multiply the two arrays. This is not faster, in fact probably slower, but can be accomplished in "one line".

Are you looking for a performance speed-up or more compact code?

Option 1:

In [1]: import numpy as np                                                                                                  

In [2]: import galois                                                                                                       

In [3]: GF = galois.GF(2**4)                                                                                                

In [4]: n = 15                                                                                                              

In [5]: g = galois.Poly.Degrees([4,1,0]); g                                                                                 
Out[5]: Poly(x^4 + x + 1, GF(2))

In [6]: k = n - g.degree                                                                                                    

In [7]: n, k                                                                                                                
Out[7]: (15, 11)

In [8]: N = 3                                                                                                               

# Message array
In [9]: m = galois.GF2.Random((N, k)); m                                                                                    
Out[9]: 
GF([[1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1]], order=2)

# Codeword array
In [10]: c = galois.GF2.Zeros((N, n))                                                                                    

In [12]: for i in range(N): 
    ...:     c[i,:] = np.convolve(m[i,:], g.coeffs) 
    ...:                                                                                                                    

In [13]: c                                                                                                                  
Out[13]: 
GF([[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1]], order=2)

Option 2:

In [15]: m_poly = np.zeros(N, dtype=np.object_)                                                                             

In [16]: for i in range(N): 
    ...:     m_poly[i] = galois.Poly(m[i,:]) 
    ...:                                                                                                                    

In [17]: m_poly                                                                                                             
Out[17]: 
array([Poly(x^10 + x^7 + x^4 + x^3 + x, GF(2)),
       Poly(x^10 + x^8 + x^5 + x^3 + x^2 + x, GF(2)),
       Poly(x^10 + x^8 + x^7 + 1, GF(2))], dtype=object)

In [19]: g_poly = np.array([g], dtype=np.object_); g_poly                                                                   
Out[19]: array([Poly(x^4 + x + 1, GF(2))], dtype=object)

# One line that uses numpy's broadcasting to multiply each polynomial object by the generator polynomial
In [20]: c_poly = m_poly * g_poly; c_poly                                                                                   
Out[20]: 
array([Poly(x^14 + x^10 + x^3 + x^2 + x, GF(2)),
       Poly(x^14 + x^12 + x^11 + x^10 + x^8 + x^7 + x^4 + x, GF(2)),
       Poly(x^14 + x^12 + x^10 + x^9 + x^7 + x^4 + x + 1, GF(2))],
      dtype=object)

# You can visually confirm that c_poly == c
In [21]: c                                                                                                                  
Out[21]: 
GF([[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1]], order=2)

varun19299 commented 3 years ago

Hi Matt, I'm mainly looking for a performance speed up. I think option 2 does this. Do you have any thoughts on decoding?

mhostetter commented 3 years ago

No thoughts on decoding yet. I do plan to release a BCH and Reed Solomon code in a future update. Maybe we'll find common areas to improve upon.

To truly speed up the matrix encoding (polynomial multiplication), we'd want to convolve M = (N, k) by g = (n - k + 1) row-wise. I override the np.convolve function to perform convolution in GF(p^m) not in Z, but the native np.convolve function only supports 1D array inputs. I'm hesitant to change numpy function behavior to support 2D and 1D arrays. My guiding principle has been to not modify the numpy API, just replace the arithmetic with GF(p^m).

Here is where I override np.convolve in the code. https://github.com/mhostetter/galois/blob/8f060e85d1885d18ae81ab503720d89d8c6472fc/galois/field/meta_func.py#L118-L145

cls._funcs["convolve"] refers to a JIT-compiled implementation of convolution using the finite field.

https://github.com/mhostetter/galois/blob/8f060e85d1885d18ae81ab503720d89d8c6472fc/galois/field/meta_func.py#L250-L257

This could easily be extended to work across an extra dimension (2D and 1D inputs). This would be the fastest implementation. I'm not sure how or where to incorporate something like that into the package, however.

Maybe I could add a set of functions: galois.poly_mul(), galois.poly_div(), galois.poly_divmod() that implement these convolution and long-division operations on arrays of coefficients (not galois.Poly objects, since those are already implemented.) In this approach, you could just type the following. And this would surely be the fastest implementation since it's JIT-compiled.

In [9]: m = galois.GF2.Random((N, k)); m                                                        
Out[9]: 
GF([[1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1]], order=2)

In [21]: galois.poly_mul(m, g.coeffs)
Out[21]: 
GF([[1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]], order=2)

varun19299 commented 3 years ago

To truly speed up the matrix encoding (polynomial multiplication), we'd want to convolve M = (N, k) by g = (n - k + 1) row-wise. I override the np.convolve function to perform convolution in GF(p^m) not in Z, but the native np.convolve function only supports 1D array inputs. I'm hesitant to change numpy function behavior to support 2D and 1D arrays. My guiding principle has been to not modify the numpy API, just replace the arithmetic with GF(p^m).

How about adding a separate batch convolution function? I think this can be done via np.einsum. Is np.einsum compatible with the array class?

Maybe I could add a set of functions: galois.poly_mul(), galois.poly_div(), galois.poly_divmod() that implement these convolution and long-division operations on arrays of coefficients (not galois.Poly objects, since those are already implemented.) In this approach, you could just type the following. And this would surely be the fastest implementation since it's JIT-compiled.

This sounds great. I haven't had the chance to dig deeper and see how the galois package extends numpy. While I don't think I have the relevant expertise, can I reach out to you incase I could pitch-in certain places?

For now, I'll use the option 2 you described above to encode. For decoding, I'm trying to see if the Berlekamp-Massey can be vectorized. MATLAB seems to support batch encoding and decoding, but this could very well just be a loop.

mhostetter commented 3 years ago

I'm not very familiar with np.einsum. It is not currently supported on Galois field arrays, although it could be in the future. From a cursory glance, I'm not sure it would support a 1D convolution across an axis of an input array.

Almost certainly Matlab is just looping! :laughing: I would be very disappointed and hold my head in shame if Matlab was more performant on Galois field array processing than galois. But please bring it to my attention if you find that it is! :facepalm:

Yes, feel free to reach out anytime. I'm more than open to bug finds, performance issues, feature requests, general feedback, etc.

I'm going to leave this issue open to track the discussion and potential addition of:

[ ] galois.poly_mul()
[ ] galois.poly_div()
[ ] galois.poly_divmod()
[ ] galois.poly_mod()

mhostetter commented 3 years ago

@varun19299 I may have another solution for you, np.apply_along_axis(). It is supposed to be faster than looping in python, see here.

In [1]: import numpy as np                                                                      

In [2]: import galois                                                                           

In [3]: GF = galois.GF(2**4)                                                                    

In [4]: n = 15                                                                                  

In [5]: g = galois.Poly.Degrees([4,1,0]); g                                                     
Out[5]: Poly(x^4 + x + 1, GF(2))

In [7]: k = n - g.degree                                                                        

In [8]: n, k                                                                                    
Out[8]: (15, 11)

In [9]: N = 3                                                                                   

In [10]: m = galois.GF2.Random((N, k)); m                                                       
Out[10]: 
GF([[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1]], order=2)

In [12]: np.apply_along_axis(np.convolve, 1, m, g.coeffs)                                       
Out[12]: 
GF([[0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1]], order=2)

varun19299 commented 3 years ago

Almost certainly Matlab is just looping! 😆 I would be very disappointed and hold my head in shame if Matlab was more performant on Galois field array processing than galois. But please bring it to my attention if you find that it is! 🤦

Yep, MATLAB seems to be looping (linear time increase across different N).

For now, I'll probably work with loops for decoding. But I'm looking out for vectorisation, since that would enable real time decoding for my application. (I expect to encode and decode ~10^4-5 codewords at a time).

varun19299 commented 3 years ago

It is supposed to be faster than looping in python, see here.

Is this true? Or is it equivalent to a loop?

In [12]: for i in range(N): 
    ...:     c[i,:] = np.convolve(m[i,:], g.coeffs) 
    ...:

Not sure if this is a good idea: how does simply JIT compiling this loop with Numba compare? I know that galios utilises Numba in the backend, but would Numba compile this successfully (ie., offer speedup)?

mhostetter commented 3 years ago

Yes, this is type of code that greatly benefits from numba. The general rule of thumb: python for loops are slow, wrap those routines in numba.

The code you have won't work as expected because when numba compiles it, it will use the normal np.convolve not my overridden version for finite fields. It's ok though because it's easy enough to re-implement that for GF(2) because 2 is prime.

Here's how I would make encoding as fast as possible. Here's an example for the BCH(31, 21) code.

import numba
import numpy as np

import galois

n = 31
k = 21
N = 100_000

GF2 = galois.GF2
GF = galois.GF(2**5)
g = galois.Poly.Degrees([10, 9, 8, 6, 5, 3, 0])

@numba.jit("int64[:,:](int64[:,:], int64[:])", nopython=True)
def convolve_2d_jit(M, g):
    """Python implementation of function to be JIT-compiled"""
    C = np.zeros((N, n), dtype=M.dtype)
    for i in range(N):
        C[i,:] = np.convolve(M[i,:], g) % 2
    return C

def slow_encode(M, g):
    C = GF2.Zeros((N, n))
    for i in range(N):
        C[i,:] = np.convolve(M[i,:], g)
    return C

def fast_encode(M, g):
    orig_dtype = M.dtype
    M = M.astype(np.int64)
    g = g.astype(np.int64)

    C =  convolve_2d_jit(M, g)

    C = C.astype(orig_dtype)  # Optional (the dtype of M and C was changed to prevent overflow with normal np.convolve)
    C = C.view(GF2)

    return C

In [1]: %run bch_fast.py                                                                        

In [2]: M = GF2.Random((N, k)); M                                                               
Out[2]: 
GF([[1, 1, 0, ..., 0, 0, 1],
    [1, 0, 0, ..., 0, 1, 0],
    [0, 0, 1, ..., 0, 1, 1],
    ...,
    [1, 0, 0, ..., 0, 1, 0],
    [1, 1, 0, ..., 1, 0, 0],
    [1, 1, 0, ..., 1, 1, 0]], order=2)

In [3]: C1 = slow_encode(M, g.coeffs); C1                                                       
Out[3]: 
GF([[1, 0, 0, ..., 0, 0, 1],
    [1, 1, 1, ..., 0, 1, 0],
    [0, 0, 1, ..., 0, 1, 1],
    ...,
    [1, 1, 1, ..., 0, 1, 0],
    [1, 0, 0, ..., 1, 0, 0],
    [1, 0, 0, ..., 1, 1, 0]], order=2)

In [4]: C2 = fast_encode(M, g.coeffs); C2                                                       
Out[4]: 
GF([[1, 0, 0, ..., 0, 0, 1],
    [1, 1, 1, ..., 0, 1, 0],
    [0, 0, 1, ..., 0, 1, 1],
    ...,
    [1, 1, 1, ..., 0, 1, 0],
    [1, 0, 0, ..., 1, 0, 0],
    [1, 0, 0, ..., 1, 1, 0]], order=2)

In [5]: np.array_equal(C1, C2)                                                                  
Out[5]: True

In [6]: %timeit slow_encode(M, g.coeffs)                                                        
9.09 s ± 464 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit fast_encode(M, g.coeffs)                                                        
55.4 ms ± 467 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

About a ~200x speed-up. :+1:

varun19299 commented 3 years ago

Just adding to what you've written before: I think adding a custom polymul and polydiv would be the best idea.

https://numpy.org/doc/stable/reference/generated/numpy.polydiv.html

galois.poly_mul()

galois.poly_div()

galois.poly_divmod()

galois.poly_mod()

Would you keep the underscore? Numpy doesn't seem too (ie, polymul vs poly_mul).

The first step of decoding would be to compute the syndrome, which is polynomial division.

We then need to evaluate it at the primitive powers: can this be done without looping? With looping, it would look like: Field array of polynomial coefficients -> create polynomial objects by looping -> evaluate them at alpha, alpha^2,....

Numpy has a polyval function (https://numpy.org/doc/stable/reference/generated/numpy.polyval.html). Can we have an overridden version of polyval in galois?

mhostetter commented 3 years ago

I agree, generally, that if I can override a numpy function that is better than adding a new one in galois. There are a few issues with numpy's polynomial functions.

There are old and new versions (https://numpy.org/doc/stable/reference/generated/numpy.polydiv.html and https://numpy.org/doc/stable/reference/generated/numpy.polynomial.polynomial.polydiv.html#numpy.polynomial.polynomial.polydiv). The old API is np.polydiv and the new is np.polynomial.polynomial.polydiv (a terrible call signature). Do I support one or both?
The old and new API's take coefficient arrays in different degree order.
Both version take 1-D polynomial coefficient arrays only and divide them, which actually doesn't help you. You'd like to divide a 2-D array of coefficients by a 1-D array of coefficients, for example. I really want to maintain the numpy functionality (not even add to it) when overriding a numpy function. So to support 2-D and 1-D polynomial division, I'd be more inclined to add a new function galois.poly_div.

Other than BCH and Reed Solomon codes, can you think of a use case for 2-D arrays of polynomial coefficients? I'm working on implementing those codes now. I'm leaning toward implementing the numba routines to efficiently encode/decode but not adding general functions to the galois API for polynomial array arithmetic. I'm open to adding them if compelling use cases can be provided. I should have BCH codes added by next week. Maybe you can use/review them?

Regarding polyval, galois does have the ability to evaluate a single polynomial at a scalar or array of values. I might need to improve the documentation to make that more clear.

>>> p = galois.Poly.Degrees([4, 3, 0], [172, 22, 225], field=GF256); p
Poly(172x^4 + 22x^3 + 225, GF(2^8))

# Evaluate the polynomial at a single value
>>> p(1)
GF(91, order=2^8)

>>> x = GF256.Random((2,5)); x
GF([[212, 211, 244, 125,  75],
    [113, 139, 247, 223, 106]], order=2^8)

# Evaluate the polynomial at an array of values
>>> p(x)
GF([[158, 129,  28, 122, 186],
    [184, 132, 179,  49, 223]], order=2^8)

varun19299 commented 3 years ago

I agree, generally, that if I can override a numpy function that is better than adding a new one in galois. There are a few issues with numpy's polynomial functions.

Sorry, I should have clarified: I meant adding the functionality either by overriding or adding equivalent function in galois. The API issues make it all the more compelling to not override!

Other than BCH and Reed Solomon codes, can you think of a use case for 2-D arrays of polynomial coefficients? I'm working on implementing those codes now. I'm leaning toward implementing the numba routines to efficiently encode/decode but not adding general functions to the galois API for polynomial array arithmetic. I'm open to adding them if compelling use cases can be provided.

Perhaps other polynomial codes? But this would involve batched convolution mainly (e.g. cyclic codes).

I should have BCH codes added by next week. Maybe you can use/review them?

Thanks a lot! Sure, I would be happy to test them / review.

varun19299 commented 3 years ago

Other than BCH and Reed Solomon codes, can you think of a use case for 2-D arrays of polynomial coefficients? I'm working on implementing those codes now. I'm leaning toward implementing the numba routines to efficiently encode/decode but not adding general functions to the galois API for polynomial array arithmetic.

Golay Codes [24,12,8] or [23,12,7] are also associated with a generator polynomial.

varun19299 commented 3 years ago

I'm planning to add a list decoding version for BCH too, based on the algorithm presented here: https://arxiv.org/pdf/cs/0703105.pdf. (section 4, page 38).

Can we have berlekamp_massey output bb, L, m as well?

mhostetter commented 3 years ago

@varun19299 Are you referring to the public function galois.berlekamp_massey (currently on add-bch-codes but not released)?

I'm hesitant to return internal variables (that aren't commonly used) from a public function. Is your decoder a pure-python decoder? If so, I would copy my Berlekamp-Massey algorithm locally and return the extra variables.

On the BCH codes, I got binary BCH codes working mid last week, but have spent the last week nearly (probably 60 hours) trying to optimize numba, specifically the compile and load times. Fortunately, after tons of numba discussion board reading and trial and error, I think I have a performant solution. I need to polish some and add documentation. I plan to merge the first cut of binary BCH codes to master in a day or two. I can comment back here if you'd like to try / review / beta test them.

varun19299 commented 3 years ago

@varun19299 Are you referring to the public function galois.berlekamp_massey (currently on add-bch-codes but not released)?

Yes, I was exploring the commits on the add-bch-code PR.

I'm hesitant to return internal variables (that aren't commonly used) from a public function. Is your decoder a pure-python decoder? If so, I would copy my Berlekamp-Massey algorithm locally and return the extra variables.

Sure, that's not an issue. I'll modify the python & numba functions (_berkelamp_massey_{python, numba}) on my local fork accordingly.

On the BCH codes, I got binary BCH codes working mid last week, but have spent the last week nearly (probably 60 hours) trying to optimize numba, specifically the compile and load times. Fortunately, after tons of numba discussion board reading and trial and error, I think I have a performant solution. I need to polish some and add documentation. I plan to merge the first cut of binary BCH codes to master in a day or two. I can comment back here if you'd like to try / review / beta test them.

Thanks again for adding this feature! I'll see if I can extend it to list decoding.

varun19299 commented 3 years ago

What would be a good way to understand the structure of this project? (Eg: the np folder contains empty functions, having only a return call) (ufunc overriding, where does JIT compilation occur, etc).

mhostetter commented 3 years ago

@varun19299 I'm still completely overhauling the ufunc overriding / JIT function process, which should hopefully be done today. I'll answer your question by adding a section to the development guide explaining the project structure. Hopefully that can be live today. I'll report back here when done.

Short answer: the np/ folder is just a dummy folder for me to create null functions that have the same call signature as numpy so I can use Sphinx to auto-generate these docs, albeit not complete. It is not used by galois. The code for galois is in the galois/ folder. And the code pertaining to finite fields is in galois/field/.

varun19299 commented 3 years ago

Thank you for the reply. No problem, please take your time :).

Get Outlook for iOShttps://aka.ms/o0ukef

From: Matt Hostetter @.> Sent: Wednesday, June 9, 2021 5:40:21 PM To: mhostetter/galois @.> Cc: Varun Sundar @.>; Mention @.> Subject: Re: [mhostetter/galois] Array of polynomials (#112)

@varun19299https://github.com/varun19299 I'm still completely overhauling the ufunc overriding / JIT function process, which should hopefully be done today. I'll answer your question by adding a section to the development guide explaining the project structure. Hopefully that can be live today. I'll report back here when done.

Short answer: the np/ folder is just a dummy folder for me to create null functions that have the same call signature as numpy so I can use Sphinx to auto-generate these docshttps://galois.readthedocs.io/en/latest/pages/numpy.html, albeit not complete. It is not used by galois. The code for galois is in the galois/ folder. And the code pertaining to finite fields is in galois/field/.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mhostetter/galois/issues/112#issuecomment-857640316, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFBQJZ5VGA27THPRHQFQLIDTR5K23ANCNFSM45V23EHA.

mhostetter commented 3 years ago

@varun19299 I just merged my first implementation of BCH codes. Currently, only binary primitive codes are supported. In the future, non-binary BCH codes and non-primitive codes will be added.

Also, the array polynomial division routine you requested has been added. It is accessible via GF._poly_divmod(a, b). I use it in BCH decoding of non-systematic codes.

https://github.com/mhostetter/galois/blob/1ecbd56b1d8deb3833b1bdbd8f88857e71d26487/galois/code/bch.py#L267-L270

I don't have access to a Matlab license currently, so all my test vectors were generated with Octave. Any testing/feedback is appreciated. Also, if you notice differences with Matlab, please let me know!

To test, you'll need to install from master, python3 -m pip install -U git+https://github.com/mhostetter/galois. And the docs have the "latest" tag, https://galois.readthedocs.io/en/latest/.

varun19299 commented 3 years ago

@mhostetter I've been using MATLAB for BCH, so I'll certainly compare the two. MATLAB doesn't offer systematic encoding, but I can compare against SageMath.

Do you have any particular test cases in mind: I'm planning on comparing n=15,31,63 and 127.

varun19299 commented 3 years ago

For small batch sizes (~2048), the encoding and decoding are $$O(1)$$ (vectorized).

It grows linearly for larger batch sizes (2^12-2^20), but that's understandable. Overall, its ~20x faster than the MATLAB implementation I was using!!

I'll post detailed comparisons + outputs soon.

mhostetter commented 3 years ago

@varun19299 thanks for testing this! If it's ok with you, let's transfer the BCH performance discussion to a new issue #125. That might make the history cleaner.

Also, given GF._poly_divmod() is available (although not exposed in the public API) what are your thoughts on closing this? Do you think there are other polynomial array operations that should be supported?

varun19299 commented 3 years ago

@varun19299 thanks for testing this! If it's ok with you, let's transfer the BCH performance discussion to a new issue #125. That might make the history cleaner.

Sure, I'll post BCH related issues and performance comparisons on #125.

Also, given GF._poly_divmod() is available (although not exposed in the public API) what are your thoughts on closing this? Do you think there are other polynomial array operations that should be supported?

Maybe _poly_val? I think this may not be high priority, but bi-variate polynomials might be a useful extension too: they show up in list decoding algorithms. Again, this might be too niche to demand an API inclusion.

Finally, I was thinking if GPU support for GF arrays is something worth considering. (maybe via CuPy arrays?).

mhostetter commented 3 years ago

Maybe _poly_val?

Under the hood GF._poly_evaluate() is already supported, which allows for polynomial evaluation of any size / dimension x. And when invoking __call__ on a galois.Poly, this function is called. For example:

In [1]: import galois                                                                                                  

In [2]: GF = galois.GF(2**8)                                                                                           

In [3]: p = galois.Poly([117,23,0,89], field=GF); p                                                                    
Out[3]: Poly(117x^3 + 23x^2 + 89, GF(2^8))

In [4]: x = GF.Random((2,5)); x                                                                                        
Out[4]: 
GF([[220,  54, 190, 169, 125],
    [ 63,  71, 243,  70,  32]], order=2^8)

In [5]: p(x)                                                                                                           
Out[5]: 
GF([[ 39, 188, 138,  66, 111],
    [ 48,  76, 214, 236,  62]], order=2^8)

I think this may not be high priority, but bi-variate polynomials might be a useful extension too: they show up in list decoding algorithms. Again, this might be too niche to demand an API inclusion.

Bi-variate polynomials are an interesting addition. They would require a new polynomial class though, or vast extension of the current class. I think it's worth opening a separate issue for that as a feature request. (Albeit it would take some time to add, so it won't happen immediately)

Finally, I was thinking if GPU support for GF arrays is something worth considering. (maybe via CuPy arrays?).

GPU support is a funny thing. Technically numba supports it. And I was originally trying to support it with the target keyword to GF.compile() and galois.GF(). However, I don't have a GPU to test against and there are a lot of subtleties with transferring data between CPU and GPU.

Furthermore, in doing the optimization for BCH codes (which entailed caching a lot of JIT compiled functions), I'm caching their CPU version. So I was actually thinking of removing "claimed" support for GPU until there was more interest / testing / added capability. I don't want to forget about GPU support, but it will take more work than I initially thought.

varun19299 commented 3 years ago

GPU support is a funny thing. Technically numba supports it. And I was originally trying to support it with the target keyword to GF.compile() and galois.GF(). However, I don't have a GPU to test against and there are a lot of subtleties with transferring data between CPU and GPU.

cuda.jit seems to be quite different from @jit: with thread and block info being used in many examples (from Numba's documentation). For scenarios where no JIT functions are used (eg: BCH encoding is a convolution), would CuPy arrays make more sense? Unfortunately Numba doesn't support CuPy functions (it does however support CuPy arrays).

And I was originally trying to support it with the target keyword to GF.compile() and galois.GF().

I'm not sure specifying target alone will suffice, thread and block info might also be need to be used. So, two different functions would be required for CPU and GPU.

mhostetter commented 3 years ago

@varun19299 yes, I think you're right. I do think CuPY could be a valid approach. It could be an optional dependency and the library could check if installed and use when appropriate. Probably the best use case would be for the linear algebra routines in galois/field/linalg.py, and maybe functions like np.convolve like you mention.

Let's open a separate issue to track GPU support. I do think I need to remove alleged support with numba and target from the documentation, as 1) it's untested and 2) I doubt it will work as is. I don't want to advertise support for GPU until I know it's working.

mhostetter commented 3 years ago

Since all the other topics that were mentioned in this thread are now separate issues (#125, #126, #128), I'm going to close this one. Feel free to re-open if other things come up.

mhostetter / galois

Array of polynomials #112