Classes for Reed Muller Codes

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

This ticket proposes a implementation of Reed Muller Codes. It contains: two new code classes, QAryReedMullerCode and BinaryReedMullerCode, which implements the two classes of reed muller codes two encoder classes, ReedMullerVectorEncoder and ReedMullerPolynomialEncoder which are used by both the code classes some additional functions to assist in computations related to the polynomials.

NOTE: Both the classes are implemented separately since they would have different decoders

I used the following code snippets to test them,

#for q>2
code=ReedMullerCode(3, 2, 2)
print code.dimension()
E1=ReedMullerVectorEncoder(code)
E2=ReedMullerPolynomialEncoder(code)
R=PolynomialRing(code.base_field(),code.numberOfVariable,"x")
x0=R.gen(0)
x1=R.gen(1)
c1=E1.encode(vector(GF(3),[1,1,1,1,1,1]))
print c1
c2=E2.encode(1+x0+x1+x1^2+x1*x0)
print c2
D=LinearCodeSyndromeDecoder(code)
c=D.decode_to_code(vector(GF(3),[1, 2, 0, 0, 2, 0, 1, 1, 1]))
print c
print E2.unencode_nocheck(c)
print D.decode_to_message(vector(GF(3),[1,2,1,0,0,2,1,2,2]))

The output of which was,

6
(1, 0, 1, 0, 0, 2, 1, 2, 2)
(1, 2, 0, 0, 2, 1, 1, 1, 1)
(1, 2, 0, 0, 2, 1, 1, 1, 1)
x0*x1 + x1^2 + x0 + x1 + 1
(1, 1, 1, 1, 1, 1)

#for q=2
code=ReedMullerCode(2, 2, 4)
print code.dimension()
E1=ReedMullerVectorEncoder(code)
E2=ReedMullerPolynomialEncoder(code)
R=PolynomialRing(code.base_field(),code.numberOfVariable,"x")
x0=R.gen(0)
x1=R.gen(1)
x2=R.gen(2)
x3=R.gen(3)
c1=E1.encode(vector(GF(2),[1,1,1,1,1,0,0,0,1,0,0]))
print c1
c2=E2.encode(1+x0+x1+x2+x3*x2)
print c2
D=LinearCodeSyndromeDecoder(code)
c=D.decode_to_code(vector(GF(2),[1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]))
print c
print E2.unencode_nocheck(c)
print D.decode_to_message(vector(GF(2),[0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]))

This gave the output as:

11
(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0)
(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
x2*x3 + x0 + x1 + x2 + 1
(1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0)

CC: @sagetrac-dlucas @johanrosenkilde @jlavauzelle

Component: coding theory

Author: Parthasarathi Panda

Branch: 3cce07f

Reviewer: David Lucas, Johan Sebastian Rosenkilde Nielsen

Issue created by migration from https://trac.sagemath.org/ticket/20705

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

Branch: u/panda314/classes_for_reed_muller_codes

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

Replying to @sagetrac-panda314:

This ticket proposes a implementation of Reed Muller Codes. It contains: two new code classes, QAryReedMullerCode and BinaryReedMullerCode, which implements the two classes of reed muller codes two encoder classes, ReedMullerVectorEncoder and ReedMullerPolynomialEncoder which are used by both the code classes some additional functions to assist in computations related to the polynomials.

NOTE: Both the classes are implemented separately since they would have different decoders

New commits:

`0ffd780`	`adding ReedMullerCode.py containing support for encoding of Reed Muller Codes`
`9fb9b69`	`Merge branch 'RMCode' into t/20705/classes_for_reed_muller_codes`

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

Commit: 9fb9b69

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

Description changed:

--- 
+++ 
@@ -4,3 +4,68 @@
     some additional functions to assist in computations related to the polynomials.

 NOTE: Both the classes are implemented separately since they would have different decoders
+
+I used the following code snippets to test them,
+
+```
+#for q>2
+code=ReedMullerCode(3, 2, 2)
+print code.dimension()
+E1=ReedMullerVectorEncoder(code)
+E2=ReedMullerPolynomialEncoder(code)
+R=PolynomialRing(code.base_field(),code.numberOfVariable,"x")
+x0=R.gen(0)
+x1=R.gen(1)
+c1=E1.encode(vector(GF(3),[1,1,1,1,1,1]))
+print c1
+c2=E2.encode(1+x0+x1+x1^2+x1*x0)
+print c2
+D=LinearCodeSyndromeDecoder(code)
+c=D.decode_to_code(vector(GF(3),[1, 2, 0, 0, 2, 0, 1, 1, 1]))
+print c
+print E2.unencode_nocheck(c)
+print D.decode_to_message(vector(GF(3),[1,2,1,0,0,2,1,2,2]))
+```
+The output of which was,
+
+```
+6
+(1, 0, 1, 0, 0, 2, 1, 2, 2)
+(1, 2, 0, 0, 2, 1, 1, 1, 1)
+(1, 2, 0, 0, 2, 1, 1, 1, 1)
+x0*x1 + x1^2 + x0 + x1 + 1
+(1, 1, 1, 1, 1, 1)
+```
+
+```
+#for q=2
+code=ReedMullerCode(2, 2, 4)
+print code.dimension()
+E1=ReedMullerVectorEncoder(code)
+E2=ReedMullerPolynomialEncoder(code)
+R=PolynomialRing(code.base_field(),code.numberOfVariable,"x")
+x0=R.gen(0)
+x1=R.gen(1)
+x2=R.gen(2)
+x3=R.gen(3)
+c1=E1.encode(vector(GF(2),[1,1,1,1,1,0,0,0,1,0,0]))
+print c1
+c2=E2.encode(1+x0+x1+x2+x3*x2)
+print c2
+D=LinearCodeSyndromeDecoder(code)
+c=D.decode_to_code(vector(GF(2),[1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]))
+print c
+print E2.unencode_nocheck(c)
+print D.decode_to_message(vector(GF(2),[0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]))
+```
+This gave the output as:
+
+```
+11
+(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0)
+(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
+(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
+x2*x3 + x0 + x1 + x2 + 1
+(1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0)
+```
+

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ba4de76 adding documentation

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 9fb9b69 to ba4de76

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

262e320 adding documentation for additional functions

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from ba4de76 to 262e320

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 262e320 to c5af6d2

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`3fc4924`	`minor edits`
`c5af6d2`	`Some adjustments to way functions take their inputs and adding the option of giving your own polynomial ring to ReedMullerPolynomialEncoder`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ca93a4b removing ReedMullerCode() of guava.py from the implementation

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from c5af6d2 to ca93a4b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

23ff289 debugged and ran the doctests

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from ca93a4b to 23ff289

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:9

Hello,

I started reading your code. Before giving actual comments on the code itself, here are some general comments:

there is certain conventions to respect while writing Python code. You can find a summary of these conventions here.
lines are supposed to be less than 80 characters long. As I think it's an old rule, I'm completely ok if some lines are around 100 characters, especially in the doctests block which might be hard to break. However, some lines in your code are just too long: e.g. l. 58 is 161 characters long, l.92 is 144 characters long...). A general tip: in Python, as long as you are in the same parenthesis block, you can jump to the next line without any special character. For instance, this:

poly=poly+z*multivariatePolynomialInterpolation([evaluation2[i][k] for i in range(nq)], numberOfVariable-1, order-k, q, finiteField, _R)

can be written like this:

poly=poly+z*multivariatePolynomialInterpolation([evaluation2[i][k] for i in range(nq)],
        numberOfVariable-1, order-k, q, finiteField, _R)

on names: in python, methods and variables names are written like this function_name and variable_name (and not functionName and variableName). Module names should follow the same convention (reed_muller_code.py instead of ReedMullerCode.py).
You wrote the documentation for a few methods like this:

r"""
DOC
"""
def my_func:
    code

but it should always be like this:

def my_func:
    r"""
    DOC
    """
    code

More of the same, this DOC block should always be formatted like this:

def my_func(i1, i2):
    r"""
    Short description of my_func return value 
    (Should be something like "Returns ....")

    [optional] more details

    INPUT:

    - ``i1`` -- description of i1

    - ``i2`` -- description of i2

    [optional] OUTPUT

    EXAMPLES::

        sage: ...

    [optional] TESTS::

        sage: ...

you should always put one whitespace before and after operators (exception: precedence), e.g:
- a = b + c
- a = (b+c) * (b-c)
you used full names for variables everywhere but for _R. I think it would be better to rename it polynomial_ring or something like that.

I'll run tests this afternoon and probably come back with more comments on the code itself. Don't worry about all my comments in formatting, it's your first big ticket for Sage, so it's perfectly normal to use wrong coding conventions. It takes some time to get used to it ;)

Best,

David

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:10

Disclaimer: the following remarks are only related to the code in itsef. I did not run (yet) extensive tests on border cases and larger cases than the ones covered by doctests.

General remark on exceptions: don't need to write "Incorrect data-type ..." every time, as ValueError will be printed on the screen and it means there is something wrong with the input.

Another remark on documentation: you have to add this module to src/doc/en/reference/coding/index.rst to make it appear in the online documentation. For now, this module is never considered when compiling documentation - which means, even if you run sage --docbuild reference/coding, you won't be able to know if there are some bugs in your doc.

binomial_sum

I think it might be better to make it a private method (which can be done by calling it _binomial_sum).

You could have used already implemented methods, ending up with something like:

return sum(binomial(n, i) for i in range(k+1)) or for i in range(k+1): s += binomial(n, i)

BUT it's much slower than you implementation (up to a factor of 10 according to timeit). So, we should just keep what you did :)

multivariate_polynomial_interpolation

I'd make it a private method as it's only used internally as a helper. I would also change the name _R to polynomial_ring. It's probably something you copied from my poor naming choice in grs.py... I changed it in #20744 in the meanwhile ;)

I'd also rename some variables: evaluation2 might be multipoint_evaluation_list or something like that. I would also change the name nq, but I don't have any idea here. I would also rename finiteField as base_field for consistency. You should be careful with 0s and 1s: the 0 l. 87 is not the same as the one on line 89: the former is the zero of the base field while the latter is the zero of the multivariate polynomial ring where poly lives. I recommend using specific variables for these zeroes/ones, which you can define as follows: base_field_zero = base_field.zero().

ReedMullerCode

you need to add :: in the end of your second example's descriptive sentence.

General remark on classes

Most of your class parameters are public variables that one has to call directly (e.g. C.numberOfVariable) to access. I would change this: make all these variables private (prefix them by an underscore) and write getter methods. One should call C.number_of_variables() to the number of variables used.

QAry/BinaryReedMullerCode

You could add a method to compute the minimum distance, as we know a closed formula to compute it, it's quite a shame to call the exhaustive search algortihm if one types My_rm_code.minimum_distance(). If q is 2, D = 2^(m-d), otherwise it's q^m - d*q^(m-1), with m the number of variables and d the order. '' QAryReedMullerCode''
I would add a WARNING:: block which states that the order has to be less than q to be sure users know about it. Of course, it will be removed later on, but the more informative the better.

Vectorial encoder

I noticed you compute the generator matrix at construction time. I strongly disagree with this behaviour: as the generator matrix computation can be heavy, I advise you to only compute it when the needed (i.e. in the method generator_matrix). The method generator_matrix should also be cached (use the decorator @cached_method).
On the generator matrix computation. I think you should use iterators when possible. Especially, the lines

baseFieldTuple=Tuples(baseField.list(),numberOfVariable)
[...] for x in baseFieldTuple.list()

will kill you. For example, over GF(31), with only 4 variables, it takes more than 3 minutes to compute the list of all tuples on my laptop (16 GB RAM). Using an iterator will not compute the full list of tuples, which saves time and memory for larger codes.

Same for exponents: over GF(23), 4 variables and dimension 6, it takes circa 30 seconds to execute this on my laptop:

exponents=Subsets(range(numberOfVariable)*(q-1), submultiset=True)[0:code.dimension()]

Best,

David

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:11

Thanks. I have started implementing some of the changes you have suggested. Regarding formatting, is there any format code or something that i can plug in a ide interface and automatically implement them? It will be tedious to go through the code and manually do them.

Regarding private functions. I had not added examples for them because doctest would not be able to execute them since i had not listed them anywhere. Will writing them as _ solve that?

Replying to @sagetrac-dlucas:

Disclaimer: the following remarks are only related to the code in itsef. I did not run (yet) extensive tests on border cases and larger cases than the ones covered by doctests.

General remark on exceptions: don't need to write "Incorrect data-type ..." every time, as ValueError will be printed on the screen and it means there is something wrong with the input.

Another remark on documentation: you have to add this module to src/doc/en/reference/coding/index.rst to make it appear in the online documentation. For now, this module is never considered when compiling documentation - which means, even if you run sage --docbuild reference/coding, you won't be able to know if there are some bugs in your doc.

binomial_sum

I think it might be better to make it a private method (which can be done by calling it _binomial_sum).

You could have used already implemented methods, ending up with something like:

return sum(binomial(n, i) for i in range(k+1)) or for i in range(k+1): s += binomial(n, i)

BUT it's much slower than you implementation (up to a factor of 10 according to timeit). So, we should just keep what you did :)

multivariate_polynomial_interpolation

I'd make it a private method as it's only used internally as a helper. I would also change the name _R to polynomial_ring. It's probably something you copied from my poor naming choice in grs.py... I changed it in #20744 in the meanwhile ;)

I'd also rename some variables: evaluation2 might be multipoint_evaluation_list or something like that. I would also change the name nq, but I don't have any idea here. I would also rename finiteField as base_field for consistency. You should be careful with 0s and 1s: the 0 l. 87 is not the same as the one on line 89: the former is the zero of the base field while the latter is the zero of the multivariate polynomial ring where poly lives. I recommend using specific variables for these zeroes/ones, which you can define as follows: base_field_zero = base_field.zero().

ReedMullerCode

you need to add :: in the end of your second example's descriptive sentence.

General remark on classes

Most of your class parameters are public variables that one has to call directly (e.g. C.numberOfVariable) to access. I would change this: make all these variables private (prefix them by an underscore) and write getter methods. One should call C.number_of_variables() to the number of variables used.

QAry/BinaryReedMullerCode

You could add a method to compute the minimum distance, as we know a closed formula to compute it, it's quite a shame to call the exhaustive search algortihm if one types My_rm_code.minimum_distance(). If q is 2, D = 2^(m-d), otherwise it's q^m - d*q^(m-1), with m the number of variables and d the order. '' QAryReedMullerCode''

I would add a WARNING:: block which states that the order has to be less than q to be sure users know about it. Of course, it will be removed later on, but the more informative the better.

Vectorial encoder

I noticed you compute the generator matrix at construction time. I strongly disagree with this behaviour: as the generator matrix computation can be heavy, I advise you to only compute it when the needed (i.e. in the method generator_matrix). The method generator_matrix should also be cached (use the decorator @cached_method).

On the generator matrix computation. I think you should use iterators when possible. Especially, the lines
baseFieldTuple=Tuples(baseField.list(),numberOfVariable)
[...] for x in baseFieldTuple.list()
will kill you. For example, over GF(31), with only 4 variables, it takes more than 3 minutes to compute the list of all tuples on my laptop (16 GB RAM). Using an iterator will not compute the full list of tuples, which saves time and memory for larger codes.

Same for exponents: over GF(23), 4 variables and dimension 6, it takes circa 30 seconds to execute this on my laptop:
exponents=Subsets(range(numberOfVariable)*(q-1), submultiset=True)[0:code.dimension()]
Best,

David

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:12

Thanks. I have started implementing some of the changes you have suggested.

Cool!

Regarding formatting, is there any format code or something that i can plug in a ide interface and automatically implement them? It will be tedious to go through the code and manually do them.

Not that I know, I'm afraid. I'm not using any IDE but vim, so I can't be sure though. But if it's just to change MyVariable to my_variable, any text editor has a search and replace feature which will do the job. Of course, you will have to feed it with the target name everytime...

Regarding private functions. I had not added examples for them because doctest would not be able to execute them since i had not listed them anywhere. Will writing them as _ solve that?

Actually, doctests will be executed. Making these methods private with _name will make them disappear from the online doc, but the doctesting framework will run them as well. See _punctured_form in grs.py for an example.

Best,

David

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:13

Hey, so correct me if i am wrong, [x for x in base_field_tuple] is implicitly implemented using iterator. If that's so i think the following implementation should get rid all the redundant iterations through the sets.

exponent=exponents.first() for i in range(dimension):

matrix_list.append([reduce(mul, [x[i] for i in exponent],1) for x in base_field_tuple]) exponent=exponents.next(exponent)

Replying to @sagetrac-dlucas:

Disclaimer: the following remarks are only related to the code in itsef. I did not run (yet) extensive tests on border cases and larger cases than the ones covered by doctests.

General remark on exceptions: don't need to write "Incorrect data-type ..." every time, as ValueError will be printed on the screen and it means there is something wrong with the input.

Another remark on documentation: you have to add this module to src/doc/en/reference/coding/index.rst to make it appear in the online documentation. For now, this module is never considered when compiling documentation - which means, even if you run sage --docbuild reference/coding, you won't be able to know if there are some bugs in your doc.

binomial_sum

I think it might be better to make it a private method (which can be done by calling it _binomial_sum).

You could have used already implemented methods, ending up with something like:

return sum(binomial(n, i) for i in range(k+1)) or for i in range(k+1): s += binomial(n, i)

BUT it's much slower than you implementation (up to a factor of 10 according to timeit). So, we should just keep what you did :)

multivariate_polynomial_interpolation

I'd make it a private method as it's only used internally as a helper. I would also change the name _R to polynomial_ring. It's probably something you copied from my poor naming choice in grs.py... I changed it in #20744 in the meanwhile ;)

I'd also rename some variables: evaluation2 might be multipoint_evaluation_list or something like that. I would also change the name nq, but I don't have any idea here. I would also rename finiteField as base_field for consistency. You should be careful with 0s and 1s: the 0 l. 87 is not the same as the one on line 89: the former is the zero of the base field while the latter is the zero of the multivariate polynomial ring where poly lives. I recommend using specific variables for these zeroes/ones, which you can define as follows: base_field_zero = base_field.zero().

ReedMullerCode

you need to add :: in the end of your second example's descriptive sentence.

General remark on classes

Most of your class parameters are public variables that one has to call directly (e.g. C.numberOfVariable) to access. I would change this: make all these variables private (prefix them by an underscore) and write getter methods. One should call C.number_of_variables() to the number of variables used.

QAry/BinaryReedMullerCode

You could add a method to compute the minimum distance, as we know a closed formula to compute it, it's quite a shame to call the exhaustive search algortihm if one types My_rm_code.minimum_distance(). If q is 2, D = 2^(m-d), otherwise it's q^m - d*q^(m-1), with m the number of variables and d the order. '' QAryReedMullerCode''

I would add a WARNING:: block which states that the order has to be less than q to be sure users know about it. Of course, it will be removed later on, but the more informative the better.

Vectorial encoder

I noticed you compute the generator matrix at construction time. I strongly disagree with this behaviour: as the generator matrix computation can be heavy, I advise you to only compute it when the needed (i.e. in the method generator_matrix). The method generator_matrix should also be cached (use the decorator @cached_method).

On the generator matrix computation. I think you should use iterators when possible. Especially, the lines
baseFieldTuple=Tuples(baseField.list(),numberOfVariable)
[... ...] for x in baseFieldTuple.list()
will kill you. For example, over GF(31), with only 4 variables, it takes more than 3 minutes to compute the list of all tuples on my laptop (16 GB RAM). Using an iterator will not compute the full list of tuples, which saves time and memory for larger codes.

Same for exponents: over GF(23), 4 variables and dimension 6, it takes circa 30 seconds to execute this on my laptop:
exponents=Subsets(range(numberOfVariable)*(q-1), submultiset=True)[0:code.dimension()]
Best,

David

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 23ff289 to 5129c91

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5129c91 reformatting the file according to standards and rewriting the code to optimise computation time

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:15

Hey, so correct me if i am wrong, [x for x in base_field_tuple] is implicitly implemented using iterator. If that's so i think the following implementation should get rid all the redundant iterations through the sets.

Yes, absolutely.

I'm reading your modified code, I'll come back with more comments later on.

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:16

Hello,

Some more comments:

I noticed you add both ReedMullerCode, BinaryReedMullerCode and QAryReedMullerCode to codes_catalog.py. But with our design, only ReedMullerCode should be in this file, as the user should never access the classes themselves.

`_multivariate_polynomial_interpolation`

You left a reference to _R in the docstring while it no longer exists
Is the base field (and the cardinality) really needed? I would rather remove them from the list of inputs and recover them from evaluation. This way, one has to provide evaluation as a vector over a finite field, you get this finite field by doing F = evaluation.base_ring(), you check that F is the base ring of polynomial_ring and that's it. You can get q with the appropriate call on F.
As my previous comment implies, I would only allow evaluation to be a vector (and not a list). After all, it's a private method you only use to recover the original message as a polynomial from a codeword, and the latter is necessarily a vector, isn't it?
l. 105: is there a way to avoid a call to base_field.list()? I know that in most cases, this list will be rather small, but it's not really efficient to build a full list (especially in a recursive pattern, because you will build it a lot of times...). This time again, I'd suggest an iterator over the finite field.

`ReedMullerCode`

the WARNING block should be added in the module documentation as well. The syntax on this block is the following
```
.. WARNING:: 
 body of the block
```
See linear_code.py, line 682 for an example. I would also change the message. I propose For now, this implementation only supports Reed-Muller codes whose order is less than q. Such a message implies it will be extended later, I think it's more positive this way ;)

Q-ary and Binary classes

I think it's not necessary to save q as a class parameter. The user can access it by typing C.base_field().cardinality(), it's not something specific to the class (or something hard to "compute"), so it should not be saved.
I prefer full names for user-accessed methods. I'm okay with num_var as an internal variable, but its getter should be called number_of_variables instead.

Vectorial encoder

In generator_matrix, you're calling self.code() a lot of times. I think you should store it in a local variable which will reduce the amount of calls to this method.

Best,

David

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:17

Replying to @sagetrac-dlucas: Hi,

So should we keep BinaryReedMullerCode in codes_catalog.py in keeping with the BinaryReedMullerCode() function from guava.py? Or should i just keep the guava.py implementation with a warning regarding it's depreciation?

Hello,

Some more comments:

I noticed you add both ReedMullerCode, BinaryReedMullerCode and QAryReedMullerCode to codes_catalog.py. But with our design, only ReedMullerCode should be in this file, as the user should never access the classes themselves.

_multivariate_polynomial_interpolation

You left a reference to _R in the docstring while it no longer exists

Is the base field (and the cardinality) really needed? I would rather remove them from the list of inputs and recover them from evaluation. This way, one has to provide evaluation as a vector over a finite field, you get this finite field by doing F = evaluation.base_ring(), you check that F is the base ring of polynomial_ring and that's it. You can get q with the appropriate call on F.

As my previous comment implies, I would only allow evaluation to be a vector (and not a list). After all, it's a private method you only use to recover the original message as a polynomial from a codeword, and the latter is necessarily a vector, isn't it?

l. 105: is there a way to avoid a call to base_field.list()? I know that in most cases, this list will be rather small, but it's not really efficient to build a full list (especially in a recursive pattern, because you will build it a lot of times...). This time again, I'd suggest an iterator over the finite field.

ReedMullerCode

the WARNING block should be added in the module documentation as well. The syntax on this block is the following
.. WARNING:: 
body of the block
See linear_code.py, line 682 for an example. I would also change the message. I propose For now, this implementation only supports Reed-Muller codes whose order is less than q. Such a message implies it will be extended later, I think it's more positive this way ;)

Q-ary and Binary classes

I think it's not necessary to save q as a class parameter. The user can access it by typing C.base_field().cardinality(), it's not something specific to the class (or something hard to "compute"), so it should not be saved.

I prefer full names for user-accessed methods. I'm okay with num_var as an internal variable, but its getter should be called number_of_variables instead.

Vectorial encoder

In generator_matrix, you're calling self.code() a lot of times. I think you should store it in a local variable which will reduce the amount of calls to this method.

Best,

David

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:18

So should we keep BinaryReedMullerCode? in codes_catalog.py in keeping with the BinaryReedMullerCode?() function from guava.py? Or should i just keep the guava.py implementation with a warning regarding it's depreciation?

Oh, good point! Let's just keep BinaryReedMullerCode in codes_catalog for now, and we'll deal with it afterwards.

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:19

Replying to @sagetrac-panda314: Hey so regarding _multivariate_polynomial_interpolation so when i recursively use the function inside itself i actually pass a list and not a vector. Anycase since it's a private function does it matter?

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:20

Well, I'm just thinking about the future: maybe we can export that later on in another part of Sage. But I guess the vector/list behaviour is ok for now, we can change it in the future if needed.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 5129c91 to cd09b8a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`649cead`	`removinng unecessary class parameters and variables and removing QAryReedMullerCode from codes_catalog.py and other tweaks`
`cd09b8a`	`changing the warning messages`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from cd09b8a to ad16c8f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ad16c8f corecting a misplaced warning message

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:23

Hello,

Some other remarks (the last ones from me):

in the input sanitization, you test integer parameters using this statement: if not isinstance(param, Integer). Actually, as in Sage you can have both Sage Integers and Python ints, you need to write if not isinstance(param, (int, Integer)) otherwise if the user passes a Python int for param it will be rejected by the check.
in your minimum_distance method for QAryReedMullerCode, you should use integer division // instead of /. I agree the result will always be an integer because of the parameters, but using / returns a Rational while // returns an Integer, and I think it's better to return an Integer.
method generator_matrix does not work if order is 0:

sage: sage: F = GF(3)
sage: C = codes.ReedMullerCode(F, 0, 2)
sage: sage: E = codes.encoders.ReedMullerVectorEncoder(C)
sage: sage: E.generator_matrix()

BOOM (StopIteration)

In case of order 0, just return a matrix which has only one line of 1 (it is a repetition code in that case).

Still on generator matrix, you're using exponents.first() and exponents.next(). Why not using a "true" iterator instead? As you take the elements in Subsets one by one following the order they've been sorted, an Iterator has a better complexity in that case, because next(obj)'s complexity is O(r) where r is the rank of obj.
Also, generator_matrix method is quite slow: with a RM code over GF(16), order 14, 3 variables, it takes around 30s to compute its generator matrix (while with a GRS code over GF(4096), dimension 680, it takes around 3 seconds). It's just a remark so we keep it in mind, and I do not request any changes for this ticket. I'm completely fine with your implementation as is for a first ticket, and we can work on efficiency later on if we want to!
Same with encode in PolynomialEncoder: some algorithms for fast multipoint evaluation exist, and we can use them to speed up this encoding. But once again, your implementation is definitely good enough for a first ticket!

Otherwise, it sounds good. It's quite an impressive work for your first ticket, well done! You will be the first one to have his implementation of an "advanced" linear code family in Sage. I'm jealous ;)

David

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from ad16c8f to 260d8da

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

260d8da using iterators to loop through

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:25

Replying to @sagetrac-dlucas: the error with order 0 was because '.next()' function was not computing in the last element of the subsets (a problem if the dimension becomes equal to the total number of possible exponents) anyways replaced with iterator so that problem is gone. I was under the impression that .next() works like iterator. Anyways i used the python's iterator hopefully it will speed it up. Speed does seem to be a big issue with this implementation. I am unable to lay a finger on the exact problem causing that.

Hello,

Some other remarks (the last ones from me):

in the input sanitization, you test integer parameters using this statement: if not isinstance(param, Integer). Actually, as in Sage you can have both Sage Integers and Python ints, you need to write if not isinstance(param, (int, Integer)) otherwise if the user passes a Python int for param it will be rejected by the check.

in your minimum_distance method for QAryReedMullerCode, you should use integer division // instead of /. I agree the result will always be an integer because of the parameters, but using / returns a Rational while // returns an Integer, and I think it's better to return an Integer.

method generator_matrix does not work if order is 0:
sage: sage: F = GF(3)
sage: C = codes.[wiki:ReedMullerCode](F, 0, 2)
sage: sage: E = codes.encoders.[wiki:ReedMullerVectorEncoder](C)
sage: sage: E.generator_matrix()

BOOM (StopIteration)
In case of order 0, just return a matrix which has only one line of 1 (it is a repetition code in that case).

Still on generator matrix, you're using exponents.first() and exponents.next(). Why not using a "true" iterator instead? As you take the elements in Subsets one by one following the order they've been sorted, an Iterator has a better complexity in that case, because next(obj)'s complexity is O(r) where r is the rank of obj.

Also, generator_matrix method is quite slow: with a RM code over GF(16), order 14, 3 variables, it takes around 30s to compute its generator matrix (while with a GRS code over GF(4096), dimension 680, it takes around 3 seconds). It's just a remark so we keep it in mind, and I do not request any changes for this ticket. I'm completely fine with your implementation as is for a first ticket, and we can work on efficiency later on if we want to!

Same with encode in PolynomialEncoder: some algorithms for fast multipoint evaluation exist, and we can use them to speed up this encoding. But once again, your implementation is definitely good enough for a first ticket!

Otherwise, it sounds good. It's quite an impressive work for your first ticket, well done! You will be the first one to have his implementation of an "advanced" linear code family in Sage. I'm jealous ;)

David

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Changed branch from u/panda314/classes_for_reed_muller_codes to u/dlucas/classes_for_reed_muller_codes

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Changed commit from 260d8da to 13cda90

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:27

Hello,

I made some minor changes to the documentation:

There was some syntax error which prevented it to compile properly, which I fixed
I removed duplicated warning blocks
I changed some line breaks
I removed trailing whitespaces in the process
I simplified a few methods' docstring.

I did not change the code at all, just made some changes to the doc because it was faster to do it myself :).

If you agree with my changes, we can give this ticket a positive review! Don't forget to write your full name in the Authors field in this ticket.

Best,

David

New commits:

`77b533d`	`Updated to 7.3beta3`
`13cda90`	`Some tweaks (mostly documentation)`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Reviewer: David Lucas

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:28

Replying to @sagetrac-dlucas: Seems fine :)

Hello,

I made some minor changes to the documentation:

There was some syntax error which prevented it to compile properly, which I fixed

I removed duplicated warning blocks

I changed some line breaks

I removed trailing whitespaces in the process

I simplified a few methods' docstring.

I did not change the code at all, just made some changes to the doc because it was faster to do it myself :).

If you agree with my changes, we can give this ticket a positive review! Don't forget to write your full name in the Authors field in this ticket.

Best,

David

New commits:

77b533d Updated to 7.3beta3

13cda90 Some tweaks (mostly documentation)


77b533d	`Updated to 7.3beta3`


13cda90	`Some tweaks (mostly documentation)`

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

Author: Parthasarathi Panda

johanrosenkilde commented 8 years ago

comment:30

Test to see if I can comment.

johanrosenkilde commented 8 years ago

comment:31

I know I'm late to the scene, but I have a few questions/comments:

Why are lines 249--255 in reed_muller_code.py formatted so badly? Similarly for line 70--75. This type of formatting is not PEP8, I believe.
The patch in linear_code.py adding Integer around n-k shouldn't be necessary: n is C.length() and k is C.dimension(). Both should always return Integer. You should find the example that led you to make this patch and instead fix the root cause that made either n or k rational. Incidentally, I think I know where the bug is: your _binomial_sum returns a rational since you have a division that you don't cast to Integer. Amusingly, that cast seems to speed up _binomial_sum by a factor 3 or so.
Could you please improve the docstring of order() with a second sentence explaining what the order is? The current docstring is the type of infuriatingly unhelpful docstrings that are way too common ;-)
Same for number_of_variables and minimum_distance. Yes, I am aware that there are code classes with similarly unhelpful docstrings, but let's try to improve Sage :-)
For __repr__ etc., perhaps it be more consistent with Reed--Solomon codes to write something like "Reed-Muller of order X in Y variables over self.base_field()".

johanrosenkilde commented 8 years ago

comment:32

In __eq__ you're checking equality of field cardinalities -- just check equality of the fields.
In docstrings, use "Reed-Muller" consistently (and not "reed muller" or other things).
In docstring warning of ReedMullerCode, the first sentence seems to be redundant (and not strictly true, since Reed-Muller codes make sense with order up to (d-1)*q). The same goes for the warning in QaryReedMullerCode. Also, all caps is ALWAYS ugly and impolite ;-)
Can you please improve the documentation in ReedMullerCode to explain what a ReedMullerCode is? Also, the first paragraph of a docstring may only contain one sentence. Incidentally, in ReedMullerCode, I suggest removing the two other sentences in the first paragraph since that is an implementation detail and not relevant to the user.
The docstring of ReedMullerCode, num_of_var: the (i.e. m) doesn't make sense to the user. Perhaps it will when you add the description of what a ReedMullerCode is.
Perhaps add in the docstring of QaryReedMullerCode and BinaryReedMullerCode a reference to ReedMullerCode saying that this should usually be used, and also to see that function for description of the codes.
There is some documentation missing about the order of the points used in ReedMullerCode, _multivariate_polynomial_interpolation, etc.

johanrosenkilde commented 8 years ago

comment:33

Could you please add to the doctest of _multivariate_polynomial_interpolation that the returned polynomial interpolates the input values?
Could you please add to the example blocks of ReedMullerCode something that prints the length and dimension and min-dist of the code? That is good for paedagogy.
What happens in _multivariate_polynomial_interpolation if one inputs evaluations which cannot be interpolated by a polynomial of total degree order? Document.
Consider using more mathematical docstring description rather than programming-like. See for instance _multivariate_polynomial_interpolation, where the first sentence could instead read something like `Return $f \in \GF(q)[X_1,...,X_m]$ such that $f(\mathbf a) = v[i(\mathbf a)]$ for all $\mathbf a \in \GF(q)^m$, where $v \in GF(q)^{qm}$ is a given vector of evaluations, and $i(a)$ is a specific ordering of $GF(q)^m$ (see below for details)."
The input num_of_var to _multivariate_polynomial_interpolation is redundant since you get the polynomial ring. Use len(polynomial_ring.gens()) or something.
In _multivariate_polynomial_interpolation you misspelled "coordinate".
Instead of Tuples(F.list(), m) you should probably use (F^m).list().
Be careful with micro-optimisations that end up not having a real impact but makes the code denser to read. An example is the use of z and x in _multivariate_polynomial_interpolation. In line 117 you could also just write polyVector += [base_field.zero()]*(d-len(polyVector)).
The doc of _binomial_sum sounds like it sums up to and including k. But it is only up to and including k-1.

johanrosenkilde commented 8 years ago

comment:34

In ReedMullerVectorEncoder.generator_matrix, you should iterate directly over exponents: for exponent in exponents: ....
In ReedMullerVectorEncoder.generator_matrix and other places, perhaps base_field_tuple should simply be called points.
There should be a method to get the list of points used for evaluation.
I don't think ReedMullerVectorEncoder.generator_matrix is correct: you are indexing in the point x using exponent. But you should be computing prod_{i=1}^m x[i]^exponent[i]. Using reduce(mul,...) in this instance is less clear than using prod(...).
In doc for ReedMullerPolynomialEncoder it says "polynomial field", but it should be "polynomial ring". It should say that polynomial_ring is optional.
Copy-paste error: it says something about If code is not GRS code ....
The polynomial in the example of ReedMullerPolynomialEncoder.encode is badly formatted.
The failing example of ReedMullerPolynomialEncoder.encode with poly of too high degree could be with a poly where each variable has OK degree, but the total degree is bad, e.g. x0^2*x1.

Otherwise, good job! Don't worry about the volume of my comments. David has fond memories of tickets where I similarly put him through the wringer ;-)

Best, Johan

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:35

Replying to @johanrosenkilde: No problem with comments :) . Will work on them. Some doubts/explanation.

Well i guess they are so because of the length of line restrictions and the automated formatter did everything 'strictly'. 249--255 is indeed ugly will change that. 70--75 seems fine though.
Oh. I guess i will remove the linear_code.py path and use '//' instead of '/' in binomial sum. Might be faster too because of integer divisions instead of rational numbers.
Isn't it sum upto 'k' that we need?
Directly iterating involves enumeration over the subset redundantly if i am not mistaken. Used the iterator to do the generation in one scan.
So given the term prod!^m_{i=1} x!^(d_i)_i, exponent is the set {!{1}d_1 {2}d_2 ...{m}*d_m}. In which case i believe that the generator matrix is correct (it matches with polynomial evaluation). Will use the prod function instead.

Well everything else makes sense. Will implement them :)

In ReedMullerVectorEncoder.generator_matrix, you should iterate directly over exponents: for exponent in exponents: ....

In ReedMullerVectorEncoder.generator_matrix and other places, perhaps base_field_tuple should simply be called points.

There should be a method to get the list of points used for evaluation.

I don't think ReedMullerVectorEncoder.generator_matrix is correct: you are indexing in the point x using exponent. But you should be computing prod_{i=1}^m x[i]^exponent[i]. Using reduce(mul,...) in this instance is less clear than using prod(...).

In doc for ReedMullerPolynomialEncoder it says "polynomial field", but it should be "polynomial ring". It should say that polynomial_ring is optional.

Copy-paste error: it says something about If code is not GRS code ....

The polynomial in the example of ReedMullerPolynomialEncoder.encode is badly formatted.

The failing example of ReedMullerPolynomialEncoder.encode with poly of too high degree could be with a poly where each variable has OK degree, but the total degree is bad, e.g. x0^2*x1.

Otherwise, good job! Don't worry about the volume of my comments. David has fond memories of tickets where I similarly put him through the wringer ;-)

Best, Johan

johanrosenkilde commented 8 years ago

comment:36

Well i guess they are so because of the length of line restrictions and the automated formatter did everything 'strictly'. 249--255 is indeed ugly will change that. 70--75 seems fine though.

I won't be a stickler, so up to you. I looked up PEP8 and it doesn't seem to advocate putting parameters on individual lines.

Oh. I guess i will remove the linear_code.py path and use '//' instead of '/' in binomial sum. Might be faster too because of integer divisions instead of rational numbers.

Yes.

Isn't it sum upto 'k' that we need?

Yes it is. Just clarify in the docstring ("up to k" is ambiguous in English).

Directly iterating involves enumeration over the subset redundantly if i am not mistaken. Used the iterator to do the generation in one scan.

Doing for e in some_iterable does exactly the same as looping with an iterator, memory-wise. What you shouldn't do is for e in some_iterable.list().

So given the term prod!^m_{i=1} x!^(d_i)_i, exponent is the set {!{1}d_1 {2}d_2 ...{m}*d_m}. In which case i believe that the generator matrix is correct (it matches with polynomial evaluation). Will use the prod function instead.

Ah, yes now I see!

You should probably use exponents = Subsets(..., k=order). Your current code depends in a fragile, undocumented manner on the order that Subsets iterates over its elements. When using k=order you get only subsets of size exactly k, but you could get around this by adding a dummy element order times to the list.

Well everything else makes sense. Will implement them :)

Cool!

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:37

Replying to @johanrosenkilde: Regarding '17.' Is there any way one can pass a sub ring of a multivariate polynomial ring consisting of polynomials over a subset of the variables? Like F[x_1,x_2,x_3] is to F[x_1,x_2,x_3,x_4]? The num_of_var parameter used in the function sort of does that.

Could you please add to the doctest of _multivariate_polynomial_interpolation that the returned polynomial interpolates the input values?

Could you please add to the example blocks of ReedMullerCode something that prints the length and dimension and min-dist of the code? That is good for paedagogy.

What happens in _multivariate_polynomial_interpolation if one inputs evaluations which cannot be interpolated by a polynomial of total degree order? Document.

Consider using more mathematical docstring description rather than programming-like. See for instance _multivariate_polynomial_interpolation, where the first sentence could instead read something like `Return $f \in \GF(q)[X_1,...,X_m]$ such that $f(\mathbf a) = v[i(\mathbf a)]$ for all $\mathbf a \in \GF(q)^m$, where $v \in GF(q)^{qm}$ is a given vector of evaluations, and $i(a)$ is a specific ordering of $GF(q)^m$ (see below for details)."

The input num_of_var to _multivariate_polynomial_interpolation is redundant since you get the polynomial ring. Use len(polynomial_ring.gens()) or something.

In _multivariate_polynomial_interpolation you misspelled "coordinate".

Instead of Tuples(F.list(), m) you should probably use (F^m).list().

Be careful with micro-optimisations that end up not having a real impact but makes the code denser to read. An example is the use of z and x in _multivariate_polynomial_interpolation. In line 117 you could also just write polyVector += [base_field.zero()]*(d-len(polyVector)).

The doc of _binomial_sum sounds like it sums up to and including k. But it is only up to and including k-1.

9d23ac82-dd36-4d71-98d4-450a1836d8f4 commented 8 years ago

comment:38

Replying to @johanrosenkilde: In '11.' is there any way to hyperlink the reference to ReedMullerCode in the reference of QAryReedMullerCode and BinaryReedMullerCode? Or shall i just write somehing like 'refer to documentation of' ?

In __eq__ you're checking equality of field cardinalities -- just check equality of the fields.

In docstrings, use "Reed-Muller" consistently (and not "reed muller" or other things).

In docstring warning of ReedMullerCode, the first sentence seems to be redundant (and not strictly true, since Reed-Muller codes make sense with order up to (d-1)*q). The same goes for the warning in QaryReedMullerCode. Also, all caps is ALWAYS ugly and impolite ;-)

Can you please improve the documentation in ReedMullerCode to explain what a ReedMullerCode is? Also, the first paragraph of a docstring may only contain one sentence. Incidentally, in ReedMullerCode, I suggest removing the two other sentences in the first paragraph since that is an implementation detail and not relevant to the user.

The docstring of ReedMullerCode, num_of_var: the (i.e. m) doesn't make sense to the user. Perhaps it will when you add the description of what a ReedMullerCode is.

Perhaps add in the docstring of QaryReedMullerCode and BinaryReedMullerCode a reference to ReedMullerCode saying that this should usually be used, and also to see that function for description of the codes.

There is some documentation missing about the order of the points used in ReedMullerCode, _multivariate_polynomial_interpolation, etc.

sagemath / sage