Closed 9d23ac82-dd36-4d71-98d4-450a1836d8f4 closed 8 years ago
Replying to @sagetrac-panda314:
This ticket proposes a implementation of Reed Muller Codes. It contains: two new code classes, QAryReedMullerCode and BinaryReedMullerCode, which implements the two classes of reed muller codes two encoder classes, ReedMullerVectorEncoder and ReedMullerPolynomialEncoder which are used by both the code classes some additional functions to assist in computations related to the polynomials.
NOTE: Both the classes are implemented separately since they would have different decoders
New commits:
0ffd780 | adding ReedMullerCode.py containing support for encoding of Reed Muller Codes |
9fb9b69 | Merge branch 'RMCode' into t/20705/classes_for_reed_muller_codes |
Commit: 9fb9b69
Description changed:
---
+++
@@ -4,3 +4,68 @@
some additional functions to assist in computations related to the polynomials.
NOTE: Both the classes are implemented separately since they would have different decoders
+
+I used the following code snippets to test them,
+
+```
+#for q>2
+code=ReedMullerCode(3, 2, 2)
+print code.dimension()
+E1=ReedMullerVectorEncoder(code)
+E2=ReedMullerPolynomialEncoder(code)
+R=PolynomialRing(code.base_field(),code.numberOfVariable,"x")
+x0=R.gen(0)
+x1=R.gen(1)
+c1=E1.encode(vector(GF(3),[1,1,1,1,1,1]))
+print c1
+c2=E2.encode(1+x0+x1+x1^2+x1*x0)
+print c2
+D=LinearCodeSyndromeDecoder(code)
+c=D.decode_to_code(vector(GF(3),[1, 2, 0, 0, 2, 0, 1, 1, 1]))
+print c
+print E2.unencode_nocheck(c)
+print D.decode_to_message(vector(GF(3),[1,2,1,0,0,2,1,2,2]))
+```
+The output of which was,
+
+```
+6
+(1, 0, 1, 0, 0, 2, 1, 2, 2)
+(1, 2, 0, 0, 2, 1, 1, 1, 1)
+(1, 2, 0, 0, 2, 1, 1, 1, 1)
+x0*x1 + x1^2 + x0 + x1 + 1
+(1, 1, 1, 1, 1, 1)
+```
+
+```
+#for q=2
+code=ReedMullerCode(2, 2, 4)
+print code.dimension()
+E1=ReedMullerVectorEncoder(code)
+E2=ReedMullerPolynomialEncoder(code)
+R=PolynomialRing(code.base_field(),code.numberOfVariable,"x")
+x0=R.gen(0)
+x1=R.gen(1)
+x2=R.gen(2)
+x3=R.gen(3)
+c1=E1.encode(vector(GF(2),[1,1,1,1,1,0,0,0,1,0,0]))
+print c1
+c2=E2.encode(1+x0+x1+x2+x3*x2)
+print c2
+D=LinearCodeSyndromeDecoder(code)
+c=D.decode_to_code(vector(GF(2),[1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]))
+print c
+print E2.unencode_nocheck(c)
+print D.decode_to_message(vector(GF(2),[0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]))
+```
+This gave the output as:
+
+```
+11
+(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0)
+(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
+(1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
+x2*x3 + x0 + x1 + x2 + 1
+(1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0)
+```
+
Branch pushed to git repo; I updated commit sha1. New commits:
ba4de76 | adding documentation |
Branch pushed to git repo; I updated commit sha1. New commits:
262e320 | adding documentation for additional functions |
Branch pushed to git repo; I updated commit sha1. New commits:
ca93a4b | removing ReedMullerCode() of guava.py from the implementation |
Branch pushed to git repo; I updated commit sha1. New commits:
23ff289 | debugged and ran the doctests |
Hello,
I started reading your code. Before giving actual comments on the code itself, here are some general comments:
there is certain conventions to respect while writing Python code. You can find a summary of these conventions here.
lines are supposed to be less than 80 characters long. As I think it's an old rule, I'm completely ok if some lines are around 100 characters, especially in the doctests block which might be hard to break. However, some lines in your code are just too long: e.g. l. 58 is 161 characters long, l.92 is 144 characters long...). A general tip: in Python, as long as you are in the same parenthesis block, you can jump to the next line without any special character. For instance, this:
poly=poly+z*multivariatePolynomialInterpolation([evaluation2[i][k] for i in range(nq)], numberOfVariable-1, order-k, q, finiteField, _R)
can be written like this:
poly=poly+z*multivariatePolynomialInterpolation([evaluation2[i][k] for i in range(nq)],
numberOfVariable-1, order-k, q, finiteField, _R)
on names: in python, methods and variables names are written like this function_name
and variable_name
(and not functionName
and variableName
). Module names should follow the same convention (reed_muller_code.py
instead of ReedMullerCode.py
).
You wrote the documentation for a few methods like this:
r"""
DOC
"""
def my_func:
code
but it should always be like this:
def my_func:
r"""
DOC
"""
code
def my_func(i1, i2):
r"""
Short description of my_func return value
(Should be something like "Returns ....")
[optional] more details
INPUT:
- ``i1`` -- description of i1
- ``i2`` -- description of i2
[optional] OUTPUT
EXAMPLES::
sage: ...
[optional] TESTS::
sage: ...
you should always put one whitespace before and after operators (exception: precedence), e.g:
a = b + c
a = (b+c) * (b-c)
you used full names for variables everywhere but for _R
. I think it would be better to rename it polynomial_ring
or something like that.
I'll run tests this afternoon and probably come back with more comments on the code itself.
Don't worry about all my comments in formatting, it's your first big ticket for Sage, so it's perfectly normal to use wrong coding conventions. It takes some time to get used to it ;)
Best,
David
Disclaimer: the following remarks are only related to the code in itsef. I did not run (yet) extensive tests on border cases and larger cases than the ones covered by doctests.
General remark on exceptions: don't need to write "Incorrect data-type ..." every time, as ValueError
will be printed on the screen and it means there is something wrong with the input.
Another remark on documentation: you have to add this module to src/doc/en/reference/coding/index.rst
to make it appear in the online documentation. For now, this module is never considered when compiling documentation - which means, even if you run sage --docbuild reference/coding
, you won't be able to know if there are some bugs in your doc.
binomial_sum
I think it might be better to make it a private method (which can be done by calling it _binomial_sum).
You could have used already implemented methods, ending up with something like:
return sum(binomial(n, i) for i in range(k+1))
or for i in range(k+1): s += binomial(n, i)
BUT it's much slower than you implementation (up to a factor of 10 according to timeit
). So, we should just keep what you did :)
multivariate_polynomial_interpolation
I'd make it a private method as it's only used internally as a helper.
I would also change the name _R
to polynomial_ring
. It's probably something
you copied from my poor naming choice in grs.py... I changed it in #20744 in the meanwhile ;)
I'd also rename some variables: evaluation2
might be multipoint_evaluation_list
or something like that. I would also change the name nq
, but I don't have any idea here. I would also rename finiteField
as base_field
for consistency.
You should be careful with 0
s and 1
s: the 0
l. 87 is not the same as the one on line 89: the former is the zero of the base field while the latter is the zero of the multivariate polynomial ring where poly
lives.
I recommend using specific variables for these zeroes/ones, which you can define as follows: base_field_zero = base_field.zero()
.
ReedMullerCode
::
in the end of your second example's descriptive sentence.General remark on classes
C.numberOfVariable
) to access. I would change this: make all these variables private (prefix them by an underscore) and write getter methods.
One should call C.number_of_variables()
to the number of variables used.QAry/BinaryReedMullerCode
You could add a method to compute the minimum distance, as we know a closed formula to compute it, it's quite a shame to call the exhaustive search algortihm if one types My_rm_code.minimum_distance()
.
If q
is 2, D = 2^(m-d)
, otherwise it's q^m - d*q^(m-1)
, with m
the number of variables and d
the order.
''
QAryReedMullerCode''
I would add a WARNING::
block which states that the order has to be less than q
to be sure users know about it. Of course, it will be removed later on, but the more informative the better.
Vectorial encoder
I noticed you compute the generator matrix at construction time. I strongly disagree with this behaviour: as the generator matrix computation can be heavy, I advise you to only compute it when the needed (i.e. in the method generator_matrix
). The method generator_matrix
should also be cached (use the decorator @cached_method
).
On the generator matrix computation. I think you should use iterators when possible. Especially, the lines
baseFieldTuple=Tuples(baseField.list(),numberOfVariable)
[...] for x in baseFieldTuple.list()
will kill you. For example, over GF(31)
, with only 4 variables, it takes more than 3 minutes to compute the list of all tuples on my laptop (16 GB RAM).
Using an iterator will not compute the full list of tuples, which saves time and memory for larger codes.
Same for exponents
: over GF(23), 4 variables and dimension 6, it takes circa 30 seconds to execute this on my laptop:
exponents=Subsets(range(numberOfVariable)*(q-1), submultiset=True)[0:code.dimension()]
Best,
David
Thanks. I have started implementing some of the changes you have suggested. Regarding formatting, is there any format code or something that i can plug in a ide interface and automatically implement them? It will be tedious to go through the code and manually do them.
Regarding private functions. I had not added examples for them because doctest would not be able to execute them since i had not listed them anywhere. Will writing them as _
Replying to @sagetrac-dlucas:
Disclaimer: the following remarks are only related to the code in itsef. I did not run (yet) extensive tests on border cases and larger cases than the ones covered by doctests.
General remark on exceptions: don't need to write "Incorrect data-type ..." every time, as
ValueError
will be printed on the screen and it means there is something wrong with the input.Another remark on documentation: you have to add this module to
src/doc/en/reference/coding/index.rst
to make it appear in the online documentation. For now, this module is never considered when compiling documentation - which means, even if you runsage --docbuild reference/coding
, you won't be able to know if there are some bugs in your doc.binomial_sum
I think it might be better to make it a private method (which can be done by calling it _binomial_sum).
You could have used already implemented methods, ending up with something like:
return sum(binomial(n, i) for i in range(k+1))
orfor i in range(k+1): s += binomial(n, i)
BUT it's much slower than you implementation (up to a factor of 10 according to
timeit
). So, we should just keep what you did :)multivariate_polynomial_interpolation
I'd make it a private method as it's only used internally as a helper. I would also change the name
_R
topolynomial_ring
. It's probably something you copied from my poor naming choice in grs.py... I changed it in #20744 in the meanwhile ;)I'd also rename some variables:
evaluation2
might bemultipoint_evaluation_list
or something like that. I would also change the namenq
, but I don't have any idea here. I would also renamefiniteField
asbase_field
for consistency. You should be careful with0
s and1
s: the0
l. 87 is not the same as the one on line 89: the former is the zero of the base field while the latter is the zero of the multivariate polynomial ring wherepoly
lives. I recommend using specific variables for these zeroes/ones, which you can define as follows:base_field_zero = base_field.zero()
.ReedMullerCode
- you need to add
::
in the end of your second example's descriptive sentence.General remark on classes
- Most of your class parameters are public variables that one has to call directly (e.g.
C.numberOfVariable
) to access. I would change this: make all these variables private (prefix them by an underscore) and write getter methods. One should callC.number_of_variables()
to the number of variables used.QAry/BinaryReedMullerCode
You could add a method to compute the minimum distance, as we know a closed formula to compute it, it's quite a shame to call the exhaustive search algortihm if one types
My_rm_code.minimum_distance()
. Ifq
is 2,D = 2^(m-d)
, otherwise it'sq^m - d*q^(m-1)
, withm
the number of variables andd
the order. '' QAryReedMullerCode''I would add a
WARNING::
block which states that the order has to be less thanq
to be sure users know about it. Of course, it will be removed later on, but the more informative the better.Vectorial encoder
I noticed you compute the generator matrix at construction time. I strongly disagree with this behaviour: as the generator matrix computation can be heavy, I advise you to only compute it when the needed (i.e. in the method
generator_matrix
). The methodgenerator_matrix
should also be cached (use the decorator@cached_method
).On the generator matrix computation. I think you should use iterators when possible. Especially, the lines
baseFieldTuple=Tuples(baseField.list(),numberOfVariable) [...] for x in baseFieldTuple.list()
will kill you. For example, over
GF(31)
, with only 4 variables, it takes more than 3 minutes to compute the list of all tuples on my laptop (16 GB RAM). Using an iterator will not compute the full list of tuples, which saves time and memory for larger codes.Same for
exponents
: over GF(23), 4 variables and dimension 6, it takes circa 30 seconds to execute this on my laptop:exponents=Subsets(range(numberOfVariable)*(q-1), submultiset=True)[0:code.dimension()]
Best,
David
Thanks. I have started implementing some of the changes you have suggested.
Cool!
Regarding formatting, is there any format code or something that i can plug in a ide interface and automatically implement them? It will be tedious to go through the code and manually do them.
Not that I know, I'm afraid. I'm not using any IDE but vim, so I can't be sure though.
But if it's just to change MyVariable
to my_variable
, any text editor has a search and replace feature which will do the job. Of course, you will have to feed it with the target name everytime...
Regarding private functions. I had not added examples for them because doctest would not be able to execute them since i had not listed them anywhere. Will writing them as _
solve that?
Actually, doctests will be executed. Making these methods private with _name
will make them disappear from the online doc, but the doctesting framework will run them as well. See _punctured_form
in grs.py
for an example.
Best,
David
Hey, so correct me if i am wrong, [x for x in base_field_tuple] is implicitly implemented using iterator. If that's so i think the following implementation should get rid all the redundant iterations through the sets.
exponent=exponents.first() for i in range(dimension):
matrix_list.append([reduce(mul, [x[i] for i in exponent],1) for x in base_field_tuple]) exponent=exponents.next(exponent)
Replying to @sagetrac-dlucas:
Disclaimer: the following remarks are only related to the code in itsef. I did not run (yet) extensive tests on border cases and larger cases than the ones covered by doctests.
General remark on exceptions: don't need to write "Incorrect data-type ..." every time, as
ValueError
will be printed on the screen and it means there is something wrong with the input.Another remark on documentation: you have to add this module to
src/doc/en/reference/coding/index.rst
to make it appear in the online documentation. For now, this module is never considered when compiling documentation - which means, even if you runsage --docbuild reference/coding
, you won't be able to know if there are some bugs in your doc.binomial_sum
I think it might be better to make it a private method (which can be done by calling it _binomial_sum).
You could have used already implemented methods, ending up with something like:
return sum(binomial(n, i) for i in range(k+1))
orfor i in range(k+1): s += binomial(n, i)
BUT it's much slower than you implementation (up to a factor of 10 according to
timeit
). So, we should just keep what you did :)multivariate_polynomial_interpolation
I'd make it a private method as it's only used internally as a helper. I would also change the name
_R
topolynomial_ring
. It's probably something you copied from my poor naming choice in grs.py... I changed it in #20744 in the meanwhile ;)I'd also rename some variables:
evaluation2
might bemultipoint_evaluation_list
or something like that. I would also change the namenq
, but I don't have any idea here. I would also renamefiniteField
asbase_field
for consistency. You should be careful with0
s and1
s: the0
l. 87 is not the same as the one on line 89: the former is the zero of the base field while the latter is the zero of the multivariate polynomial ring wherepoly
lives. I recommend using specific variables for these zeroes/ones, which you can define as follows:base_field_zero = base_field.zero()
.ReedMullerCode
- you need to add
::
in the end of your second example's descriptive sentence.General remark on classes
- Most of your class parameters are public variables that one has to call directly (e.g.
C.numberOfVariable
) to access. I would change this: make all these variables private (prefix them by an underscore) and write getter methods. One should callC.number_of_variables()
to the number of variables used.QAry/BinaryReedMullerCode
You could add a method to compute the minimum distance, as we know a closed formula to compute it, it's quite a shame to call the exhaustive search algortihm if one types
My_rm_code.minimum_distance()
. Ifq
is 2,D = 2^(m-d)
, otherwise it'sq^m - d*q^(m-1)
, withm
the number of variables andd
the order. '' QAryReedMullerCode''I would add a
WARNING::
block which states that the order has to be less thanq
to be sure users know about it. Of course, it will be removed later on, but the more informative the better.Vectorial encoder
I noticed you compute the generator matrix at construction time. I strongly disagree with this behaviour: as the generator matrix computation can be heavy, I advise you to only compute it when the needed (i.e. in the method
generator_matrix
). The methodgenerator_matrix
should also be cached (use the decorator@cached_method
).On the generator matrix computation. I think you should use iterators when possible. Especially, the lines
baseFieldTuple=Tuples(baseField.list(),numberOfVariable) [... ...] for x in baseFieldTuple.list()
will kill you. For example, over
GF(31)
, with only 4 variables, it takes more than 3 minutes to compute the list of all tuples on my laptop (16 GB RAM). Using an iterator will not compute the full list of tuples, which saves time and memory for larger codes.Same for
exponents
: over GF(23), 4 variables and dimension 6, it takes circa 30 seconds to execute this on my laptop:exponents=Subsets(range(numberOfVariable)*(q-1), submultiset=True)[0:code.dimension()]
Best,
David
Branch pushed to git repo; I updated commit sha1. New commits:
5129c91 | reformatting the file according to standards and rewriting the code to optimise computation time |
Hey, so correct me if i am wrong,
[x for x in base_field_tuple]
is implicitly implemented using iterator. If that's so i think the following implementation should get rid all the redundant iterations through the sets.
Yes, absolutely.
I'm reading your modified code, I'll come back with more comments later on.
Hello,
Some more comments:
ReedMullerCode
, BinaryReedMullerCode
and QAryReedMullerCode
to codes_catalog.py
. But with our design, only ReedMullerCode
should be in this file, as the user should never access the classes themselves._multivariate_polynomial_interpolation
_R
in the docstring while it no longer existsevaluation
. This way, one has to provide evaluation
as a vector over a finite field, you get this finite field by doing F = evaluation.base_ring()
, you check that F
is the base ring of polynomial_ring
and that's it. You can get q
with the appropriate call on F
.evaluation
to be a vector (and not a list). After all, it's a private method you only use to recover the original message as a polynomial from a codeword, and the latter is necessarily a vector, isn't it? base_field.list()
? I know that in most cases, this list will be rather small, but it's not really efficient to build a full list (especially in a recursive pattern, because you will build it a lot of times...). This time again, I'd suggest an iterator over the finite field.ReedMullerCode
the WARNING
block should be added in the module documentation as well. The syntax on this block is the following
.. WARNING::
body of the block
See linear_code.py
, line 682 for an example.
I would also change the message. I propose For now, this implementation only supports Reed-Muller codes whose order is less than q
. Such a message implies it will be extended later, I think it's more positive this way ;)
I think it's not necessary to save q
as a class parameter. The user can access it by typing C.base_field().cardinality()
, it's not something specific to the class (or something hard to "compute"), so it should not be saved.
I prefer full names for user-accessed methods. I'm okay with num_var
as an internal variable, but its getter should be called number_of_variables
instead.
generator_matrix
, you're calling self.code()
a lot of times. I think you should store it in a local variable which will reduce the amount of calls to this method.Best,
David
Replying to @sagetrac-dlucas: Hi,
So should we keep BinaryReedMullerCode in codes_catalog.py in keeping with the BinaryReedMullerCode() function from guava.py? Or should i just keep the guava.py implementation with a warning regarding it's depreciation?
Hello,
Some more comments:
- I noticed you add both
ReedMullerCode
,BinaryReedMullerCode
andQAryReedMullerCode
tocodes_catalog.py
. But with our design, onlyReedMullerCode
should be in this file, as the user should never access the classes themselves.
_multivariate_polynomial_interpolation
- You left a reference to
_R
in the docstring while it no longer exists- Is the base field (and the cardinality) really needed? I would rather remove them from the list of inputs and recover them from
evaluation
. This way, one has to provideevaluation
as a vector over a finite field, you get this finite field by doingF = evaluation.base_ring()
, you check thatF
is the base ring ofpolynomial_ring
and that's it. You can getq
with the appropriate call onF
.- As my previous comment implies, I would only allow
evaluation
to be a vector (and not a list). After all, it's a private method you only use to recover the original message as a polynomial from a codeword, and the latter is necessarily a vector, isn't it?- l. 105: is there a way to avoid a call to
base_field.list()
? I know that in most cases, this list will be rather small, but it's not really efficient to build a full list (especially in a recursive pattern, because you will build it a lot of times...). This time again, I'd suggest an iterator over the finite field.
ReedMullerCode
- the
WARNING
block should be added in the module documentation as well. The syntax on this block is the following.. WARNING:: body of the block
See
linear_code.py
, line 682 for an example. I would also change the message. I proposeFor now, this implementation only supports Reed-Muller codes whose order is less than q
. Such a message implies it will be extended later, I think it's more positive this way ;)Q-ary and Binary classes
I think it's not necessary to save
q
as a class parameter. The user can access it by typingC.base_field().cardinality()
, it's not something specific to the class (or something hard to "compute"), so it should not be saved.I prefer full names for user-accessed methods. I'm okay with
num_var
as an internal variable, but its getter should be callednumber_of_variables
instead.Vectorial encoder
- In
generator_matrix
, you're callingself.code()
a lot of times. I think you should store it in a local variable which will reduce the amount of calls to this method.Best,
David
So should we keep BinaryReedMullerCode? in codes_catalog.py in keeping with the BinaryReedMullerCode?() function from guava.py? Or should i just keep the guava.py implementation with a warning regarding it's depreciation?
Oh, good point! Let's just keep BinaryReedMullerCode
in codes_catalog
for now, and we'll deal with it afterwards.
Replying to @sagetrac-panda314:
Hey so regarding _multivariate_polynomial_interpolation
so when i recursively use the function inside itself i actually pass a list and not a vector. Anycase since it's a private function does it matter?
Well, I'm just thinking about the future: maybe we can export that later on in another part of Sage. But I guess the vector/list behaviour is ok for now, we can change it in the future if needed.
Branch pushed to git repo; I updated commit sha1. New commits:
ad16c8f | corecting a misplaced warning message |
Hello,
Some other remarks (the last ones from me):
in the input sanitization, you test integer parameters using this statement: if not isinstance(param, Integer)
. Actually, as in Sage you can have both Sage Integers and Python ints, you need to write if not isinstance(param, (int, Integer))
otherwise if the user passes a Python int for param
it will be rejected by the check.
in your minimum_distance
method for QAryReedMullerCode
, you should use integer division //
instead of /
. I agree the result will always be an integer because of the parameters, but using /
returns a Rational while //
returns an Integer, and I think it's better to return an Integer.
method generator_matrix
does not work if order is 0:
sage: sage: F = GF(3)
sage: C = codes.ReedMullerCode(F, 0, 2)
sage: sage: E = codes.encoders.ReedMullerVectorEncoder(C)
sage: sage: E.generator_matrix()
BOOM (StopIteration)
In case of order 0, just return a matrix which has only one line of 1
(it is a repetition code in that case).
Still on generator matrix, you're using exponents.first()
and exponents.next()
. Why not using a "true" iterator instead? As you take the elements in Subsets
one by one following the order they've been sorted, an Iterator has a better complexity in that case, because next(obj)
's complexity is O(r)
where r
is the rank of obj
.
Also, generator_matrix
method is quite slow: with a RM code over GF(16)
, order 14, 3 variables, it takes around 30s to compute its generator matrix (while with a GRS code over GF(4096), dimension 680, it takes around 3 seconds). It's just a remark so we keep it in mind, and I do not request any changes for this ticket. I'm completely fine with your implementation as is for a first ticket, and we can work on efficiency later on if we want to!
Same with encode
in PolynomialEncoder
: some algorithms for fast multipoint evaluation exist, and we can use them to speed up this encoding. But once again, your implementation is definitely good enough for a first ticket!
Otherwise, it sounds good. It's quite an impressive work for your first ticket, well done!
You will be the first one to have his implementation of an "advanced" linear code family in Sage. I'm jealous ;)
David
Branch pushed to git repo; I updated commit sha1. New commits:
260d8da | using iterators to loop through |
Replying to @sagetrac-dlucas: the error with order 0 was because '.next()' function was not computing in the last element of the subsets (a problem if the dimension becomes equal to the total number of possible exponents) anyways replaced with iterator so that problem is gone. I was under the impression that .next() works like iterator. Anyways i used the python's iterator hopefully it will speed it up. Speed does seem to be a big issue with this implementation. I am unable to lay a finger on the exact problem causing that.
Well i found reed solomon pretty advanced :D . Feels good to see 'Copyright (C) 2016 Parthasarathi Panda parthasarathipanda314@gmail.com' :) . Thanks for all the help. Next stop decoding
Hello,
Some other remarks (the last ones from me):
in the input sanitization, you test integer parameters using this statement:
if not isinstance(param, Integer)
. Actually, as in Sage you can have both Sage Integers and Python ints, you need to writeif not isinstance(param, (int, Integer))
otherwise if the user passes a Python int forparam
it will be rejected by the check.in your
minimum_distance
method forQAryReedMullerCode
, you should use integer division//
instead of/
. I agree the result will always be an integer because of the parameters, but using/
returns a Rational while//
returns an Integer, and I think it's better to return an Integer.method
generator_matrix
does not work if order is 0:sage: sage: F = GF(3) sage: C = codes.[wiki:ReedMullerCode](F, 0, 2) sage: sage: E = codes.encoders.[wiki:ReedMullerVectorEncoder](C) sage: sage: E.generator_matrix() BOOM (StopIteration)
In case of order 0, just return a matrix which has only one line of
1
(it is a repetition code in that case).
Still on generator matrix, you're using
exponents.first()
andexponents.next()
. Why not using a "true" iterator instead? As you take the elements inSubsets
one by one following the order they've been sorted, an Iterator has a better complexity in that case, becausenext(obj)
's complexity isO(r)
wherer
is the rank ofobj
.Also,
generator_matrix
method is quite slow: with a RM code overGF(16)
, order 14, 3 variables, it takes around 30s to compute its generator matrix (while with a GRS code over GF(4096), dimension 680, it takes around 3 seconds). It's just a remark so we keep it in mind, and I do not request any changes for this ticket. I'm completely fine with your implementation as is for a first ticket, and we can work on efficiency later on if we want to!Same with
encode
inPolynomialEncoder
: some algorithms for fast multipoint evaluation exist, and we can use them to speed up this encoding. But once again, your implementation is definitely good enough for a first ticket!Otherwise, it sounds good. It's quite an impressive work for your first ticket, well done! You will be the first one to have his implementation of an "advanced" linear code family in Sage. I'm jealous
;)
David
Changed branch from u/panda314/classes_for_reed_muller_codes to u/dlucas/classes_for_reed_muller_codes
Hello,
I made some minor changes to the documentation:
I did not change the code at all, just made some changes to the doc because it was faster to do it myself :)
.
If you agree with my changes, we can give this ticket a positive review!
Don't forget to write your full name in the Authors
field in this ticket.
Best,
David
New commits:
77b533d | Updated to 7.3beta3 |
13cda90 | Some tweaks (mostly documentation) |
Reviewer: David Lucas
Replying to @sagetrac-dlucas: Seems fine :)
Hello,
I made some minor changes to the documentation:
- There was some syntax error which prevented it to compile properly, which I fixed
- I removed duplicated warning blocks
- I changed some line breaks
- I removed trailing whitespaces in the process
- I simplified a few methods' docstring.
I did not change the code at all, just made some changes to the doc because it was faster to do it myself
:)
.If you agree with my changes, we can give this ticket a positive review! Don't forget to write your full name in the
Authors
field in this ticket.Best,
David
New commits:
77b533d Updated to 7.3beta3
13cda90 Some tweaks (mostly documentation)
Author: Parthasarathi Panda
Test to see if I can comment.
I know I'm late to the scene, but I have a few questions/comments:
linear_code.py
adding Integer
around n-k
shouldn't be necessary: n
is C.length()
and k
is C.dimension()
. Both should always return Integer
. You should find the example that led you to make this patch and instead fix the root cause that made either n
or k
rational. Incidentally, I think I know where the bug is: your _binomial_sum
returns a rational since you have a division that you don't cast to Integer
. Amusingly, that cast seems to speed up _binomial_sum
by a factor 3 or so.order()
with a second sentence explaining what the order is? The current docstring is the type of infuriatingly unhelpful docstrings that are way too common ;-)number_of_variables
and minimum_distance
. Yes, I am aware that there are code classes with similarly unhelpful docstrings, but let's try to improve Sage :-)__repr__
etc., perhaps it be more consistent with Reed--Solomon codes to write something like "Reed-Muller of order X in Y variables over self.base_field()
".__eq__
you're checking equality of field cardinalities -- just check equality of the fields.ReedMullerCode
, the first sentence seems to be redundant (and not strictly true, since Reed-Muller codes make sense with order up to (d-1)*q). The same goes for the warning in QaryReedMullerCode
. Also, all caps is ALWAYS ugly and impolite ;-)ReedMullerCode
to explain what a ReedMullerCode
is? Also, the first paragraph of a docstring may only contain one sentence. Incidentally, in ReedMullerCode
, I suggest removing the two other sentences in the first paragraph since that is an implementation detail and not relevant to the user.ReedMullerCode
, num_of_var
: the (i.e. m)
doesn't make sense to the user. Perhaps it will when you add the description of what a ReedMullerCode
is.QaryReedMullerCode
and BinaryReedMullerCode
a reference to ReedMullerCode
saying that this should usually be used, and also to see that function for description of the codes.ReedMullerCode
, _multivariate_polynomial_interpolation
, etc._multivariate_polynomial_interpolation
that the returned polynomial interpolates the input values?ReedMullerCode
something that prints the length and dimension and min-dist of the code? That is good for paedagogy._multivariate_polynomial_interpolation
if one inputs evaluations which cannot be interpolated by a polynomial of total degree order
? Document._multivariate_polynomial_interpolation
, where the first sentence could instead read something like `Return $f \in \GF(q)[X_1,...,X_m]$ such that $f(\mathbf a) = v[i(\mathbf a)]$ for all $\mathbf a \in \GF(q)^m$, where $v \in GF(q){qm}$ is a given vector of evaluations, and $i(a)$ is a specific ordering of $GF(q)^m$ (see below for details)."num_of_var
to _multivariate_polynomial_interpolation
is redundant since you get the polynomial ring. Use len(polynomial_ring.gens())
or something._multivariate_polynomial_interpolation
you misspelled "coordinate".Tuples(F.list(), m)
you should probably use (F^m).list()
.z
and x
in _multivariate_polynomial_interpolation
. In line 117 you could also just write polyVector += [base_field.zero()]*(d-len(polyVector))
._binomial_sum
sounds like it sums up to and including k
. But it is only up to and including k-1
.ReedMullerVectorEncoder.generator_matrix
, you should iterate directly over exponents
: for exponent in exponents: ...
.ReedMullerVectorEncoder.generator_matrix
and other places, perhaps base_field_tuple
should simply be called points
.ReedMullerVectorEncoder.generator_matrix
is correct: you are indexing in the point x
using exponent
. But you should be computing prod_{i=1}^m x[i]^exponent[i]
. Using reduce(mul,...)
in this instance is less clear than using prod(...)
.ReedMullerPolynomialEncoder
it says "polynomial field", but it should be "polynomial ring". It should say that polynomial_ring
is optional.If code is not GRS code ...
.ReedMullerPolynomialEncoder.encode
is badly formatted.ReedMullerPolynomialEncoder.encode
with poly of too high degree could be with a poly where each variable has OK degree, but the total degree is bad, e.g. x0^2*x1
.Otherwise, good job! Don't worry about the volume of my comments. David has fond memories of tickets where I similarly put him through the wringer ;-)
Best, Johan
Replying to @johanrosenkilde: No problem with comments :) . Will work on them. Some doubts/explanation.
Well i guess they are so because of the length of line restrictions and the automated formatter did everything 'strictly'. 249--255 is indeed ugly will change that. 70--75 seems fine though.
Oh. I guess i will remove the linear_code.py path and use '//' instead of '/' in binomial sum. Might be faster too because of integer divisions instead of rational numbers.
Isn't it sum upto 'k' that we need?
Directly iterating involves enumeration over the subset redundantly if i am not mistaken. Used the iterator to do the generation in one scan.
So given the term prod!^m_{i=1} x!^(d_i)_i, exponent is the set {!{1}d_1 {2}d_2 ...{m}*d_m}. In which case i believe that the generator matrix is correct (it matches with polynomial evaluation). Will use the prod function instead.
Well everything else makes sense. Will implement them :)
- In
ReedMullerVectorEncoder.generator_matrix
, you should iterate directly overexponents
:for exponent in exponents: ...
.- In
ReedMullerVectorEncoder.generator_matrix
and other places, perhapsbase_field_tuple
should simply be calledpoints
.- There should be a method to get the list of points used for evaluation.
- I don't think
ReedMullerVectorEncoder.generator_matrix
is correct: you are indexing in the pointx
usingexponent
. But you should be computingprod_{i=1}^m x[i]^exponent[i]
. Usingreduce(mul,...)
in this instance is less clear than usingprod(...)
.- In doc for
ReedMullerPolynomialEncoder
it says "polynomial field", but it should be "polynomial ring". It should say thatpolynomial_ring
is optional.- Copy-paste error: it says something about
If code is not GRS code ...
.- The polynomial in the example of
ReedMullerPolynomialEncoder.encode
is badly formatted.- The failing example of
ReedMullerPolynomialEncoder.encode
with poly of too high degree could be with a poly where each variable has OK degree, but the total degree is bad, e.g.x0^2*x1
.Otherwise, good job! Don't worry about the volume of my comments. David has fond memories of tickets where I similarly put him through the wringer ;-)
Best, Johan
- Well i guess they are so because of the length of line restrictions and the automated formatter did everything 'strictly'. 249--255 is indeed ugly will change that. 70--75 seems fine though.
I won't be a stickler, so up to you. I looked up PEP8 and it doesn't seem to advocate putting parameters on individual lines.
- Oh. I guess i will remove the linear_code.py path and use '//' instead of '/' in binomial sum. Might be faster too because of integer divisions instead of rational numbers.
Yes.
- Isn't it sum upto 'k' that we need?
Yes it is. Just clarify in the docstring ("up to k" is ambiguous in English).
- Directly iterating involves enumeration over the subset redundantly if i am not mistaken. Used the iterator to do the generation in one scan.
Doing for e in some_iterable
does exactly the same as looping with an iterator, memory-wise. What you shouldn't do is for e in some_iterable.list()
.
- So given the term prod!^m_{i=1} x!^(d_i)_i, exponent is the set {!{1}d_1 {2}d_2 ...{m}*d_m}. In which case i believe that the generator matrix is correct (it matches with polynomial evaluation). Will use the prod function instead.
Ah, yes now I see!
You should probably use exponents = Subsets(..., k=order)
. Your current code
depends in a fragile, undocumented manner on the order that Subsets
iterates
over its elements. When using k=order
you get only subsets of size exactly
k
, but you could get around this by adding a dummy element order
times to
the list.
Well everything else makes sense. Will implement them :)
Cool!
Replying to @johanrosenkilde: Regarding '17.' Is there any way one can pass a sub ring of a multivariate polynomial ring consisting of polynomials over a subset of the variables? Like F[x_1,x_2,x_3] is to F[x_1,x_2,x_3,x_4]? The num_of_var parameter used in the function sort of does that.
- Could you please add to the doctest of
_multivariate_polynomial_interpolation
that the returned polynomial interpolates the input values?- Could you please add to the example blocks of
ReedMullerCode
something that prints the length and dimension and min-dist of the code? That is good for paedagogy.- What happens in
_multivariate_polynomial_interpolation
if one inputs evaluations which cannot be interpolated by a polynomial of total degreeorder
? Document.- Consider using more mathematical docstring description rather than programming-like. See for instance
_multivariate_polynomial_interpolation
, where the first sentence could instead read something like `Return $f \in \GF(q)[X_1,...,X_m]$ such that $f(\mathbf a) = v[i(\mathbf a)]$ for all $\mathbf a \in \GF(q)^m$, where $v \in GF(q){qm}$ is a given vector of evaluations, and $i(a)$ is a specific ordering of $GF(q)^m$ (see below for details)."- The input
num_of_var
to_multivariate_polynomial_interpolation
is redundant since you get the polynomial ring. Uselen(polynomial_ring.gens())
or something.- In
_multivariate_polynomial_interpolation
you misspelled "coordinate".- Instead of
Tuples(F.list(), m)
you should probably use(F^m).list()
.- Be careful with micro-optimisations that end up not having a real impact but makes the code denser to read. An example is the use of
z
andx
in_multivariate_polynomial_interpolation
. In line 117 you could also just writepolyVector += [base_field.zero()]*(d-len(polyVector))
.- The doc of
_binomial_sum
sounds like it sums up to and includingk
. But it is only up to and includingk-1
.
Replying to @johanrosenkilde: In '11.' is there any way to hyperlink the reference to ReedMullerCode in the reference of QAryReedMullerCode and BinaryReedMullerCode? Or shall i just write somehing like 'refer to documentation of' ?
- In
__eq__
you're checking equality of field cardinalities -- just check equality of the fields.- In docstrings, use "Reed-Muller" consistently (and not "reed muller" or other things).
- In docstring warning of
ReedMullerCode
, the first sentence seems to be redundant (and not strictly true, since Reed-Muller codes make sense with order up to (d-1)*q). The same goes for the warning inQaryReedMullerCode
. Also, all caps is ALWAYS ugly and impolite ;-)- Can you please improve the documentation in
ReedMullerCode
to explain what aReedMullerCode
is? Also, the first paragraph of a docstring may only contain one sentence. Incidentally, inReedMullerCode
, I suggest removing the two other sentences in the first paragraph since that is an implementation detail and not relevant to the user.- The docstring of
ReedMullerCode
,num_of_var
: the(i.e. m)
doesn't make sense to the user. Perhaps it will when you add the description of what aReedMullerCode
is.- Perhaps add in the docstring of
QaryReedMullerCode
andBinaryReedMullerCode
a reference toReedMullerCode
saying that this should usually be used, and also to see that function for description of the codes.- There is some documentation missing about the order of the points used in
ReedMullerCode
,_multivariate_polynomial_interpolation
, etc.
This ticket proposes a implementation of Reed Muller Codes. It contains: two new code classes, QAryReedMullerCode and BinaryReedMullerCode, which implements the two classes of reed muller codes two encoder classes, ReedMullerVectorEncoder and ReedMullerPolynomialEncoder which are used by both the code classes some additional functions to assist in computations related to the polynomials.
NOTE: Both the classes are implemented separately since they would have different decoders
I used the following code snippets to test them,
The output of which was,
This gave the output as:
CC: @sagetrac-dlucas @johanrosenkilde @jlavauzelle
Component: coding theory
Author: Parthasarathi Panda
Branch:
3cce07f
Reviewer: David Lucas, Johan Sebastian Rosenkilde Nielsen
Issue created by migration from https://trac.sagemath.org/ticket/20705