A new structure for Punctured Codes

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

This ticket proposes a new implementation for punctured codes.

It contains:

a new code class, PuncturedCode
a dedicated encoder to compute the generator matrix of a Punctured Code.
a dedicate decoder which uses the original code to correct errors

This new structure presents the following advantages:

it keeps track of the code it comes from. If one punctures a code A, the punctured code one gets will remember it comes from A.
This allows to perform several operations, like encoding or picking a random element without constructing the generator matrix for the punctured code, thus saving time and memory.
because it keeps track of the original code, it's possible to get a structured representation of the punctured code, e.g. if one punctures a GRS code and asks for the structured representation of the punctured code one got, a GRS code will be returned.

Depends on #19653

CC: @wdjoyner @ppurka

Component: coding theory

Author: David Lucas

Branch/Commit: 45d4ee1

Reviewer: Julien Lavauzelle

Issue created by migration from https://trac.sagemath.org/ticket/19422

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Branch: u/dlucas/punctured_codes

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:2

Content available, it's now ready for review.

New commits:

37683a1 New Punctured Codes object, coming with its own encoder

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Commit: 37683a1

johanrosenkilde commented 9 years ago

comment:3

AbstractLinearCode has a method punctured. Shouldn't this return a PuncturedCode object after this patch?

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:4

I chose to keep this method as it directly returns a LinearCode object. So if one wants to puncture and immediately get the LinearCode representation of his punctured code, he can thanks to this method. I plan to change it when the mechanism to get another representation of a punctured code (e.g. get a RS code from a punctured RS code) will be ready.

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Author: David Lucas

johanrosenkilde commented 9 years ago

comment:6

I think the default should always be to retain as much structure as possible. The C.punctured() method is certainly the "default" a user would call.

Furthermore, having a PuncturedCode should never get in the way for users who are simply interested in the unstructured linear code representation: it advertises exactly the same methods (plus some more). So I don't think there is any deprecation needed.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

7f9d27e AbstractLinearCode's punctured method now returns a PuncturedCode object

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 37683a1 to 7f9d27e

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:8

Ok, I agree with you.

Changes have been made and pushed.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 7f9d27e to ca859cb

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`4473a2c`	`Updated to latest beta and fixed conflicts`
`74fc44e`	`Added decoder for punctured codes`
`ca859cb`	`Added a mechanism to get the structured representation of any punctured code`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:10

A few updates on this ticket:

it's now under the latest beta
as we now have the structure for decoders in Sage, I added a dedicated decoder for punctured codes
as we now have GRS codes in Sage, which still are GRS codes after puncturing, I introduced a mechanism which allows one to get the most specific representation possible for a punctured code, based on the original code.

This is still open for review.

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Description changed:

--- 
+++ 
@@ -6,10 +6,13 @@

 - a dedicated encoder to compute the generator matrix of a Punctured Code.

+- a dedicate decoder which uses the original code to correct errors
+
 This new structure presents the following advantages:

-- it keeps track of the code it comes from. If one punctures a code A, the punctured code he gets will remember it comes from A.
+- it keeps track of the code it comes from. If one punctures a code A, the punctured code one gets will remember it comes from A.

 - This allows to perform several operations, like encoding or picking a random element without constructing the generator matrix for the punctured code, thus saving time and memory.

-- Later on, it will also allow to decode words using a decoder over the original code of the punctured code.
+- because it keeps track of the original code, it's possible to get a structured representation of the punctured code, e.g. if one punctures a GRS code and asks for the structured representation of the punctured code one got, a GRS code will be returned.
+

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

26aae17 Fixed conflict after update to 7.1beta0

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from ca859cb to 26aae17

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:12

I updated this ticket to latest beta, and fixed merge conflict. It's still open for review.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

`8b654ac`	`Renamed method precompute and changed its doc`
`b60ca15`	`Rewrote documentation of partial_xgcd and made it a private method`
`5ed4ac3`	`Fixed error in error-erasure's decoding radius`
`03c15f3`	`Removed useless backslashes`
`cc3cb4d`	`Rewrote error message in KeyEquationSyndromeDecoder`
`a7f3ee1`	`Fixed typos`
`c8bc406`	`Rewrote some examples`
`7cbcca2`	`Rewrote some sentences`
`4aebb5a`	`Fixed bug related to full codes, plus some minor tweaks`
`fa355cd`	`Enhanced decoder documentation`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 26aae17 to fa355cd

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:14

I fixed a few typos, updated deprecated imports which were breaking doctests. I also completely rewrote doctests and documentation of the decoder, using GRS decoders introduced in #19653.

This is still open for review.

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Dependencies: #19653

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

`a074cb8`	`Changed my helper methods into private methods`
`6ad583f`	`Fixed bug in KeyEquation decoder's decode_to_code, added sanity checks, enhanced doctests`
`fca099e`	`Added new sanity check on the output of BW and Gao decoder`
`57dbfbf`	`Rewrote random_element method`
`7953d60`	`Proper sanity checks for output of decode_to_* methods`
`8673ac5`	`Optimized a bit polynomial division in Gao and BW`
`e1b6b09`	`Merged now postive reviewed #19666 and fixed conflicts`
`5093dce`	`Update to 7.1beta5`
`f3b4b11`	`Fixed typo in Guruswami-Sudan doc`
`5c61b11`	`Merged dependecies and fixed conflicts`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from fa355cd to 5c61b11

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:17

I updated this ticket to the latest beta and fixed conflicts. This is still open for review.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 5c61b11 to 69da167

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`f2e8c28`	`Merge branch 'develop' into punctured_code`
`69da167`	`Fixed broken doctest`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:19

I updated this ticket to 7.2beta1 and fixed a broken doctest. This is still open for review.

jlavauzelle commented 8 years ago

comment:20

I start the review even though the ticket is not updated to the latest beta. Despite my comments, note that I realize that building this framework must have been a huge work, and that I found it really user-friendly =)

General remarks

As I read the (quite long) name of the canonical decoder of your punctured code (PuncturedCodeOriginalCodeDecoder), I asked myself the following question: if I understand well your naming convention, decoders are named Code+DecoderName. Is it necessary to have the Code suffix ? My point is that it could be sufficient to only keep DecoderName, because if a user wants to know what kind of code the decoder is applied to, he just needs to call Dec.code().
You consider puncturing points as a list. Would it be better (and more confortable to compute unions and intersections) to consider them a set ?

index.rst

why do you remove sage/coding/source_coding/huffman ?

src/sage/coding/grs.py

The method _punctured_form(self, points) doesn't work if points is not sorted (e.g. C_grs._punctured_form([4,3])). That's due to your strange way to select the evaluation points and multipliers. I think you should remove the block:

start = 0
for i in points:
    punctured_alphas += alphas[start:i]
    punctured_col_mults += col_mults[start:i]
    start = i + 1
    punctured_alphas += alphas[start:n]
    punctured_col_mults += col_mults[start:n]

and replace it by something like (that's only a suggestion):

punctured_alphas = [ alphas[i] for i in range(n) if i not in points ]
punctured_col_mults = [ col_mults[i] for i in range(n) if i not in points ]

In the same method, if I understand it well, this block:

if G.rank() != dimension:
    G = G.echelon_form()
    for i in range(dimension):
        if G[i] == 0:
        dimension -= 1

aims at computing the code dimension (stored in dimension). In fact, you already computed it in G.rank().

src/sage/coding/linear_code.py

in _punctured_form method, G.echelon_form() puts all the zero lines at the bottom of the new matrix. Thus, if you want to get the non-zero part of the matrix, I think you could simply write

    k = G.rank()
    return LinearCode(G[:k])

jlavauzelle commented 8 years ago

comment:21

src/sage/coding/punctured_code.py --- function puncture ---

same strange way to compute the non-punctured positions than in grs.py. For example, it doesn't work with:

C = codes.RandomLinearCode(11, 5, GF(7))
Cp = codes.PuncturedCode(C, [4,3])
v = vector(GF(7), (2,3,0,2,1,5,1,5,6,5,3))
sage.coding.punctured_code.puncture(v, [4,3], Cp)

--- class PuncturedCode ---

constructor: positions = list(set(positions)) is a short (maybe dirty) way to do:

unique_positions = set()
for i in positions:
    unique_positions.add(i)
positions = []
for i in unique_positions:
   positions.append(i)

and it looks to sort the elements as well =) Also remark that you don't have this "multiplicity" issue when using Python sets (instead of lists) for storing puncturing points.

in encode() method, you missed parentheses after self.punctured_positions in return puncture(c, self.punctured_positions, self)
in structured_representation(self) method: in the doc, you wrote "A punctured GRS code is still a punctured code" but I think you meant "still a GRS code". Moreover, I don't understand what you mean by "Which is not the case for generic linear codes": punctured linear codes are still linear codes.
still in structured_representation(self) method, I tried this:

sage: C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
sage: P = codes.PuncturedCode(C, [1,3])
sage: Q = codes.PuncturedCode(P, [1,3])
sage: P.length()
5
sage: Q.length()
3
sage: P.structured_representation()
[5, 4, 2] Generalized Reed-Solomon Code over Finite Field of size 7
sage: Q.structured_representation()
[5, 3, 3] Generalized Reed-Solomon Code over Finite Field of size 7

So, double puncturing works well (Q seems to be the right code), but structured_representation() doesn't build the right Reed-Solomon code.

--- class PuncturedCodePuncturedMatrixEncoder ---

in GeneratorMatrix: same comment as in linear_code.py, I think echelon_form()do the job and you just need to keep the k first rows.

--- class PuncturedCodeOriginalCodeDecoder ---

sage is stuck (infinite loop ?) when I run this:

sage: R = codes.RandomLinearCode(11, 5, GF(7))
sage: P = codes.PuncturedCode(R, [1,3])
sage: P.decoder()

l.444: you wrote 'random_values' instead of 'random-values'
in the constructor: you have a test variable error_erasure to which you assign 0 and 1. Maybe True and False are more explicit.
in decode_to_code: l.600, I think you forgot to increment shift (otherwise I misunderstood the use of shift)
maybe add an elif self._strategy == 'try-all' at line 643.
l.670: ValueError("The number of erasures exceed decoding capability") --> exceeds
l.672: decoding capability must be:

    return (diff - punctured -1) // 2

l.665; you could have negative radius with:

if self._strategy != 'try-all' and "error-erasure" not in D.decoder_type():
    return D.decoding_radius() - punctured

I don't understand the last return D.decoding_radius(), line 675. To which case does it apply ?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 69da167 to 659105c

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`050d00d`	`Merge branch 'develop' into punctured_code`
`31fbb4f`	`positions is now a set instead of a list`
`e3f8f6b`	`Reinstated huffman in index.rst`
`564760f`	`Rewrote _punctured_form in grs.py`
`0746385`	`Changed method _punctured_form in linear_code.py`
`a27ca48`	`Rewrote method puncture in punctured_code`
`b0decac`	`Changed docstring in structured_representation`
`d16f13a`	`Simplified generator_matrix computation in the encoder`
`21b3570`	`Added generic decoders as available decoders for punctured codes`
`659105c`	`Some tweaking to punctured_code.py`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:23

Hello Julien,

Thanks for your thorough review!

I followed all your suggestions (good idea to change the set of points to a list!). Some remarks/answers follow:

As positions is now a set, the test of duplicate positions in the constructor is no longer needed.
Open question: now that I changed positions to a set, the str methods return the ugly str format of a set: set([3,4]) for {3, 4}. Which means the string representation of a punctured code now states PuncturedCode coming from ... punctured in position(s) set([pos]) which is quite ugly... So, do you think I should keep that, or, only for these str methods return a list while it might be a bit confusing for the user because he/she entered a set a got a list?

sage is stuck (infinite loop ?) when I run this:

Oh, yes, it's not an infinite loop! It's because it builds the original decoder of your code, namely a SyndromeDecoder for a linear code over GF(7), length 11, dimension 5... Which takes a while because of the table of syndromes which is heavy to build.

Best,

David

johanrosenkilde commented 8 years ago

comment:24

I'd prefer it if the user could enter either a set or a list of punctured positions, or a single position. The internal representation as a set probably makes sense though. I consider str and repr as just pretty-printing, so formatting as a (sorted) list make sense.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

cef203f Reinstated list as an allowed input type for punctured codes. Changed str methods for punctured codes.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 659105c to cef203f

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:26

Hello,

The user can now pass either an Integer, an int, a list or a set to PuncturedCode. The internal representation is a set. I added a few extra sanitization tests, so if one passes a list/set of anything but ints/Integers, an exception is raised.

I also reinstated the list representation of the punctured positions for str methods.

Remark:

for now, punctured_positions (the getter for positions) always return a set, even if one passed an int/Integer/list at construction time. I think it's fine like that, but if you think it can be confusing/annoying for the user, I can change its behaviour so it returns the exact data one passed at construction time.

This is still open for review.

Best,

David

johanrosenkilde commented 8 years ago

comment:27

Awesome!

for now, punctured_positions (the getter for positions) always return a set, even if one passed an int/Integer/list at construction time. I think it's fine like that, but if you think it can be confusing/annoying for the user, I can change its behaviour so it returns the exact data one passed at construction time.

I think this makes complete sense.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2d13b55 Changed punctured method to support a list of vectors

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from cef203f to 2d13b55

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:29

Hello,

I fixed a bug: Punctured Code's decoder was not working if its original decoder was a list decoder. I changed puncture method (which is BTW now called _puncture as it's a helper function): it can now puncture a list of vectors or a vector. This takes care of the list-decoding issue.

Best,

David

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`771f3fe`	`decoder_type works now properly on uninstantiated classes`
`0d8ed86`	`Merge branch 'develop' into decoder_type_method_for_uninstanciated_classes`
`c2777ff`	`Merge branch 'decoder_type_method_for_uninstanciated_classes' into talk_secret`
`021b841`	`Merge branch 'punctured_code' into talk_secret`
`5bb39fe`	`Fixed bug related to sets and sorting in decode_to_code`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 2d13b55 to 5bb39fe

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Changed branch from u/dlucas/punctured_codes to u/dlucas/punctured_code

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

Changed commit from 5bb39fe to none

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

`564760f`	`Rewrote _punctured_form in grs.py`
`0746385`	`Changed method _punctured_form in linear_code.py`
`a27ca48`	`Rewrote method puncture in punctured_code`
`b0decac`	`Changed docstring in structured_representation`
`d16f13a`	`Simplified generator_matrix computation in the encoder`
`21b3570`	`Added generic decoders as available decoders for punctured codes`
`659105c`	`Some tweaking to punctured_code.py`
`cef203f`	`Reinstated list as an allowed input type for punctured codes. Changed str methods for punctured codes.`
`2d13b55`	`Changed punctured method to support a list of vectors`
`36e71c5`	`Fixed bug related to sets and sorting in decode_to_code`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Commit: 36e71c5

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 8 years ago

comment:33

Sorry, I messed up with my git branches in the push before that. Anyway, my last commit fixes a bug related to decode_to_code: the loops I was using to insert elements in lists were failing because they were expecting to receive the positions to insert as a sorted list of positions. I wrote a helper functions to deal with this, and it now works fine. This is still open for review (and this time, I cannot find any more issues).

Best,

David

jlavauzelle commented 8 years ago

comment:34

Hi David,

A few remarks -- the last ones, I hope:

in structured_representation method: it doesn't work when one concatenates puncturings. For example, the following script

    C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
    P = codes.PuncturedCode(C, [0,6])
    D = P.structured_representation()
    P2 = codes.PuncturedCode(P, [0,4])
    D2 = P2.structured_representation()

fails at fifth line (so double puncturing actually works, only structured_representation crashes). From the crash log, it seems that you don't build the right set of pts.

in PuncturedCodePuncturedMatrixEncoder, one can construct this encoder while passing as argument a code which is not a punctured one. As a direct consequence, this scripts fails:

    C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
    E = codes.encoders.PuncturedCodePuncturedMatrixEncoder(C)
    G = E.generator_matrix()

The question is: is it the user's mistake, or should the code handle it ? Maybe it's reasonable to throw an error.

same remark for your decoder: one can pass a non-punctured code as parameter.
in __init__ function of PuncturedCodeOriginalCodeDecoder, maybe it is better to use True/False boolean values instead of error_erasure = 0 and error_erasure = 1.
I also prefer the compact notation:

    e_list = [ one if i in pts else zero for i in range(Cor.length()) ]

instead of:

    e_list = []
    for i in range(Cor.length()):
        if i in pts:
            e_list.append(one)
        else:
            e_list.append(zero)

finally, a typo, l.441 : default instead of dafault

I did not perform exhaustive tests of all the combinations of codes/encoders/decoders/strategies, because it is too long. So given the complexity of the feature you're trying to implement, it's still possible there remains some minor errors -- especially with extreme settings -- but maybe a real deep use of this class (by actual users) could make them appear more easily.

Thus, when you're done fixing the previous errors, I give the green light.

Julien

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 36e71c5 to 8a3a084

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`ddedfc3`	`Update to latest beta`
`f964812`	`Fixed bug in index.rst file`
`6afe4a9`	`Fixed bug related in structured_representation`
`890b309`	`Added input check for encoder and decoder`
`8a3a084`	`Tweaks: fixed typo, removed not Pythonic syntax`

sagemath / sage

A new structure for Punctured Codes #19422