New encoding structure for linear codes

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between a message space of the code and its ambient space.

We propose in this ticket a new structure, designed to handle message -> codeword and codeword -> message transformations. The former operation is designated as encoding, the latter as unencoding.

We provide the following:

An abstract class Encoder, for inheritances purposes. Every Encoder will inherit from this class. It contains:
- a default constructor;
- both encode and unencode methods. They come with a default implementations for encoders whose message space is a vectorial space;
- several methods to get information on the instance of the class;
- generator_matrix method (see Design) and
- explanations on what to do to create a new encoder subclass.
An exception class EncodingFailure, for errors related to encoding or unencoding.
Methods for encoding and unencoding handling in AbstractLinearCode (see Design)

Design

Some codes families might have different message spaces. In the case of -for instance- Reed-Solomon codes, it is possible to transform elements of a Polynomial Ring into codewords, as well as elements of a vectorial space. We wanted to be able to support encoding over all possible message spaces, which is here represented by Encoder subclasses. Continuing the Reed-Solomon example, we will have a PolynomialEncoder and a VectorialEncoder.

Considering this particularity, we chose to let encoders deal with generator matrix computation, as it depends on the message space.

Of course, this multiplicity of objects and spaces might be confusing for people who just want to encode elements without being bothered by all these messages spaces. So, we created a set of methods to directly perform encoding operations on the code itself. We added a new field in AbstractLinearCode, called default_encoder. This field indicates to the program which encoder it should use if none is specified. For instance, with a code C and a message m, one can directly call C.encode(m). In that case, the default encoder will be used to perform the encoding operation. This works for every encoder-related method, like unencode(), or generator_matrix.

As there might be numerous encoders for a same code family, we chose to maintain a dictionary of known encoders for every code family. In this ticket, only one encoder is provided, called LinearCodeGeneratorMatrixEncoder. If one wants to create an instance of this encoder for a code C, he can of course call

E = LinearCodeGeneratorMatrixEncoder(C)

but we propose something better. While calling the method C.encoders_available(), one will get a list of all known encoders for C. For any linear code, he will get ['GeneratorMatrix']. After that, he can call

E = C.encoder('GeneratorMatrix')

which will also instantiate LinearCodeGeneratorMatrixEncoder for C and cache the result as well. So, any further call (for instance, C.encode(m, 'GeneratorMatrix')) to this encoder will be fast, and will guarantee that the same LinearCodeGeneratorMatrixEncoder will be used every time.

The dictionary of available encoders for every code family is filled in __init__.py, as any instance of a subclass has to know some specific encoders. Furthermore, it is possible to add an encoder subclass for a specific instance of a code using the method add_encoder.

With this design, we are able to satisfy specific experimentation which requires specific encoders as well as generic experimentation for which any encoder will be fine.

Users will probably most often use and expect the message space F^k, where k is the dimension of the code, and F the field. For this reason, the default encoder should be an encoder with this message space, such that C.encode(m) expects m from F^k and C.unencode(c) returns an element of F^k.

We also now have a default implementation of generator_matrix in AbstractLinearCode which calls the generator_matrix method of the selected encoder.

CC: @johanrosenkilde @jpflori @videlec @defeo @sagetrac-danielaugot @ClementPernet

Component: coding theory

Author: David Lucas

Branch/Commit: 09cff05

Reviewer: Johan Sebastian Rosenkilde Nielsen

Issue created by migration from https://trac.sagemath.org/ticket/18376

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

43ec640 Added a check in unencode to raise an appropriate exception if a code of dimension 0 is used

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:45

I added a check on the generic implementation of unencode, which raises an explanatory exception if one tries to unencode a word of a code of dimension 0, instead of crashing poorly when trying to unencode. This is still pending for review.

johanrosenkilde commented 9 years ago

comment:46

Shouldn't a code of length 0 just be forbidden?

Also, what's wrong with unencoding a code of dimension 0? Sure, the code contains only the 0-word, but you should be able to unencode that.

There seems to be issues with other parts of linear code functionality for dimension 0, btw:

    sage: M = Matrix(GF(2), 0, 5, [])
    sage: C = LinearCode(M)
    sage: C.random_element()
    (0,0,0,0,0)
    sage: C.list()
    <BOOM>

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:47

I did that because as far as I tested it, in Sage a vector space of dimension 0 contains only the empty vector () which is I think not correct mathematically speaking (it should contain the zero vector).

So instead of wondering if I should return the zero of the message space or the empty vector, the former being the right mathematical answer, the latter the Sage answer, I chose to return an exception.

Note that before unencoding over a code of dimension 0 "returned'

Traceback (most recent call last):
...
TypeError: unable to find a common ring for all elements

I'm not strongly for this solution, so if you prefer to return either () or 0 I'm fine with that. But returning 0, while being the expected mathematical answer, comes back to returning something which is not an element of the message space - according to the Sage representation of vector spaces of dimension 0.

johanrosenkilde commented 9 years ago

comment:48

I think Sage's answer is correct. Your example on bitbucket is a zero length (and dimension) vector space. The "zero vector" in that space is the empty vector.

Constructing one with only zero dimension correctly shows you the zero word:

   sage: V = Matrix(GF(2), 0, 5).row_module()
   sage: V.list()
   [(0,0,0,0,0)]

I think that unencoding the zero vector on a zero dimensional code should return the empty vector. That seems the only sensible thing.

I guess the failure with C.list() crashing (and possibly other stuff, like minimum distance) is another one for the pile of stuff to fix later on...

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:49

Oh, true enough. I'll make the change on this ticket.

johanrosenkilde commented 9 years ago

comment:50

Ok, things are looking pretty good. I had a look at your corrections after my last review.

Why is LinearCodeGeneratorMatrixEncoder registered on AbstractLinearCode and not LinearCode?
encoder.py: instanciated -> instantiated
codes.encoders now contains lazy_import. Should be hidden
AbstractLinearCode class doc: __cmp --> __cmp__. Should this and eq have :meth:-tags on them?
AbstractLinearCode class doc warning: perhaps "and F its base ring" should be "and F is the base field of the code".
encoder.encode doc: the newly added "encode is a partial function ..." should perhaps be made slightly less strong to not confuse readers: "encode might be a partial function ...". Many encoders are going to be complete.

Best, Johan

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b89f014 Unencoding from a code of dimension 0 now returns the empty vector

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 43ec640 to b89f014

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b89f014 to ed67ee2

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ed67ee2 Integrated reviewer's comments

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:53

Why is LinearCodeGeneratorMatrixEncoder registered on AbstractLinearCode and not LinearCode?

My mistake; fixed.

encoder.py: instanciated -> instantiated

Fixed.

codes.encoders now contains lazy_import. Should be hidden

Done.

AbstractLinearCode class doc warning: perhaps "and F its base ring" should be "and F is the base field of the code".

Yes, it's less confusing like that. Done.

encoder.encode doc: the newly added "encode is a partial function ..." should perhaps be made slightly less strong to not confuse readers: "encode might be a partial function ...". Many encoders are going to be complete.

That's right. Changed it.

johanrosenkilde commented 9 years ago

comment:54

Unencoding a zero-vector on a zero-dimensional code still explodes. You only catch the case where the ambient space has zero dimension as well (i.e. the code has length zero):

sage: G = Matrix(GF(2), 0, 5)
sage: C = LinearCode(G)
sage: c = C.random_element()
sage: C.unencode(c)
<BOOM>

johanrosenkilde commented 9 years ago

comment:55

I pushed an alternative fix to the zero-dimensional problem. I give the green light to the patch now. If you can accept my small amendment as is, set it to positive review :-)

Best, Johan

johanrosenkilde commented 9 years ago

Reviewer: Johan Sebastian Rosenkilde Nielsen

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

323b528 Fixed bug in unencode

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from ed67ee2 to 323b528

johanrosenkilde commented 9 years ago

Changed branch from u/dlucas/encoder to u/jsrn/encoder

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:58

Accepted.

Many thanks!

New commits:

09cff05 Alternative fix for the zero-dimensional unencoding bug

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Changed commit from 323b528 to 09cff05

vbraun commented 9 years ago

Changed branch from u/jsrn/encoder to 09cff05

sagemath / sage

New encoding structure for linear codes #18376

Design