Closed 1861b8a9-77f0-4f35-8431-8514a75b40d1 closed 9 years ago
Branch pushed to git repo; I updated commit sha1. New commits:
43ec640 | Added a check in unencode to raise an appropriate exception if a code of dimension 0 is used |
I added a check on the generic implementation of unencode
, which raises an explanatory exception if one tries to unencode a word of a code of dimension 0, instead of crashing poorly when trying to unencode. This is still pending for review.
Shouldn't a code of length 0 just be forbidden?
Also, what's wrong with unencoding a code of dimension 0? Sure, the code contains only the 0-word, but you should be able to unencode that.
There seems to be issues with other parts of linear code functionality for dimension 0, btw:
sage: M = Matrix(GF(2), 0, 5, [])
sage: C = LinearCode(M)
sage: C.random_element()
(0,0,0,0,0)
sage: C.list()
<BOOM>
I did that because as far as I tested it, in Sage a vector space of dimension 0 contains only the empty vector ()
which is I think not correct mathematically speaking (it should contain the zero vector).
So instead of wondering if I should return the zero of the message space or the empty vector, the former being the right mathematical answer, the latter the Sage answer, I chose to return an exception.
Note that before unencoding over a code of dimension 0 "returned'
Traceback (most recent call last):
...
TypeError: unable to find a common ring for all elements
I'm not strongly for this solution, so if you prefer to return either ()
or 0
I'm fine with that.
But returning 0, while being the expected mathematical answer, comes back to returning something which is not an element of the message space - according to the Sage representation of vector spaces of dimension 0.
I think Sage's answer is correct. Your example on bitbucket is a zero length (and dimension) vector space. The "zero vector" in that space is the empty vector.
Constructing one with only zero dimension correctly shows you the zero word:
sage: V = Matrix(GF(2), 0, 5).row_module()
sage: V.list()
[(0,0,0,0,0)]
I think that unencoding the zero vector on a zero dimensional code should return the empty vector. That seems the only sensible thing.
I guess the failure with C.list()
crashing (and possibly other stuff, like minimum distance) is another one for the pile of stuff to fix later on...
Oh, true enough. I'll make the change on this ticket.
Ok, things are looking pretty good. I had a look at your corrections after my last review.
Why is LinearCodeGeneratorMatrixEncoder
registered on AbstractLinearCode
and not LinearCode
?
encoder.py
: instanciated -> instantiated
codes.encoders
now contains lazy_import
. Should be hidden
AbstractLinearCode
class doc: __cmp
--> __cmp__
. Should this and eq
have :meth:
-tags on them?
AbstractLinearCode
class doc warning: perhaps "and F
its base ring" should be "and F
is the base field of the code".
encoder.encode
doc: the newly added "encode
is a partial function ..." should perhaps be made slightly less strong to not confuse readers: "encode
might be a partial function ...". Many encoders are going to be complete.
Best, Johan
Branch pushed to git repo; I updated commit sha1. New commits:
b89f014 | Unencoding from a code of dimension 0 now returns the empty vector |
Branch pushed to git repo; I updated commit sha1. New commits:
ed67ee2 | Integrated reviewer's comments |
Why is
LinearCodeGeneratorMatrixEncoder
registered onAbstractLinearCode
and notLinearCode
?
My mistake; fixed.
encoder.py
: instanciated -> instantiated
Fixed.
codes.encoders
now containslazy_import
. Should be hidden
Done.
AbstractLinearCode
class doc warning: perhaps "andF
its base ring" should be "andF
is the base field of the code".
Yes, it's less confusing like that. Done.
encoder.encode
doc: the newly added "encode
is a partial function ..." should perhaps be made slightly less strong to not confuse readers: "encode
might be a partial function ...". Many encoders are going to be complete.
That's right. Changed it.
Unencoding a zero-vector on a zero-dimensional code still explodes. You only catch the case where the ambient space has zero dimension as well (i.e. the code has length zero):
sage: G = Matrix(GF(2), 0, 5)
sage: C = LinearCode(G)
sage: c = C.random_element()
sage: C.unencode(c)
<BOOM>
I pushed an alternative fix to the zero-dimensional problem. I give the green light to the patch now. If you can accept my small amendment as is, set it to positive review :-)
Best, Johan
Reviewer: Johan Sebastian Rosenkilde Nielsen
Branch pushed to git repo; I updated commit sha1. New commits:
323b528 | Fixed bug in unencode |
Changed branch from u/dlucas/encoder to u/jsrn/encoder
Accepted.
Many thanks!
New commits:
09cff05 | Alternative fix for the zero-dimensional unencoding bug |
Changed branch from u/jsrn/encoder to 09cff05
For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between a message space of the code and its ambient space.
We propose in this ticket a new structure, designed to handle message -> codeword and codeword -> message transformations. The former operation is designated as encoding, the latter as unencoding.
We provide the following:
An abstract class
Encoder
, for inheritances purposes. Every Encoder will inherit from this class. It contains:encode
andunencode
methods. They come with a default implementations for encoders whose message space is a vectorial space;generator_matrix
method (see Design) andAn exception class
EncodingFailure
, for errors related to encoding or unencoding.Methods for encoding and unencoding handling in
AbstractLinearCode
(see Design)Design
Some codes families might have different message spaces. In the case of -for instance- Reed-Solomon codes, it is possible to transform elements of a Polynomial Ring into codewords, as well as elements of a vectorial space. We wanted to be able to support encoding over all possible message spaces, which is here represented by Encoder subclasses. Continuing the Reed-Solomon example, we will have a
PolynomialEncoder
and aVectorialEncoder
.Considering this particularity, we chose to let encoders deal with generator matrix computation, as it depends on the message space.
Of course, this multiplicity of objects and spaces might be confusing for people who just want to encode elements without being bothered by all these messages spaces. So, we created a set of methods to directly perform encoding operations on the code itself. We added a new field in
AbstractLinearCode
, calleddefault_encoder
. This field indicates to the program which encoder it should use if none is specified. For instance, with a codeC
and a messagem
, one can directly callC.encode(m)
. In that case, the default encoder will be used to perform the encoding operation. This works for every encoder-related method, likeunencode()
, orgenerator_matrix
.As there might be numerous encoders for a same code family, we chose to maintain a dictionary of known encoders for every code family. In this ticket, only one encoder is provided, called
LinearCodeGeneratorMatrixEncoder
. If one wants to create an instance of this encoder for a codeC
, he can of course callbut we propose something better. While calling the method
C.encoders_available()
, one will get a list of all known encoders forC
. For any linear code, he will get['GeneratorMatrix']
. After that, he can callwhich will also instantiate
LinearCodeGeneratorMatrixEncoder
forC
and cache the result as well. So, any further call (for instance,C.encode(m, 'GeneratorMatrix')
) to this encoder will be fast, and will guarantee that the sameLinearCodeGeneratorMatrixEncoder
will be used every time.The dictionary of available encoders for every code family is filled in
__init__.py
, as any instance of a subclass has to know some specific encoders. Furthermore, it is possible to add an encoder subclass for a specific instance of a code using the methodadd_encoder
.With this design, we are able to satisfy specific experimentation which requires specific encoders as well as generic experimentation for which any encoder will be fine.
Users will probably most often use and expect the message space
F^k
, wherek
is the dimension of the code, andF
the field. For this reason, the default encoder should be an encoder with this message space, such thatC.encode(m)
expectsm
fromF^k
andC.unencode(c)
returns an element ofF^k
.We also now have a default implementation of
generator_matrix
inAbstractLinearCode
which calls thegenerator_matrix
method of the selected encoder.CC: @johanrosenkilde @jpflori @videlec @defeo @sagetrac-danielaugot @ClementPernet
Component: coding theory
Author: David Lucas
Branch/Commit:
09cff05
Reviewer: Johan Sebastian Rosenkilde Nielsen
Issue created by migration from https://trac.sagemath.org/ticket/18376