Closed 1861b8a9-77f0-4f35-8431-8514a75b40d1 closed 9 years ago
Branch: u/dlucas/encoder
Commit: bad9715
Last 10 new commits:
7048dbb | New encoder-related methods in linear_code.py |
15cd2cd | New encoder for linear codes |
b54bac5 | Links with encoder structure |
5f4ca77 | Default implementation of generator_matrix in AbstractLinearCode. Changed documentation |
c6418c9 | Changes to the documentation |
1a0a5a4 | Add new method for adding encoders to the list of available encoders on the fly. Changed encoders_available method |
c82673a | Changes to the documentation |
e9d31cb | Fixed conflict after merge with 6.7.4 |
9b7ac0b | Default equality check method in Encoder. Changed naming convention for encoders. Removed linear_code_encoders file. Updated documentation |
bad9715 | Hid encoders under codes.encoders. |
Author: David Lucas
Description changed:
---
+++
@@ -1,4 +1,4 @@
-For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between the message space of the code and its ambient space.
+For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between a message space of the code and its ambient space.
We propose in this ticket a new structure, designed to handle message -> codeword and codeword -> message transformations. The former operation is designated as encoding, the latter as unencoding.
@@ -46,5 +46,10 @@
With this design, we are able to satisfy specific experimentation which requires specific encoders as well as generic experimentation for which any encoder will be fine.
-Note that, as it sounds more intuitive to manipulate vectors, `C.unencode()` will always return an element of a vectorial space.
+Users will probably most often use and expect the message space `F^k`,
+where `k` is the dimension of the code, and `F` the field. For this reason,
+the default encoder should be an encoder with this message space, such
+that `C.encode(m)` expects `m` from `F^k` and `C.unencode(c)` returns an
+element of `F^k`.
+
We also now have a default implementation of `generator_matrix` in `AbstractLinearCode` which calls the `generator_matrix` method of the selected encoder.
Branch pushed to git repo; I updated commit sha1. New commits:
3715880 | Fix related to encoders_catalog file |
Branch pushed to git repo; I updated commit sha1. New commits:
f447b9a | Updated documentation in encoder |
I noticed a few things that could be enhanced while rereading my code, and made the changes.
This ticket is still pending for review, and not a work in progress :)
The term "unencoding" is absolutely logical. But assuming it is not yet settled in the literature, I propose to consider "lift"as an alternative to mean the same thing. It is short and friendly (and mathematish :-)
Hello,
I think the term "lift" is not that good to describe such an operation. The operation we're talking about here consists in taking an element of the "bigger space" (the ambient space) to the "smaller space" (the message space), while "lift" usually means going from the small space to the big one.
Furthermore, I think it's not really intuitive when it comes to discovery by sage: C.<tab>
or sage: E.<tab>
(with C
a code and E
an encoder) and sees the word unencode
, he understands immediately (imho) the meaning of it, while the term "lift" does not seem to be that straightforward.
I also find it a problem not to have a proper name in the literature for reverting to the message from a codeword. Unencoding is definitely not nice, but it is very clear, where lift has a very vague meaning, which varies a lot depending on the setting.
And suppose furthermore one day we want to deal with codes over Z4, than it would be very proper to call "lift" the fonction which takes a F2 codeword c2 and propose a Z4 codeword c4 which reduces to c2 mod 2.
Go for "unencode". There seems to be no better alternative.
Helloooooo,
Here are my remarks/questions while I was looking for the first time at your code. Let's get this thing in so that you are not blocked for the more mathematical parts.
Vectorial space -> vector space?
default implementations of encode()
and unencode_nocheck()
methods
call Encoder‘s __init__()
in --> Encoder.__init__
?
As Encoder is not designed to be implemented -> instanciated?
Instead of typing the name of a method/class/function inside backticks, you
should link to it, i.e. replace:
Returns the code in which ``self.encode()`` has its output.
with
Returns the code in which :meth:`encode` has its output.
You can then check by compiling the documentation with --warn-links
that
everything behaves as expected.
(happens in many places)
Transforms an element of the message space into an element of the code --> into a codeword?
You have a unencode()
method with a check
argument on one hand, and a
unencode_nocheck
method on the other hand. It makes me think that the two
should be merged somehow, but I do not know exactly how they differ.
What about rewriting the following message by creating a bibliography entry
for codinglib?
{{{This function is taken from codinglib
(https://bitbucket.org/jsrn/codinglib/) and was written by Johan Nielsen.}}}
It would become
This function is taken from codinglib [Nielsen]_
and link toward an entry where both the url and the authors are given.
Documentation of unencoder_matrix
:
This is a helper function, for internal use only.
If it is not meant for public use, can't we rename it to _unencoder_matrix
?
EncodingFailure
-> Given that most exceptions end with Error
, what about
EncodingError
?
Index of encoders -> It is a bit odd that you advertise codes.encoders
as
the way to get their list, while the entry that follows starts with
linear_code.
To import these names into the global namespace, use:
--> it sems that the
code is already in the global namespace?...
encoder_default_name
-> default_encoder
?
if(name in reg_enc.keys())
--> creates a temporary and useless list,
i.e. the list of keys. if name in reg_enc
works.
(honest question) what is the point of letting the users add encoders with
this add_encoder
method? If a user implements a new encoder, what does he
earn by linking it inside of the instance instead of using it independenly?
If there is such an advantage, could you illustrate it with an example in the
function's docstring?
When you accept **kwargs
as input, please say that all additional arguments
are accepted and forwarded to function def encode(self, word, name=None, **kwargs):
In the same encode
function, I find the variable name
a bit vague. What
about encoder_name
or encoder
?
The same happens in: encoder()
, generator_matrix
, unencode
.
Instead of encoders_available
, what about simply encoders
? I do not see
the added value of _available
there. Furthermore, what about returning a
copy of the inner dictionary? Given that you can use a dictionary as you would
use a list (list(my_dict)
, for x in my_dict
, if x in my_dict
) why
wouldn't we return everything at once?
information_set
takes nothing as argument but calls generator_matrix
which
can depend on an encoder. Could you expose that flag and document it? That's
the price of exposing in LinearCode
the methods which are defined only in
its encoder object.. It is a bit awkward :-/
Really, these methods encode
and unencode
have nothing to do in
LinearCode
. You should "accept your own design" and have all these things be
available only in the Encoder
object, otherwise there will be a crazy amount
of copy/paste with those arguments.
What about simplifying
The only purpose of this encoder is to set generic linear codes
into the new Encoder structure by providing a valid ``generator_matrix``
method.
As
This is the default encoder of a generic linear code
?
About:
Its ``generator_matrix`` method returns private field ``_generator_matrix``
of its associated code if any, else it calls the ``generator_matrix`` method
of the default encoder of the associated code.
Isn't that "over-documenting"? One can obain this kind of information by reading the code (e.g. by adding ?? at the end of the function-->easy), and it is very unlikely that we will think of updating it if it ever changes.
How is this
if hasattr(self.code(), "_generator_matrix"):
return self.code()._generator_matrix
else:
return self.code().generator_matrix()
better than return self.code().generator_matrix()
?
About:
class Encoder(SageObject):
...
This class provides:
- some methods for encoder objects
Err. Well, I could have guessed :-P
Why wouldn't you say "nothing there"? This kind of information can be obtained
by tab completion or by browsing the html doc.
If the following is meant for developers, I think we do not need it
(especially since the patches will either be written or reviewed by you
:-P
). As for users, I can't see what they are expected to make of it. Surely
they can call their classes as they like?
.. NOTE::
For consistency on encoders, please follow this convention on names for subclasses:
for a new encoder named ``EncName``, for code family ``CodeFam``, call it
``CodeFamEncNameEncoder``.
If this code
must be an instance of a specific class, could you make it
explicit? It may be as simple as turning the second occurrence of 'code' into
a Sphinx link toward a class.
- ``code`` -- the associated code of ``self``
I do not understand why the Encoder
class provides methods to encode/decode
linear codes from the matrix, given that you cannot be sure that a matrix is
available. Shouldn't this be in LinearCodeGeneratorMatrixEncoder
?
About having this 'name' parameter everywhere. What would be the problem with
some set_encoder
function which would define the instance's default encoder,
and have all other functions call that? This could be defined from a string
(using the list of available encoders) or by giving a class directly, which
would make a public 'add_encoder' function useless, as users would have an
easy way to plug their own encoder into the instance.
I am now in Nantes. It took me a Nice->Nantes flight to work on this review. I love this city. Plus the weather is good but not as hot as in the south. Cooooooool.
Have fuuuuuuuuuuuun,
Nathann
Hello!
Here are my remarks/questions while I was looking for the first time at your code. Let's get this thing in so that you are not blocked for the more mathematical parts.
Thanks a lot for this, and for being that thorough! It's exactly the kind of comments we were hoping for.
I fixed the typos and small mistakes you noticed. Here are some answers to your questions:
To import these names into the global namespace, use: --> it sems that the code is already in the >global namespace?...
It shouldn't be available in global namespace. I fixed that.
encoder_default_name
->default_encoder
?
I kept encoder_default_name
as it's the name of the encoder (the string that points to it), and not the object in itself.
How is this
if hasattr(self.code(), "_generator_matrix"):
return self.code()._generator_matrix
else:
return self.code().generator_matrix()
better than return
self.code().generator_matrix()
?
This encoder is designed to put generic linear codes (LinearCode
class) into the new framework. And in LinearCode
, you have a class parameter _generator_matrix
because one has to provide a generator matrix when he build a "structureless" linear code.
As generator_matrix()
method in AbstractLinearCode
returns the generator matrix given by the default encoder, which is, for these LinearCode
s the one you pointed out, if this encoder had this self.code().generator_matrix()
in its generator_matrix()
method, it will lead to an infinite loop when calling LinearCode.generator_matrix()
.
information_set
takes nothing as argument but callsgenerator_matrix
which can depend on an encoder. Could you expose that flag and document it? That's the price of exposing inLinearCode
the methods which are defined only in its encoder object.. It is a bit awkward:-/
Actually, the information set can be computed from any generator matrix, and it will always return the same result whatever the generator matrix I used to compute it. As the default encoder of a code has a valid generator_matrix
method, and as C.generator_matrix()
calls the default encoder, the code of information_set()
will always return a valid result.
Really, these methods
encode
andunencode
have nothing to do inLinearCode
. You should "accept your own design" and have all these things be available only in theEncoder
object, otherwise there will be a crazy amount of copy/paste with those arguments.
I do think they should be there. A lot of users won't care about the kind of encoding/decoding which will be used underneath as long as they get what they expect. For these users, being able to directly call MyCode.encode(m)
or MyCode.unencode(c)
instead of having to ask for the list of available encoders, pick the one they want, create it, and call encode
or unencode
on it is a really helpful feature.
I do not understand why the
Encoder
class provides methods to encode/decode linear codes from the matrix, given that you cannot be sure that a matrix is available. Shouldn't this be inLinearCodeGeneratorMatrixEncoder
?
It actually provides a default implementation that will avoid spending time (and repeating code) in case of you have a generator matrix for your code. If you know a generator matrix for the encoder you're writing, you just have to implement generator_matrix
and you get immediately encode
and unencode
.
If you don't have one, you have to override these methods in your encoder. It's just a way to make developer's life easier :)
Instead of
encoders_available
, what about simply encoders? I do not see the added value of_available
there. Furthermore, what about returning a copy of the inner dictionary? Given that you can use a dictionary as you would use a list(list(my_dict)
,for x in my_dict
,if x in my_dict)
why wouldn't we return everything at once?
In AbstractLinearCode
, we already provide a method called encode
and another called encoder
. I think adding a third one called encoders
might lead to some confusion amongst the users, especially as encoder
returns an Encoder
object, one might think encoders
will return a list of all possible Encoder
s objects available for the provided code. Plus, I think the _available
actually adds value, by meaning "this is the list of all encoders you can access with your linear code".
About the dictionary, if I just want to build a specific encoder by using C.encoder(name='mySpecificName
), all I care about is actually the list of names of all the encoders my C
can access to. I think returning the whole dictionary might be a bit confusing, as one will recover the objects as well as the string names representing them, but he only needs the string names to build them (thanks to the encoder
method).
Now, about your idea of having a set_encoder
method. There (imho) several problems with that, namely:
the existence of _default_encoder_name
as a class argument allow all the methods of LinearCode
that rely on a generator matrix to work. Indeed, as they call generator_matrix()
without any argument in their bodies, and as we always set an encoder which knows a generator matrix as a default encoder, all these methods will always work. Now, if you allow the user to dynamically change this, and if he changes the encoder to another which does not has a generator matrix, all these methods will fail... Which is a bit silly, because what we only care about when the message space is not a vector space are the differences when it comes to encoding/unencoding, we don't want that to change the behaviour of other methods.
also, it will lead to the deletion of these "name" arguments, as you said. So if I want to test different encoding styles (or later on, decoding algorithms) I will have to reset a new encoder everytime. Of course it's only an extra line of code everytime, and of course encoders are cached so it's not heavy (memory-wise), but it still adds a bit of heaviness in the process.
So I'd prefer to keep it as is. And in that case, I also vote to keep add_encoder
method.
I am now in Nantes. It took me a Nice->Nantes flight to work on this review. I love this city. Plus the weather is good but not as hot as in the south. Cooooooool.
Sounds nice, enjoy your vacations!
David
Hellooooooo again,
It seems that your branch has become uncompatible with the latest beta. Turns
out that it is my fault: #18926 ^^;
encoder_default_name
->default_encoder
?I kept
encoder_default_name
as it's the name of the encoder (the string that points to it), and not the object in itself.
I see. Would it be possible to pick 'default_encoder_name' then? It sounds "more english" to say "the default encoder's name". Otherwise it seems that you are talking of the default name of an encoder.
This encoder is designed to put generic linear codes (
LinearCode
class) into the new framework. And inLinearCode
, you have a class parameter_generator_matrix
because one has to provide a generator matrix when he build a "structureless" linear code. Asgenerator_matrix()
method inAbstractLinearCode
returns the generator matrix given by the default encoder, which is, for theseLinearCode
s the one you pointed out, if this encoder had thisself.code().generator_matrix()
in itsgenerator_matrix()
method, it will lead to an infinite loop when callingLinearCode.generator_matrix()
.
I do not think that I follow this hierarchy of classes and calls between them as
well as you do. My point here is that the code tries to meddle with the private
attributes of another class, and I thought that it could be easily avoided. I
can understand that there would be an infinite loop somewhere if you only did
the replacement I proposed, so what about moving this cache check to the
.generator_matrix
method of the code()
object? In this situation, you would
be able to perform the replacement I proposed, and the only method which would
try to detect private attributes would do so in its own class which is a bit
'cleaner'.
Actually, the information set can be computed from any generator matrix, and it will always return the same result whatever the generator matrix I used to compute it. As the default encoder of a code has a valid
generator_matrix
method, and asC.generator_matrix()
calls the default encoder, the code ofinformation_set()
will always return a valid result.
Agreed for validity, now what about speed? I agree to let it lie, but having these things appear is a sign that those methods are exposed too high, especially when you want to handle several encoders at once.
Really, these methods
encode
andunencode
have nothing to do inLinearCode
. You should "accept your own design" and have all these things be available only in theEncoder
object, otherwise there will be a crazy amount of copy/paste with those arguments.I do think they should be there. A lot of users won't care about the kind of encoding/decoding which will be used underneath as long as they get what they expect.
That's hardly a problem: they can do encoder = my_code.encoder()
and then play
with the encoder. They can even do my_code.encoder().encode()
as this thing is
cached.
For these users, being able to directly call
MyCode.encode(m)
orMyCode.unencode(c)
instead of having to ask for the list of available encoders, pick the one they want, create it, and callencode
orunencode
on it is a really helpful feature.
That's not true, as the example above shows. They do not have to go through the list, or understand it, or anything else. Your design here is leading you to copy the functions of an encoder to a higher-level class, and it does not sound at all like a good idea from the programming point of view. From the user's point of view this is also debattable, for having users call 'encoder()' makes them aware that several encoders are available.
You designed your code to have Encoder objects, and quite naturally that's where
the encode/unencode
methods live. If you distinguish a code and its encoder,
you should accept the consequences. And they are not so bad on the user's side,
for (s)he can create the encoder without making any choice, i.e. by calling
.encoder()
.
I do not understand why the
Encoder
class provides methods to encode/decode linear codes from the matrix, given that you cannot be sure that a matrix is available. Shouldn't this be inLinearCodeGeneratorMatrixEncoder
?It actually provides a default implementation that will avoid spending time (and repeating code) in case of you have a generator matrix for your code. If you know a generator matrix for the encoder you're writing, you just have to implement
generator_matrix
and you get immediatelyencode
andunencode
.If you don't have one, you have to override these methods in your encoder. It's just a way to make developer's life easier
:)
I did not mean that the code should be replaced, but that it should be moved somewhere else. You define methods which assume more on the object than what it should (i.e. that the matrix is available). If it is instanciated without that matrix these functions will break.
What i suggest is to move them to a class (inheriting from Encoder) whose name makes it clear that the generator matrix should be available. In this situation, anybody creating a class that inherits from that other class will do it in full knowledge that a matrix has to be available, and does not run the risk of inheriting broken methods.
You can pick any name you like for that other class, of course.
In
AbstractLinearCode
, we already provide a method calledencode
and another calledencoder
. I think adding a third one calledencoders
might lead to some confusion amongst the users, especially asencoder
returns anEncoder
object, one might thinkencoders
will return a list of all possibleEncoder
s objects available for the provided code. Plus, I think the_available
actually adds value, by meaning "this is the list of all encoders you can access with your linear code".
I disagree with the implementation choice and the explanations, but I am ready to classify it under "matters of taste", in which case I don't have any reason to force mine, given that you are the one who writes the code.
About the dictionary, if I just want to build a specific encoder by using
C.encoder(name='mySpecificName
), all I care about is actually the list of names of all the encoders myC
can access to. I think returning the whole dictionary might be a bit confusing,
Okay, accepted under 'confusing' when the user asks for the names only, and cannot be assumed to understand more technicality.
When the users explicitly asks for both names and classes, however, I see little
interest in returning the list of .items()
instead of the dictionary. Would
you have anything against returning a copy of the dict in this case?
Now, about your idea of having a
set_encoder
method. There (imho) several problems with that, namely:
- the existence of
_default_encoder_name
as a class argument allow all the methods ofLinearCode
that rely on a generator matrix to work. Indeed, as they callgenerator_matrix()
without any argument in their bodies, and as we always set an encoder which knows a generator matrix as a default encoder, all these methods will always work. Now, if you allow the user to dynamically change this, and if he changes the encoder to another which does not has a generator matrix, all these methods will fail... Which is a bit silly, because what we only care about when the message space is not a vector space are the differences when it comes to encoding/unencoding, we don't want that to change the behaviour of other methods.
You convinced me that some encoders have features that others do not have, and that for this reason it made no sense to have a 'main' encoder in the class.
- also, it will lead to the deletion of these "name" arguments, as you said. So if I want to test different encoding styles (or later on, decoding algorithms) I will have to reset a new encoder everytime.
Not true. You would create several encoders, and use them directly.
Have fun,
Nathann
I'm going to jump in on the discussion about whether to have my_code.encode()
, my_code.unencode()
and my_code.generator_matrix()
or not. I've reflected on what you wrote Nathann, and though I'm not impossibly against removing these functions, I still argue that they are better retained.
The difference is really a matter of style and taste, and the opinion on what we're optimising our interface for. These functions are "end-user" functions, which can be compared to a top-level API for e.g. a library design. Also there is it rather common to "forward" functions to make common use cases slightly more obvious and convenient to the library user (for instance, writing "randint(5)" instead of "Random().randInt(5)").
I agree that the difference between writing my_code.encode(m)
and my_code.encoder().encode(m)
is not huge, but the latter exposes more clearly the computer science way of implementing coding theory, where the former more directly supports a "naive" view of things.
Coding theory is often taught in a way that codes and generator matrices are very connected. Even many coding theory researchers would not make it a primary concern to support multiple encoders and generator matrices for a given code (e.g.: none of previours-Sage, GAP or Magma has this). Forcing these users to write something like my_encode.encoder().encode(m)
shoves the view of multiple encoders down their throat, whatever they want it or not. I could imagine that it would add a "eew"-factor to their experience. For students in a coding theory class, it would add a little extra distance between their classroom experience and the software experience.
These are not kill-all arguments. So if the added overhead of having and maintaining these shortcuts was large, I would vote against having them. However, we are talking about three very small functions (and two more for decoders in a later ticket), and no maintenance effort that I can see.
Johan
Hello,
I do not think that there is something inherently wrong in the way you expect user to call your code. To me, there is something wrong in the design of classes, and the way it is implemented. I tried to show that 'accepting the class design' would lead you to a different user interface.
For a while, I tried to forget what you wanted to represent (easier for me than for you), and it appeared to me that you were implementing class inheritance backwards.
Look at it, it is quite surprising: here is the list of methods of an encoder object I built:
sage: e.
e.category e.db e.dumps e.generator_matrix e.parent e.reset_name e.unencode e.version
e.code e.dump e.encode e.message_space e.rename e.save e.unencode_nocheck
If you remove the useless ones, you are left with those:
e.code() # returns the 'parent' code()
e.encode() # =e.code().encode()
e.generator_matrix() # =e.code().generator_matrix()
e.message_space() # =e.code().ambiant_space()
e.unencode() # =e.code().unencode()
It is funny to notice two things: 1) All encoder methods can be called directly from the code (i.e. at the moment there is no point to build the encoder object) 2) Some code functions call the encoder's function, which leads you to create one-line functions with 30 lines of doc each time.
If I just look at the code, it seems that you just want your code to inherit the methods implemented in the encoder, i.e. generator_matrix, encode, and unencode. Then you get this for free.
If you find a way to make all these functions available at the level of the code without having to copy/paste a lot of things, it would solve my main (and perhaps only) objection about the classes' design.
Nathann
Hi Nathann,
I don't think I'm following your criticism. I'll try to comment on your points.
For a while, I tried to forget what you wanted to represent (easier for me than for you), and it appeared to me that you were implementing class inheritance backwards.
Yes, we are "forwarding" certain functions from (multiple) encoder classes in one class AbstractLinearCode
. The many classes that will inherit from AbstractLinearCode
will all inherit this forwarding. There is a one-shot overhead in the implementation of AbstractLinearCode
(in this ticket) -- that's all.
Look at it, it is quite surprising: here is the list of methods of an encoder object I built: ... It is funny to notice two things: 1) All encoder methods can be called directly from the code (i.e. at the moment there is no point to build the encoder object)
Yes, that's a feature: the encoder objects will be created behind the scenes and the user will access them transparently through the code object.
The "there is no point to build the encoder object" is unclear to me. In the proposed ticket, the encoder object must be build behind the scenes, or the code functions would not work. Note that there is an (at least) one-to-many relation between code classes and encoders. This ticket is about supporting that multiplicity of encoders in an elegant fashion; elegance seen from the user perspective and future code class designers.
The proposed ticket puts as much of the boiler-plate as possible into AbstractLinearCode
, e.g. for selecting the right encoder, caching the encoder (including whatever cached objects it has), etc. Furthermore it puts in implementations of encode
, unencode
and generator_matrix
whose usability purposes we have already discussed. Note again that these implementations will never be touched by sub-classes but will "just work".
2) Some code functions call the encoder's function, which leads you to create one-line functions with 30 lines of doc each time.
"Each time" means this ticket. No more. All later codes will inherit this forwarding and will never touch encode
, unencode
and generator_matrix
.
If I just look at the code, it seems that you just want your code to inherit the methods implemented in the encoder, i.e. generator_matrix, encode, and unencode. Then you get this for free.
Are you suggesting killing off Encoder
entirely? I think that impoverishes the design a lot. That encoders are instances of classes means that they have lots of power, e.g. caching and intro-spection. If that's not what you're talking about, then I don't understand.
If you find a way to make all these functions available at the level of the code without having to copy/paste a lot of things, it would solve my main (and perhaps only) objection about the classes' design.
Good, because that's how the current ticket is from the point of view of new code classes and encoders :-) They will just inherit from AbstractLinearCode
or Encoder
respectively, and these "copy/paste" functions will be inherited.
Best, Johan
Hello guys,
I do not think that I can bring anything new to this thread, and I do not see us going anywhere. I tried to answer several times, I did not have anything smart nor enlightening to add.
I cannot say that I agree with that code either, and there is no point in blocking the ticket until we drive ourselves mad over it.
If somebody else comes here and wants to give it a positive review I will not object. Same if you review it between yourselves. But I will not. What I could contribute was my view on the matter, and I did that already.
Have fun guys,
Nathann
Hi Nathann,
I do not think that I can bring anything new to this thread, and I do not see us going anywhere. I tried to answer several times, I did not have anything smart nor enlightening to add.
I completely disagree: you pointed out several things, and the main discussion here has been interesting. David, me and our colleague Julien spent about an hour rediscussing the code with your point of view as alternative. We just concluded that we preferred the proposed model :-)
I cannot say that I agree with that code either, and there is no point in blocking the ticket until we drive ourselves mad over it.
If somebody else comes here and wants to give it a positive review I will not object. Same if you review it between yourselves. But I will not. What I could contribute was my view on the matter, and I did that already.
Very fair, thanks. In any case, if/how the ticket gets reviewed, after whatever possible changes, the code will never be set in stone and we can change it if it sucks :-)
Thanks a lot for your input, Nathann! And have fuuuuuuuuuun in Nantes.
Johan
Branch pushed to git repo; I updated commit sha1. New commits:
d132d3c | Update to 6.9beta4 |
Hi,
1st step of review: looking at what the patchbot complains about:
There is a function in sage.coding.encoder
without a test. Is it the abstract generator_matrix
that's causing it?
The mystic startup_modules
fails. This turns out to mean that the patchbot has detected that a new module is loaded at Sage startup: a thing that is nowadays attempted to keep a lid on by replacing normal imports with lazy_import
. The new module is sage.coding.encoder
. I'm pretty sure that this is due to line 3660 of linear_code
which imports encoder
. It doesn't seem possible to remove this import or change it to lazy_import
since Encoder
is referred just below. But what if linear_code
itself was lazily imported in sage.coding.all
? Some care possibly needs to be taken with the registered_encoders
line in __init__
. Could this line be put somewhere to activate it only when the lazy_import
of linear_code
gets dereferenced?
Best, Johan
- There is a function in
sage.coding.encoder
without a test. Is it the abstractgenerator_matrix
that's causing it?
Yes it is:
$ ./sage -coverage src/sage/coding/encoder.py
------------------------------------------------------------------------
SCORE src/sage/coding/encoder.py: 87.5% (7 of 8)
Missing doctests:
* line 268: def generator_matrix(self)
------------------------------------------------------------------------
How does one test an abstract method?
How does one test an abstract method?
Looking at other uses of abstract_method
, it seems that a specific example is used in the doctest. So you could use an example with LinearCodeGeneratorMatrixEncoder
.
Best, Johan
Ok, some comments, small and slightly less small:
class Encoder
says "... needed to describe properly the code defined in the subclass". "code" should be "encoder". But probably that sentence confuses more than it helps in any case :-)The paragraph "Equality methods ..." is a bit strange. Perhaps
By default, comparison of Encoder
s (the methods__eq__
and __ne__
)
are by memory reference: If you build the same encoder twice, they will be
different. If you need something more clever, override __eq__
and
__ne__
in you subclass.
This paragraph should also be in the list. The last paragraph on representation should also be in the list.
F^k
, you also need to override message_space
.Encoder.encode
, INPUT block "the code" should be
self.
OUTPUTblock.
selfshould be
self.code`.Encoder.encode
. The ValueError
should not refer to "vector", I think, to set the precedent of this error message for other encoders. Say The value to encode
instead, for instance. Also remove the a
just before M
in that error.Encoder.unencode
could perhaps be more precise. Perhaps instead "The inverse of encode
: return the message corresponding to the codeword c
." Also the INPUT
block seem to have been wrongly copy/pasted from a time when this was in LinearCode
. Also, last sentence in describing nocheck
could perhaps more enlightening be "You might set this to True
to disable the check for saving computation. Note that if c
is not in self
and nocheck = True
, then the output of unencode
is not defined (except that it will be in the message space of self
).Encoder.unencode
. You don't need all the else
stuff: remove the first else branch completely, and always run the second return self.unencode_nocheck(c)
.Encoder.unencode_nocheck
. Perhaps the information set corresponding to _unencoder_matrix
should be saved together with that matrix. Currently, the code -- rather brittly -- depends on self.code.information_set
always returning the same result. Which is not part of that method's docstring (even though you now added @cached_method
on its default implementation, which should still be there).Encoder.unencode_nocheck
, docstring. I like that you added an example for a vector not in C
. But perhaps you could reformulate that sentence to say something like "Taking a vector that does not belong to C will not raise an error, but probably just give a non-sensical result".Encoder.code
docstring. Perhaps just "Returns the code for this Encoder
"? Also, in the Example, last line could be E.code() == C
instead.Encoder.encode
docstring: after reading the docstring of Encoder.message_space
, I think that perhaps a reference to the possible partial function nature of Encoder.encode
should be given there towards the end, as an information to future Encoder
developers? Perhaps say what error should be raised in that case.encoders
import statement in coding/codes_catalog
should also be lazy? I know that codes_catalog
is already lazy, but encoders
will usually not be used, even if you use codes_catalog
.class AbstractLinearCode
docstring: I don't understand the point of This class provides: ...
. Why is length
there? There is a private property _length
and a method length()
. But for instance, dimension()
is not mentioned? Why do you list default_encoder_name
and _registered_encoders
there? Perhaps AbstractLinearCode
: the error thrown when giving an unknown encoder could perhaps hint at how to add this default encoder.AbstractLinearCode
docline "fill the dictionary..." should add .py
to the name of __init__
.AbstractLinearCode
docstring on __cmp__
and __eq__
could perhaps be clearer. Something like "AbstractLinearCode has generic implementations of the comparison methods __cmp__
and __eq__
which use the generator matrix and are quite slow. In subclasses you are encouraged to override these functions." Is __cmp__
and __eq__
the exhaustive list of comparison functions?AbstractLinearCode.add_encoder
. Perhaps the doc should clarify that the added encoder is only added to self
and not any member of the class. As well as point the reader to how to do that instead.AbstractLinearCode.add_encoder
. I don't like that all newly created codes copy the dictionary of registered encoders, when almost always, add_encoder
is not going to be called on a created code. Instead, I think that in add_encoder
you can add a check to see if self._registered_encoders is self.__class__._registered_encoders
or something. If this is the case, then you make a new copy of self._registered_encoders
and add the new encoder to it. Otherwise, add_encoder
must have been already called on this code and you shouldn't make a new copy of course.AbstractLinearCode.encode
doc is missing punctuation many places. Also in encoder
. Maybe also elsewhere.AbstractLinearCode.encode
docstring says "the message space". Should be "a message space". It should say prominently in the doc of both encode
and encoder
that the default encoder always has F^k
as message space. Come to think of it, that should also be stated very clearly in the doc of class AbstractLinearCode
.AbstractLinearCode.encoder
docstring refers to name
but its encoder_name
. The doc should cleary describe that the encoder is cached (right now it's only indirectly referred to in the last pargraph). I think there should be a test of this.AbstractLinearCode.encoder
the line return self.encoder(encoder_name, **kwargs)
can simply be removed.AbstractLinearCode.encoders_available
the variable reg_enc
is superfluous and not even used, two lines below. Perhaps the argument values
could be renamed to classes
.AbstractLinearCode.encode
and .generator_matrix
and .unencode
doc of kwargs
and LinearCode.generator_matrix
. Would it be more helpful to write "all additional arguments are forwarded to the construction of the encoder that is used."AbstractLinearCode.unencode
doc could be made more precise, like the Encoder.unencode
doc. The doc of input c
should just say that c
is a codeword of C
. nocheck
should also be clarified like in Encoder
. The OUTPUT block is uninformative and plain wrong (if encoder_name
is set to something).LinearCode.generator_matrix
. Why does this check that _generator_matrix
exists? That is always the case? More importantly, if encoder_name
is not None
or GeneratorMatrix
, then this function should not return _generator_matrix
but <that encoder>.generator_matrix()
. That could be more elegantly handled by a call to super
's generator_matrix
function.LinearCodeGeneratorMatrixEncoder._repr_
and ._latex_
should include the word "the" just before the code, I think. Phew, I think that's it :-)
Best, Johan
Thanks for this (very) comprehensive work. I'm currently updating this to latest beta, and will start working on your comments as soon as compilation completes.
David
I made (almost all) the requested changes.
A few remarks:
On comment 4, note that I kept the sentence starting by "returns ..." to stick to docstring formatting rules.
On comment 6, I don't really get what you suggest. Do you propose to somehow register the unencoder_matrix and the information_set, for instance as some kind of class argument after their first computation?
On comment 9, I can't see what kind of other error can occur when encoding a word. Either it's an element from the message space and encoding goes well, either it's not and you get an error as illustrated in the doctest of encoder
. Or am I missing something here?
On comment 12, I actually deleted the whole block.
On comment 19, I changed the .. NOTE
block to a .. WARNING
block to make this remark even more visible on the doc of AbstractLinearCode
.
On comment 23, it's more helpful indeed. Change done.
On comment 24, as c
is not necessarily a codeword (if nocheck = True
, you can provide anything you want, as long as it belongs to the ambient space), I did not change the INPUT
line for c
.
On comment 26, it's already the case. Maybe you meant "shouldn't include the word "the" ", but in that case I think the error message would seem a bit weird imho. I have no strong feelings on this though.
Best,
David
Cool!
On comment 6, I don't really get what you suggest. Do you propose to somehow register the unencoder_matrix and the information_set, for instance as some kind of class argument after their first computation?
I'm suggesting that _unencoder_matrix
returns a tuple consisting of the matrix and the information set. Then unencode
can be changed to use that matrix and information set, which is promised to work together. Right now, if code.information_set
returns a different set next time it's called, then unencode
will break since it uses the matrix for the first information set.
On comment 9, I can't see what kind of other error can occur when encoding a word. Either it's an element from the message space and encoding goes well, either it's not and you get an error as illustrated in the doctest of
encoder
. Or am I missing something here?
Encoder.encode
is allowed to be a partial function over its message space (as explained in Encoder.message_space
). This is not clear in the doc string of Encoder.encode
. Presumably, a special type of error is supposed to be thrown if one tries to encode a value outside the allowed portion of the message space (say if I try to encode a polynomial of degree 2k in a [n,k] GRS code).
On comment 19, I changed the
.. NOTE
block to a.. WARNING
block to make this remark even more visible on the doc ofAbstractLinearCode
.
Good idea.
On comment 24, as
c
is not necessarily a codeword (ifnocheck = True
, you can provide anything you want, as long as it belongs to the ambient space), I did not change theINPUT
line forc
.
Well, yes, the function won't explode if you give it c
which is not a codeword and set nocheck = True
, but that's not at all the intention of that function, and it's also not the intention of nocheck = True
.
The function is undefined - and thus useless - on this input!
I think that the description of c
should reflect the intended use of the function. Not the widest possible type for which the function won't explode.
On comment 26, it's already the case. Maybe you meant "shouldn't include the word "the" ", but in that case I think the error message would seem a bit weird imho. I have no strong feelings on this though.
Yes, I meant "shouldn't" :-) It's debatable, I guess, but having "the" doesn't seem to be convention in Sage:
sage: MatrixSpace(RR, 2, 3)
Full MatrixSpace of 2 by 3 dense matrices over Real Field with 53 bits of precision
sage: PolynomialRing(GF(2),'x')
Univariate Polynomial Ring in x over Finite Field of size 2 (using NTL)
sage: Hom(RR, ZZ)
Set of Homomorphisms from Real Field with 53 bits of precision to Integer Ring
Furthermore, the _repr_
doesn't uniquely specify "the" code, but usually gives a summary. Like for LinearCode or the soon-to-come GRS codes.
Best, Johan
I'm suggesting that
_unencoder_matrix
returns a tuple consisting of the matrix and the information set. Thenunencode
can be changed to use that matrix and information set, which is promised to work together. Right now, ifcode.information_set
returns a different set next time it's called, thenunencode
will break since it uses the matrix for the first information set.
Got it! Change done.
Encoder.encode
is allowed to be a partial function over its message space (as explained inEncoder.message_space
). This is not clear in the doc string ofEncoder.encode
. Presumably, a special type of error is supposed to be thrown if one tries to encode a value outside the allowed portion of the message space (say if I try to encode a polynomial of degree 2k in a [n,k] GRS code).
Ok. I added a .. NOTE
block to explain this. And suggested to use EncodingError
if one tries to encode a word which is not in the message space.
Well, yes, the function won't explode if you give it
c
which is not a codeword and setnocheck = True
, but that's not at all the intention of that function, and it's also not the intention ofnocheck = True
. The function is undefined - and thus useless - on this input! I think that the description ofc
should reflect the intended use of the function. Not the widest possible type for which the function won't explode.
Agreed. I changed the doc.
For comment 26, as keeping the "the" is not convention in Sage and as I stick to the Sage's way of doing things, I deleted it.
Best,
David
Branch pushed to git repo; I updated commit sha1. New commits:
d0435eb | Replaced import of linear_code module by a lazy_import |
I presume that lazy_import was to avoid that sage.coding.encoder
got imported, but I'm pretty sure it still does. At least, it's present in sys.modules
and it shows up when doing sage.coding.<tab>
.
This might be due to the line in __init__.py
which will cause an immediate dereference of LinearCode
, which then causes loading encoder.py
too. But I didn't test that this is sufficient.
Branch pushed to git repo; I updated commit sha1. New commits:
5965150 | Moved line related to dictionary of encoders |
I moved the line which fills the dictionary of encoders for LinearCode
from __init__.py
to linear_code.py
. It will be useful to do that to avoid immediate dereference of LinearCode
in __init.py__
as explained in the message above.
Note that for now, encoder
module is still imported when Sage starts. I'll fix this (alongside with several other imports) in the follow-up ticket #19315
Mmh...
I just notice that patchbot stops complaining about startup_modules
...
Thing is, encoder
module is still imported when Sage starts, as explained in my previous message.
Anyone knows what this startup_modules
line means?
For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between a message space of the code and its ambient space.
We propose in this ticket a new structure, designed to handle message -> codeword and codeword -> message transformations. The former operation is designated as encoding, the latter as unencoding.
We provide the following:
An abstract class
Encoder
, for inheritances purposes. Every Encoder will inherit from this class. It contains:encode
andunencode
methods. They come with a default implementations for encoders whose message space is a vectorial space;generator_matrix
method (see Design) andAn exception class
EncodingFailure
, for errors related to encoding or unencoding.Methods for encoding and unencoding handling in
AbstractLinearCode
(see Design)Design
Some codes families might have different message spaces. In the case of -for instance- Reed-Solomon codes, it is possible to transform elements of a Polynomial Ring into codewords, as well as elements of a vectorial space. We wanted to be able to support encoding over all possible message spaces, which is here represented by Encoder subclasses. Continuing the Reed-Solomon example, we will have a
PolynomialEncoder
and aVectorialEncoder
.Considering this particularity, we chose to let encoders deal with generator matrix computation, as it depends on the message space.
Of course, this multiplicity of objects and spaces might be confusing for people who just want to encode elements without being bothered by all these messages spaces. So, we created a set of methods to directly perform encoding operations on the code itself. We added a new field in
AbstractLinearCode
, calleddefault_encoder
. This field indicates to the program which encoder it should use if none is specified. For instance, with a codeC
and a messagem
, one can directly callC.encode(m)
. In that case, the default encoder will be used to perform the encoding operation. This works for every encoder-related method, likeunencode()
, orgenerator_matrix
.As there might be numerous encoders for a same code family, we chose to maintain a dictionary of known encoders for every code family. In this ticket, only one encoder is provided, called
LinearCodeGeneratorMatrixEncoder
. If one wants to create an instance of this encoder for a codeC
, he can of course callbut we propose something better. While calling the method
C.encoders_available()
, one will get a list of all known encoders forC
. For any linear code, he will get['GeneratorMatrix']
. After that, he can callwhich will also instantiate
LinearCodeGeneratorMatrixEncoder
forC
and cache the result as well. So, any further call (for instance,C.encode(m, 'GeneratorMatrix')
) to this encoder will be fast, and will guarantee that the sameLinearCodeGeneratorMatrixEncoder
will be used every time.The dictionary of available encoders for every code family is filled in
__init__.py
, as any instance of a subclass has to know some specific encoders. Furthermore, it is possible to add an encoder subclass for a specific instance of a code using the methodadd_encoder
.With this design, we are able to satisfy specific experimentation which requires specific encoders as well as generic experimentation for which any encoder will be fine.
Users will probably most often use and expect the message space
F^k
, wherek
is the dimension of the code, andF
the field. For this reason, the default encoder should be an encoder with this message space, such thatC.encode(m)
expectsm
fromF^k
andC.unencode(c)
returns an element ofF^k
.We also now have a default implementation of
generator_matrix
inAbstractLinearCode
which calls thegenerator_matrix
method of the selected encoder.CC: @johanrosenkilde @jpflori @videlec @defeo @sagetrac-danielaugot @ClementPernet
Component: coding theory
Author: David Lucas
Branch/Commit:
09cff05
Reviewer: Johan Sebastian Rosenkilde Nielsen
Issue created by migration from https://trac.sagemath.org/ticket/18376