New encoding structure for linear codes

sagemath / sage

Main repository of SageMath

https://www.sagemath.org

Other

1.45k stars 482 forks source link

New encoding structure for linear codes #18376

Closed 1861b8a9-77f0-4f35-8431-8514a75b40d1 closed 9 years ago

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between a message space of the code and its ambient space.

We propose in this ticket a new structure, designed to handle message -> codeword and codeword -> message transformations. The former operation is designated as encoding, the latter as unencoding.

We provide the following:

An abstract class Encoder, for inheritances purposes. Every Encoder will inherit from this class. It contains:
- a default constructor;
- both encode and unencode methods. They come with a default implementations for encoders whose message space is a vectorial space;
- several methods to get information on the instance of the class;
- generator_matrix method (see Design) and
- explanations on what to do to create a new encoder subclass.
An exception class EncodingFailure, for errors related to encoding or unencoding.
Methods for encoding and unencoding handling in AbstractLinearCode (see Design)

Design

Some codes families might have different message spaces. In the case of -for instance- Reed-Solomon codes, it is possible to transform elements of a Polynomial Ring into codewords, as well as elements of a vectorial space. We wanted to be able to support encoding over all possible message spaces, which is here represented by Encoder subclasses. Continuing the Reed-Solomon example, we will have a PolynomialEncoder and a VectorialEncoder.

Considering this particularity, we chose to let encoders deal with generator matrix computation, as it depends on the message space.

Of course, this multiplicity of objects and spaces might be confusing for people who just want to encode elements without being bothered by all these messages spaces. So, we created a set of methods to directly perform encoding operations on the code itself. We added a new field in AbstractLinearCode, called default_encoder. This field indicates to the program which encoder it should use if none is specified. For instance, with a code C and a message m, one can directly call C.encode(m). In that case, the default encoder will be used to perform the encoding operation. This works for every encoder-related method, like unencode(), or generator_matrix.

As there might be numerous encoders for a same code family, we chose to maintain a dictionary of known encoders for every code family. In this ticket, only one encoder is provided, called LinearCodeGeneratorMatrixEncoder. If one wants to create an instance of this encoder for a code C, he can of course call

E = LinearCodeGeneratorMatrixEncoder(C)

but we propose something better. While calling the method C.encoders_available(), one will get a list of all known encoders for C. For any linear code, he will get ['GeneratorMatrix']. After that, he can call

E = C.encoder('GeneratorMatrix')

which will also instantiate LinearCodeGeneratorMatrixEncoder for C and cache the result as well. So, any further call (for instance, C.encode(m, 'GeneratorMatrix')) to this encoder will be fast, and will guarantee that the same LinearCodeGeneratorMatrixEncoder will be used every time.

The dictionary of available encoders for every code family is filled in __init__.py, as any instance of a subclass has to know some specific encoders. Furthermore, it is possible to add an encoder subclass for a specific instance of a code using the method add_encoder.

With this design, we are able to satisfy specific experimentation which requires specific encoders as well as generic experimentation for which any encoder will be fine.

Users will probably most often use and expect the message space F^k, where k is the dimension of the code, and F the field. For this reason, the default encoder should be an encoder with this message space, such that C.encode(m) expects m from F^k and C.unencode(c) returns an element of F^k.

We also now have a default implementation of generator_matrix in AbstractLinearCode which calls the generator_matrix method of the selected encoder.

CC: @johanrosenkilde @jpflori @videlec @defeo @sagetrac-danielaugot @ClementPernet

Component: coding theory

Author: David Lucas

Branch/Commit: 09cff05

Reviewer: Johan Sebastian Rosenkilde Nielsen

Issue created by migration from https://trac.sagemath.org/ticket/18376

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Branch: u/dlucas/encoder

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Commit: bad9715

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Last 10 new commits:

`7048dbb`	`New encoder-related methods in linear_code.py`
`15cd2cd`	`New encoder for linear codes`
`b54bac5`	`Links with encoder structure`
`5f4ca77`	`Default implementation of generator_matrix in AbstractLinearCode. Changed documentation`
`c6418c9`	`Changes to the documentation`
`1a0a5a4`	`Add new method for adding encoders to the list of available encoders on the fly. Changed encoders_available method`
`c82673a`	`Changes to the documentation`
`e9d31cb`	`Fixed conflict after merge with 6.7.4`
`9b7ac0b`	`Default equality check method in Encoder. Changed naming convention for encoders. Removed linear_code_encoders file. Updated documentation`
`bad9715`	`Hid encoders under codes.encoders.`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Author: David Lucas

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between the message space of the code and its ambient space.
+For now, linear codes don't have a proper structure for encoding, "encoding" meaning handle the bijection between a message space of the code and its ambient space.

 We propose in this ticket a new structure, designed to handle message -> codeword and codeword -> message transformations. The former operation is designated as encoding, the latter as unencoding.

@@ -46,5 +46,10 @@

 With this design, we are able to satisfy specific experimentation which requires specific encoders as well as generic experimentation for which any encoder will be fine.

-Note that, as it sounds more intuitive to manipulate vectors, `C.unencode()` will always return an element of a vectorial space.
+Users will probably most often use and expect the message space `F^k`,
+where `k` is the dimension of the code, and `F` the field. For this reason,
+the default encoder should be an encoder with this message space, such
+that `C.encode(m)` expects `m` from `F^k` and `C.unencode(c)` returns an
+element of `F^k`.
+
 We also now have a default implementation of `generator_matrix` in `AbstractLinearCode` which calls the `generator_matrix` method of the selected encoder.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from bad9715 to 3715880

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

3715880 Fix related to encoders_catalog file

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

f447b9a Updated documentation in encoder

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 3715880 to f447b9a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from f447b9a to 188b56f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`e8f9fdb`	`Merge with 6.8beta3`
`188b56f`	`Minor changes`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:9

I noticed a few things that could be enhanced while rereading my code, and made the changes. This ticket is still pending for review, and not a work in progress :)

kwankyu commented 9 years ago

comment:10

The term "unencoding" is absolutely logical. But assuming it is not yet settled in the literature, I propose to consider "lift"as an alternative to mean the same thing. It is short and friendly (and mathematish :-)

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:11

Hello,

I think the term "lift" is not that good to describe such an operation. The operation we're talking about here consists in taking an element of the "bigger space" (the ambient space) to the "smaller space" (the message space), while "lift" usually means going from the small space to the big one.

Furthermore, I think it's not really intuitive when it comes to discovery by completion: if one writes sage: C.<tab> or sage: E.<tab> (with C a code and E an encoder) and sees the word unencode, he understands immediately (imho) the meaning of it, while the term "lift" does not seem to be that straightforward.

0606207b-931f-4eb8-aaff-b4a0f8d02780 commented 9 years ago

comment:12

I also find it a problem not to have a proper name in the literature for reverting to the message from a codeword. Unencoding is definitely not nice, but it is very clear, where lift has a very vague meaning, which varies a lot depending on the setting.

And suppose furthermore one day we want to deal with codes over Z4, than it would be very proper to call "lift" the fonction which takes a F2 codeword c2 and propose a Z4 codeword c4 which reduces to c2 mod 2.

kwankyu commented 9 years ago

comment:14

Go for "unencode". There seems to be no better alternative.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:15

Helloooooo,

Here are my remarks/questions while I was looking for the first time at your code. Let's get this thing in so that you are not blocked for the more mathematical parts.

Vectorial space -> vector space?
default implementations of encode() and unencode_nocheck() methods
call Encoder‘s __init__() in --> Encoder.__init__ ?
As Encoder is not designed to be implemented -> instanciated?
Instead of typing the name of a method/class/function inside backticks, you should link to it, i.e. replace: Returns the code in which ``self.encode()`` has its output. with Returns the code in which :meth:`encode` has its output. You can then check by compiling the documentation with --warn-links that everything behaves as expected. (happens in many places)
Transforms an element of the message space into an element of the code --> into a codeword?
You have a unencode() method with a check argument on one hand, and a unencode_nocheck method on the other hand. It makes me think that the two should be merged somehow, but I do not know exactly how they differ.
What about rewriting the following message by creating a bibliography entry for codinglib? {{{This function is taken from codinglib (https://bitbucket.org/jsrn/codinglib/) and was written by Johan Nielsen.}}} It would become This function is taken from codinglib [Nielsen]_ and link toward an entry where both the url and the authors are given.
Documentation of unencoder_matrix: This is a helper function, for internal use only. If it is not meant for public use, can't we rename it to _unencoder_matrix?
EncodingFailure -> Given that most exceptions end with Error, what about EncodingError?
Index of encoders -> It is a bit odd that you advertise codes.encoders as the way to get their list, while the entry that follows starts with linear_code.
To import these names into the global namespace, use: --> it sems that the code is already in the global namespace?...
encoder_default_name -> default_encoder?
if(name in reg_enc.keys()) --> creates a temporary and useless list, i.e. the list of keys. if name in reg_enc works.
(honest question) what is the point of letting the users add encoders with this add_encoder method? If a user implements a new encoder, what does he earn by linking it inside of the instance instead of using it independenly? If there is such an advantage, could you illustrate it with an example in the function's docstring?
When you accept **kwargs as input, please say that all additional arguments are accepted and forwarded to function : def encode(self, word, name=None, **kwargs):
In the same encode function, I find the variable name a bit vague. What about encoder_name or encoder?
The same happens in: encoder(), generator_matrix, unencode.
Instead of encoders_available, what about simply encoders? I do not see the added value of _available there. Furthermore, what about returning a copy of the inner dictionary? Given that you can use a dictionary as you would use a list (list(my_dict), for x in my_dict, if x in my_dict) why wouldn't we return everything at once?
information_set takes nothing as argument but calls generator_matrix which can depend on an encoder. Could you expose that flag and document it? That's the price of exposing in LinearCode the methods which are defined only in its encoder object.. It is a bit awkward :-/
Really, these methods encode and unencode have nothing to do in LinearCode. You should "accept your own design" and have all these things be available only in the Encoder object, otherwise there will be a crazy amount of copy/paste with those arguments.

What about simplifying

The only purpose of this encoder is to set generic linear codes
into the new Encoder structure by providing a valid ``generator_matrix``
method.

As This is the default encoder of a generic linear code?

About:
```
Its ``generator_matrix`` method returns private field ``_generator_matrix``
of its associated code if any, else it calls the ``generator_matrix`` method
of the default encoder of the associated code.
```
Isn't that "over-documenting"? One can obain this kind of information by reading the code (e.g. by adding ?? at the end of the function-->easy), and it is very unlikely that we will think of updating it if it ever changes.

How is this

if hasattr(self.code(), "_generator_matrix"):
  return self.code()._generator_matrix
else:
  return self.code().generator_matrix()

better than return self.code().generator_matrix()?

About:
```
class Encoder(SageObject):
 ...
 This class provides:

 - some methods for encoder objects
```
Err. Well, I could have guessed :-P Why wouldn't you say "nothing there"? This kind of information can be obtained by tab completion or by browsing the html doc.
If the following is meant for developers, I think we do not need it (especially since the patches will either be written or reviewed by you :-P). As for users, I can't see what they are expected to make of it. Surely they can call their classes as they like?
```
.. NOTE::

  For consistency on encoders, please follow this convention on names for subclasses:
  for a new encoder named ``EncName``, for code family ``CodeFam``, call it
  ``CodeFamEncNameEncoder``.
```
If this code must be an instance of a specific class, could you make it explicit? It may be as simple as turning the second occurrence of 'code' into a Sphinx link toward a class. - ``code`` -- the associated code of ``self``
I do not understand why the Encoder class provides methods to encode/decode linear codes from the matrix, given that you cannot be sure that a matrix is available. Shouldn't this be in LinearCodeGeneratorMatrixEncoder?
About having this 'name' parameter everywhere. What would be the problem with some set_encoder function which would define the instance's default encoder, and have all other functions call that? This could be defined from a string (using the list of available encoders) or by giving a class directly, which would make a public 'add_encoder' function useless, as users would have an easy way to plug their own encoder into the instance.

I am now in Nantes. It took me a Nice->Nantes flight to work on this review. I love this city. Plus the weather is good but not as hot as in the south. Cooooooool.

Have fuuuuuuuuuuuun,

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 188b56f to aa42238

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`d94c53a`	`Update to 6.8`
`aa42238`	`Integrated reviewer's comments`

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:17

Hello!

Here are my remarks/questions while I was looking for the first time at your code. Let's get this thing in so that you are not blocked for the more mathematical parts.

Thanks a lot for this, and for being that thorough! It's exactly the kind of comments we were hoping for.

I fixed the typos and small mistakes you noticed. Here are some answers to your questions:

To import these names into the global namespace, use: --> it sems that the code is already in the >global namespace?...

It shouldn't be available in global namespace. I fixed that.

encoder_default_name -> default_encoder?

I kept encoder_default_name as it's the name of the encoder (the string that points to it), and not the object in itself.

How is this

if hasattr(self.code(), "_generator_matrix"):
    return self.code()._generator_matrix
else:
    return self.code().generator_matrix()

better than return self.code().generator_matrix()?

This encoder is designed to put generic linear codes (LinearCode class) into the new framework. And in LinearCode, you have a class parameter _generator_matrix because one has to provide a generator matrix when he build a "structureless" linear code. As generator_matrix() method in AbstractLinearCode returns the generator matrix given by the default encoder, which is, for these LinearCodes the one you pointed out, if this encoder had this self.code().generator_matrix() in its generator_matrix() method, it will lead to an infinite loop when calling LinearCode.generator_matrix().

information_set takes nothing as argument but calls generator_matrix which can depend on an encoder. Could you expose that flag and document it? That's the price of exposing in LinearCode the methods which are defined only in its encoder object.. It is a bit awkward :-/

Actually, the information set can be computed from any generator matrix, and it will always return the same result whatever the generator matrix I used to compute it. As the default encoder of a code has a valid generator_matrix method, and as C.generator_matrix() calls the default encoder, the code of information_set() will always return a valid result.

Really, these methods encode and unencode have nothing to do in LinearCode. You should "accept your own design" and have all these things be available only in the Encoder object, otherwise there will be a crazy amount of copy/paste with those arguments.

I do think they should be there. A lot of users won't care about the kind of encoding/decoding which will be used underneath as long as they get what they expect. For these users, being able to directly call MyCode.encode(m) or MyCode.unencode(c) instead of having to ask for the list of available encoders, pick the one they want, create it, and call encode or unencode on it is a really helpful feature.

I do not understand why the Encoder class provides methods to encode/decode linear codes from the matrix, given that you cannot be sure that a matrix is available. Shouldn't this be in LinearCodeGeneratorMatrixEncoder?

It actually provides a default implementation that will avoid spending time (and repeating code) in case of you have a generator matrix for your code. If you know a generator matrix for the encoder you're writing, you just have to implement generator_matrix and you get immediately encode and unencode.

If you don't have one, you have to override these methods in your encoder. It's just a way to make developer's life easier :)

Instead of encoders_available, what about simply encoders? I do not see the added value of _available there. Furthermore, what about returning a copy of the inner dictionary? Given that you can use a dictionary as you would use a list (list(my_dict), for x in my_dict, if x in my_dict) why wouldn't we return everything at once?

In AbstractLinearCode, we already provide a method called encode and another called encoder. I think adding a third one called encoders might lead to some confusion amongst the users, especially as encoder returns an Encoder object, one might think encoders will return a list of all possible Encoders objects available for the provided code. Plus, I think the _available actually adds value, by meaning "this is the list of all encoders you can access with your linear code". About the dictionary, if I just want to build a specific encoder by using C.encoder(name='mySpecificName), all I care about is actually the list of names of all the encoders my C can access to. I think returning the whole dictionary might be a bit confusing, as one will recover the objects as well as the string names representing them, but he only needs the string names to build them (thanks to the encoder method).

Now, about your idea of having a set_encoder method. There (imho) several problems with that, namely:

the existence of _default_encoder_name as a class argument allow all the methods of LinearCode that rely on a generator matrix to work. Indeed, as they call generator_matrix() without any argument in their bodies, and as we always set an encoder which knows a generator matrix as a default encoder, all these methods will always work. Now, if you allow the user to dynamically change this, and if he changes the encoder to another which does not has a generator matrix, all these methods will fail... Which is a bit silly, because what we only care about when the message space is not a vector space are the differences when it comes to encoding/unencoding, we don't want that to change the behaviour of other methods.
also, it will lead to the deletion of these "name" arguments, as you said. So if I want to test different encoding styles (or later on, decoding algorithms) I will have to reset a new encoder everytime. Of course it's only an extra line of code everytime, and of course encoders are cached so it's not heavy (memory-wise), but it still adds a bit of heaviness in the process.

So I'd prefer to keep it as is. And in that case, I also vote to keep add_encoder method.

I am now in Nantes. It took me a Nice->Nantes flight to work on this review. I love this city. Plus the weather is good but not as hot as in the south. Cooooooool.

Sounds nice, enjoy your vacations!

David

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:18

Hellooooooo again,

It seems that your branch has become uncompatible with the latest beta. Turns out that it is my fault: #18926 ^^;

encoder_default_name -> default_encoder?

I kept encoder_default_name as it's the name of the encoder (the string that points to it), and not the object in itself.

I see. Would it be possible to pick 'default_encoder_name' then? It sounds "more english" to say "the default encoder's name". Otherwise it seems that you are talking of the default name of an encoder.

This encoder is designed to put generic linear codes (LinearCode class) into the new framework. And in LinearCode, you have a class parameter _generator_matrix because one has to provide a generator matrix when he build a "structureless" linear code. As generator_matrix() method in AbstractLinearCode returns the generator matrix given by the default encoder, which is, for these LinearCodes the one you pointed out, if this encoder had this self.code().generator_matrix() in its generator_matrix() method, it will lead to an infinite loop when calling LinearCode.generator_matrix().

I do not think that I follow this hierarchy of classes and calls between them as well as you do. My point here is that the code tries to meddle with the private attributes of another class, and I thought that it could be easily avoided. I can understand that there would be an infinite loop somewhere if you only did the replacement I proposed, so what about moving this cache check to the .generator_matrix method of the code() object? In this situation, you would be able to perform the replacement I proposed, and the only method which would try to detect private attributes would do so in its own class which is a bit 'cleaner'.

Actually, the information set can be computed from any generator matrix, and it will always return the same result whatever the generator matrix I used to compute it. As the default encoder of a code has a valid generator_matrix method, and as C.generator_matrix() calls the default encoder, the code of information_set() will always return a valid result.

Agreed for validity, now what about speed? I agree to let it lie, but having these things appear is a sign that those methods are exposed too high, especially when you want to handle several encoders at once.

Really, these methods encode and unencode have nothing to do in LinearCode. You should "accept your own design" and have all these things be available only in the Encoder object, otherwise there will be a crazy amount of copy/paste with those arguments.

I do think they should be there. A lot of users won't care about the kind of encoding/decoding which will be used underneath as long as they get what they expect.

That's hardly a problem: they can do encoder = my_code.encoder() and then play with the encoder. They can even do my_code.encoder().encode() as this thing is cached.

For these users, being able to directly call MyCode.encode(m) or MyCode.unencode(c) instead of having to ask for the list of available encoders, pick the one they want, create it, and call encode or unencode on it is a really helpful feature.

That's not true, as the example above shows. They do not have to go through the list, or understand it, or anything else. Your design here is leading you to copy the functions of an encoder to a higher-level class, and it does not sound at all like a good idea from the programming point of view. From the user's point of view this is also debattable, for having users call 'encoder()' makes them aware that several encoders are available.

You designed your code to have Encoder objects, and quite naturally that's where the encode/unencode methods live. If you distinguish a code and its encoder, you should accept the consequences. And they are not so bad on the user's side, for (s)he can create the encoder without making any choice, i.e. by calling .encoder().

I do not understand why the Encoder class provides methods to encode/decode linear codes from the matrix, given that you cannot be sure that a matrix is available. Shouldn't this be in LinearCodeGeneratorMatrixEncoder?

It actually provides a default implementation that will avoid spending time (and repeating code) in case of you have a generator matrix for your code. If you know a generator matrix for the encoder you're writing, you just have to implement generator_matrix and you get immediately encode and unencode.

If you don't have one, you have to override these methods in your encoder. It's just a way to make developer's life easier :)

I did not mean that the code should be replaced, but that it should be moved somewhere else. You define methods which assume more on the object than what it should (i.e. that the matrix is available). If it is instanciated without that matrix these functions will break.

What i suggest is to move them to a class (inheriting from Encoder) whose name makes it clear that the generator matrix should be available. In this situation, anybody creating a class that inherits from that other class will do it in full knowledge that a matrix has to be available, and does not run the risk of inheriting broken methods.

You can pick any name you like for that other class, of course.

In AbstractLinearCode, we already provide a method called encode and another called encoder. I think adding a third one called encoders might lead to some confusion amongst the users, especially as encoder returns an Encoder object, one might think encoders will return a list of all possible Encoders objects available for the provided code. Plus, I think the _available actually adds value, by meaning "this is the list of all encoders you can access with your linear code".

I disagree with the implementation choice and the explanations, but I am ready to classify it under "matters of taste", in which case I don't have any reason to force mine, given that you are the one who writes the code.

About the dictionary, if I just want to build a specific encoder by using C.encoder(name='mySpecificName), all I care about is actually the list of names of all the encoders my C can access to. I think returning the whole dictionary might be a bit confusing,

Okay, accepted under 'confusing' when the user asks for the names only, and cannot be assumed to understand more technicality.

When the users explicitly asks for both names and classes, however, I see little interest in returning the list of .items() instead of the dictionary. Would you have anything against returning a copy of the dict in this case?

Now, about your idea of having a set_encoder method. There (imho) several problems with that, namely:

the existence of _default_encoder_name as a class argument allow all the methods of LinearCode that rely on a generator matrix to work. Indeed, as they call generator_matrix() without any argument in their bodies, and as we always set an encoder which knows a generator matrix as a default encoder, all these methods will always work. Now, if you allow the user to dynamically change this, and if he changes the encoder to another which does not has a generator matrix, all these methods will fail... Which is a bit silly, because what we only care about when the message space is not a vector space are the differences when it comes to encoding/unencoding, we don't want that to change the behaviour of other methods.

You convinced me that some encoders have features that others do not have, and that for this reason it made no sense to have a 'main' encoder in the class.

also, it will lead to the deletion of these "name" arguments, as you said. So if I want to test different encoding styles (or later on, decoding algorithms) I will have to reset a new encoder everytime.

Not true. You would create several encoders, and use them directly.

Have fun,

Nathann

johanrosenkilde commented 9 years ago

comment:19

I'm going to jump in on the discussion about whether to have my_code.encode(), my_code.unencode() and my_code.generator_matrix() or not. I've reflected on what you wrote Nathann, and though I'm not impossibly against removing these functions, I still argue that they are better retained.

The difference is really a matter of style and taste, and the opinion on what we're optimising our interface for. These functions are "end-user" functions, which can be compared to a top-level API for e.g. a library design. Also there is it rather common to "forward" functions to make common use cases slightly more obvious and convenient to the library user (for instance, writing "randint(5)" instead of "Random().randInt(5)").

I agree that the difference between writing my_code.encode(m) and my_code.encoder().encode(m) is not huge, but the latter exposes more clearly the computer science way of implementing coding theory, where the former more directly supports a "naive" view of things.

Coding theory is often taught in a way that codes and generator matrices are very connected. Even many coding theory researchers would not make it a primary concern to support multiple encoders and generator matrices for a given code (e.g.: none of previours-Sage, GAP or Magma has this). Forcing these users to write something like my_encode.encoder().encode(m) shoves the view of multiple encoders down their throat, whatever they want it or not. I could imagine that it would add a "eew"-factor to their experience. For students in a coding theory class, it would add a little extra distance between their classroom experience and the software experience.

These are not kill-all arguments. So if the added overhead of having and maintaining these shortcuts was large, I would vote against having them. However, we are talking about three very small functions (and two more for decoders in a later ticket), and no maintenance effort that I can see.

Johan

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:20

Hello,

I do not think that there is something inherently wrong in the way you expect user to call your code. To me, there is something wrong in the design of classes, and the way it is implemented. I tried to show that 'accepting the class design' would lead you to a different user interface.

For a while, I tried to forget what you wanted to represent (easier for me than for you), and it appeared to me that you were implementing class inheritance backwards.

Look at it, it is quite surprising: here is the list of methods of an encoder object I built:

sage: e.
e.category          e.db                e.dumps             e.generator_matrix  e.parent            e.reset_name        e.unencode          e.version           
e.code              e.dump              e.encode            e.message_space     e.rename            e.save              e.unencode_nocheck

If you remove the useless ones, you are left with those:

e.code()             # returns the 'parent' code()
e.encode()           # =e.code().encode()
e.generator_matrix() # =e.code().generator_matrix()
e.message_space()    # =e.code().ambiant_space()
e.unencode()         # =e.code().unencode()

It is funny to notice two things: 1) All encoder methods can be called directly from the code (i.e. at the moment there is no point to build the encoder object) 2) Some code functions call the encoder's function, which leads you to create one-line functions with 30 lines of doc each time.

If I just look at the code, it seems that you just want your code to inherit the methods implemented in the encoder, i.e. generator_matrix, encode, and unencode. Then you get this for free.

If you find a way to make all these functions available at the level of the code without having to copy/paste a lot of things, it would solve my main (and perhaps only) objection about the classes' design.

Nathann

johanrosenkilde commented 9 years ago

comment:21

Hi Nathann,

I don't think I'm following your criticism. I'll try to comment on your points.

For a while, I tried to forget what you wanted to represent (easier for me than for you), and it appeared to me that you were implementing class inheritance backwards.

Yes, we are "forwarding" certain functions from (multiple) encoder classes in one class AbstractLinearCode. The many classes that will inherit from AbstractLinearCode will all inherit this forwarding. There is a one-shot overhead in the implementation of AbstractLinearCode (in this ticket) -- that's all.

Look at it, it is quite surprising: here is the list of methods of an encoder object I built: ... It is funny to notice two things: 1) All encoder methods can be called directly from the code (i.e. at the moment there is no point to build the encoder object)

Yes, that's a feature: the encoder objects will be created behind the scenes and the user will access them transparently through the code object.

The "there is no point to build the encoder object" is unclear to me. In the proposed ticket, the encoder object must be build behind the scenes, or the code functions would not work. Note that there is an (at least) one-to-many relation between code classes and encoders. This ticket is about supporting that multiplicity of encoders in an elegant fashion; elegance seen from the user perspective and future code class designers.

The proposed ticket puts as much of the boiler-plate as possible into AbstractLinearCode, e.g. for selecting the right encoder, caching the encoder (including whatever cached objects it has), etc. Furthermore it puts in implementations of encode, unencode and generator_matrix whose usability purposes we have already discussed. Note again that these implementations will never be touched by sub-classes but will "just work".

2) Some code functions call the encoder's function, which leads you to create one-line functions with 30 lines of doc each time.

"Each time" means this ticket. No more. All later codes will inherit this forwarding and will never touch encode, unencode and generator_matrix.

If I just look at the code, it seems that you just want your code to inherit the methods implemented in the encoder, i.e. generator_matrix, encode, and unencode. Then you get this for free.

Are you suggesting killing off Encoder entirely? I think that impoverishes the design a lot. That encoders are instances of classes means that they have lots of power, e.g. caching and intro-spection. If that's not what you're talking about, then I don't understand.

If you find a way to make all these functions available at the level of the code without having to copy/paste a lot of things, it would solve my main (and perhaps only) objection about the classes' design.

Good, because that's how the current ticket is from the point of view of new code classes and encoders :-) They will just inherit from AbstractLinearCode or Encoder respectively, and these "copy/paste" functions will be inherited.

Best, Johan

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:22

Hello guys,

I do not think that I can bring anything new to this thread, and I do not see us going anywhere. I tried to answer several times, I did not have anything smart nor enlightening to add.

I cannot say that I agree with that code either, and there is no point in blocking the ticket until we drive ourselves mad over it.

If somebody else comes here and wants to give it a positive review I will not object. Same if you review it between yourselves. But I will not. What I could contribute was my view on the matter, and I did that already.

Have fun guys,

Nathann

johanrosenkilde commented 9 years ago

comment:23

Hi Nathann,

I do not think that I can bring anything new to this thread, and I do not see us going anywhere. I tried to answer several times, I did not have anything smart nor enlightening to add.

I completely disagree: you pointed out several things, and the main discussion here has been interesting. David, me and our colleague Julien spent about an hour rediscussing the code with your point of view as alternative. We just concluded that we preferred the proposed model :-)

I cannot say that I agree with that code either, and there is no point in blocking the ticket until we drive ourselves mad over it.

If somebody else comes here and wants to give it a positive review I will not object. Same if you review it between yourselves. But I will not. What I could contribute was my view on the matter, and I did that already.

Very fair, thanks. In any case, if/how the ticket gets reviewed, after whatever possible changes, the code will never be set in stone and we can change it if it sucks :-)

Thanks a lot for your input, Nathann! And have fuuuuuuuuuun in Nantes.

Johan

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from aa42238 to b147148

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`57380e5`	`Update to 6.9beta0`
`b147148`	`Integrated reviewer's comments`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b147148 to d132d3c

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d132d3c Update to 6.9beta4

johanrosenkilde commented 9 years ago

comment:27

Hi,

1st step of review: looking at what the patchbot complains about:

There is a function in sage.coding.encoder without a test. Is it the abstract generator_matrix that's causing it?
The mystic startup_modules fails. This turns out to mean that the patchbot has detected that a new module is loaded at Sage startup: a thing that is nowadays attempted to keep a lid on by replacing normal imports with lazy_import. The new module is sage.coding.encoder. I'm pretty sure that this is due to line 3660 of linear_code which imports encoder. It doesn't seem possible to remove this import or change it to lazy_import since Encoder is referred just below. But what if linear_code itself was lazily imported in sage.coding.all? Some care possibly needs to be taken with the registered_encoders line in __init__. Could this line be put somewhere to activate it only when the lazy_import of linear_code gets dereferenced?

Best, Johan

johanrosenkilde commented 9 years ago

comment:28

There is a function in sage.coding.encoder without a test. Is it the abstract generator_matrix that's causing it?

Yes it is:

    $ ./sage  -coverage src/sage/coding/encoder.py 
    ------------------------------------------------------------------------
    SCORE src/sage/coding/encoder.py: 87.5% (7 of 8)

    Missing doctests:
        * line 268: def generator_matrix(self)
    ------------------------------------------------------------------------

How does one test an abstract method?

johanrosenkilde commented 9 years ago

comment:29

How does one test an abstract method?

Looking at other uses of abstract_method, it seems that a specific example is used in the doctest. So you could use an example with LinearCodeGeneratorMatrixEncoder.

Best, Johan

johanrosenkilde commented 9 years ago

comment:30

Ok, some comments, small and slightly less small:

Docstring of class Encoder says "... needed to describe properly the code defined in the subclass". "code" should be "encoder". But probably that sentence confuses more than it helps in any case :-)
Same docstring, why is "Then, if the message space..." and the following paragraph not part of the list?
The paragraph "Equality methods ..." is a bit strange. Perhaps

By default, comparison of Encoders (the methods__eq__ and __ne__) are by memory reference: If you build the same encoder twice, they will be different. If you need something more clever, override __eq__ and __ne__ in you subclass.

This paragraph should also be in the list. The last paragraph on representation should also be in the list.
Same docstring. If message space is not F^k, you also need to override message_space.
Docstring of Encoder.encode, INPUT block "the code" should beself. OUTPUTblock.selfshould beself.code`.
Encoder.encode. The ValueError should not refer to "vector", I think, to set the precedent of this error message for other encoders. Say The value to encode instead, for instance. Also remove the a just before M in that error.
Docstring of Encoder.unencode could perhaps be more precise. Perhaps instead "The inverse of encode: return the message corresponding to the codeword c." Also the INPUT block seem to have been wrongly copy/pasted from a time when this was in LinearCode. Also, last sentence in describing nocheck could perhaps more enlightening be "You might set this to True to disable the check for saving computation. Note that if c is not in self and nocheck = True, then the output of unencode is not defined (except that it will be in the message space of self).
Encoder.unencode. You don't need all the else stuff: remove the first else branch completely, and always run the second return self.unencode_nocheck(c).
Encoder.unencode_nocheck. Perhaps the information set corresponding to _unencoder_matrix should be saved together with that matrix. Currently, the code -- rather brittly -- depends on self.code.information_set always returning the same result. Which is not part of that method's docstring (even though you now added @cached_method on its default implementation, which should still be there).
Encoder.unencode_nocheck, docstring. I like that you added an example for a vector not in C. But perhaps you could reformulate that sentence to say something like "Taking a vector that does not belong to C will not raise an error, but probably just give a non-sensical result".
Encoder.code docstring. Perhaps just "Returns the code for this Encoder"? Also, in the Example, last line could be E.code() == C instead.
Encoder.encode docstring: after reading the docstring of Encoder.message_space, I think that perhaps a reference to the possible partial function nature of Encoder.encode should be given there towards the end, as an information to future Encoder developers? Perhaps say what error should be raised in that case.
Perhaps the encoders import statement in coding/codes_catalog should also be lazy? I know that codes_catalog is already lazy, but encoders will usually not be used, even if you use codes_catalog.
Index of encoders, docstring could be generated with the new auto-table functions Nathann made.
class AbstractLinearCode docstring: I don't understand the point of This class provides: .... Why is length there? There is a private property _length and a method length(). But for instance, dimension() is not mentioned? Why do you list default_encoder_name and _registered_encoders there? Perhaps
AbstractLinearCode: the error thrown when giving an unknown encoder could perhaps hint at how to add this default encoder.
AbstractLinearCode docline "fill the dictionary..." should add .py to the name of __init__.
AbstractLinearCode docstring on __cmp__ and __eq__ could perhaps be clearer. Something like "AbstractLinearCode has generic implementations of the comparison methods __cmp__ and __eq__ which use the generator matrix and are quite slow. In subclasses you are encouraged to override these functions." Is __cmp__ and __eq__ the exhaustive list of comparison functions?
AbstractLinearCode.add_encoder. Perhaps the doc should clarify that the added encoder is only added to self and not any member of the class. As well as point the reader to how to do that instead.
AbstractLinearCode.add_encoder. I don't like that all newly created codes copy the dictionary of registered encoders, when almost always, add_encoder is not going to be called on a created code. Instead, I think that in add_encoder you can add a check to see if self._registered_encoders is self.__class__._registered_encoders or something. If this is the case, then you make a new copy of self._registered_encoders and add the new encoder to it. Otherwise, add_encoder must have been already called on this code and you shouldn't make a new copy of course.
AbstractLinearCode.encode doc is missing punctuation many places. Also in encoder. Maybe also elsewhere.
AbstractLinearCode.encode docstring says "the message space". Should be "a message space". It should say prominently in the doc of both encode and encoder that the default encoder always has F^k as message space. Come to think of it, that should also be stated very clearly in the doc of class AbstractLinearCode.
AbstractLinearCode.encoder docstring refers to name but its encoder_name. The doc should cleary describe that the encoder is cached (right now it's only indirectly referred to in the last pargraph). I think there should be a test of this.
AbstractLinearCode.encoder the line return self.encoder(encoder_name, **kwargs) can simply be removed.
AbstractLinearCode.encoders_available the variable reg_enc is superfluous and not even used, two lines below. Perhaps the argument values could be renamed to classes.
AbstractLinearCode.encode and .generator_matrix and .unencode doc of kwargs and LinearCode.generator_matrix. Would it be more helpful to write "all additional arguments are forwarded to the construction of the encoder that is used."
AbstractLinearCode.unencode doc could be made more precise, like the Encoder.unencode doc. The doc of input c should just say that c is a codeword of C. nocheck should also be clarified like in Encoder. The OUTPUT block is uninformative and plain wrong (if encoder_name is set to something).
LinearCode.generator_matrix. Why does this check that _generator_matrix exists? That is always the case? More importantly, if encoder_name is not None or GeneratorMatrix, then this function should not return _generator_matrix but <that encoder>.generator_matrix(). That could be more elegantly handled by a call to super's generator_matrix function.
LinearCodeGeneratorMatrixEncoder._repr_ and ._latex_ should include the word "the" just before the code, I think.

Phew, I think that's it :-)

Best, Johan

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:31

Thanks for this (very) comprehensive work. I'm currently updating this to latest beta, and will start working on your comments as soon as compilation completes.

David

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`130587d`	`Update to latest beta`
`2115bf4`	`Changes accordingly to reviewer's comments`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from d132d3c to 2115bf4

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:33

I made (almost all) the requested changes.

A few remarks:

On comment 4, note that I kept the sentence starting by "returns ..." to stick to docstring formatting rules.

On comment 6, I don't really get what you suggest. Do you propose to somehow register the unencoder_matrix and the information_set, for instance as some kind of class argument after their first computation?

On comment 9, I can't see what kind of other error can occur when encoding a word. Either it's an element from the message space and encoding goes well, either it's not and you get an error as illustrated in the doctest of encoder. Or am I missing something here?

On comment 12, I actually deleted the whole block.

On comment 19, I changed the .. NOTE block to a .. WARNING block to make this remark even more visible on the doc of AbstractLinearCode.

On comment 23, it's more helpful indeed. Change done.

On comment 24, as c is not necessarily a codeword (if nocheck = True, you can provide anything you want, as long as it belongs to the ambient space), I did not change the INPUT line for c.

On comment 26, it's already the case. Maybe you meant "shouldn't include the word "the" ", but in that case I think the error message would seem a bit weird imho. I have no strong feelings on this though.

Best,

David

johanrosenkilde commented 9 years ago

comment:34

Cool!

On comment 6, I don't really get what you suggest. Do you propose to somehow register the unencoder_matrix and the information_set, for instance as some kind of class argument after their first computation?

I'm suggesting that _unencoder_matrix returns a tuple consisting of the matrix and the information set. Then unencode can be changed to use that matrix and information set, which is promised to work together. Right now, if code.information_set returns a different set next time it's called, then unencode will break since it uses the matrix for the first information set.

On comment 9, I can't see what kind of other error can occur when encoding a word. Either it's an element from the message space and encoding goes well, either it's not and you get an error as illustrated in the doctest of encoder. Or am I missing something here?

Encoder.encode is allowed to be a partial function over its message space (as explained in Encoder.message_space). This is not clear in the doc string of Encoder.encode. Presumably, a special type of error is supposed to be thrown if one tries to encode a value outside the allowed portion of the message space (say if I try to encode a polynomial of degree 2k in a [n,k] GRS code).

On comment 19, I changed the .. NOTE block to a .. WARNING block to make this remark even more visible on the doc of AbstractLinearCode.

Good idea.

On comment 24, as c is not necessarily a codeword (if nocheck = True, you can provide anything you want, as long as it belongs to the ambient space), I did not change the INPUT line for c.

Well, yes, the function won't explode if you give it c which is not a codeword and set nocheck = True, but that's not at all the intention of that function, and it's also not the intention of nocheck = True. The function is undefined - and thus useless - on this input! I think that the description of c should reflect the intended use of the function. Not the widest possible type for which the function won't explode.

On comment 26, it's already the case. Maybe you meant "shouldn't include the word "the" ", but in that case I think the error message would seem a bit weird imho. I have no strong feelings on this though.

Yes, I meant "shouldn't" :-) It's debatable, I guess, but having "the" doesn't seem to be convention in Sage:

sage: MatrixSpace(RR, 2, 3)
Full MatrixSpace of 2 by 3 dense matrices over Real Field with 53 bits of precision
sage: PolynomialRing(GF(2),'x')
Univariate Polynomial Ring in x over Finite Field of size 2 (using NTL)
sage: Hom(RR, ZZ)
Set of Homomorphisms from Real Field with 53 bits of precision to Integer Ring

Furthermore, the _repr_ doesn't uniquely specify "the" code, but usually gives a summary. Like for LinearCode or the soon-to-come GRS codes.

Best, Johan

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`d179435`	`Update to 6.9beta7`
`154cc6f`	`Integrated reviewer's comments`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 2115bf4 to 154cc6f

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:36

I'm suggesting that _unencoder_matrix returns a tuple consisting of the matrix and the information set. Then unencode can be changed to use that matrix and information set, which is promised to work together. Right now, if code.information_set returns a different set next time it's called, then unencode will break since it uses the matrix for the first information set.

Got it! Change done.

Encoder.encode is allowed to be a partial function over its message space (as explained in Encoder.message_space). This is not clear in the doc string of Encoder.encode. Presumably, a special type of error is supposed to be thrown if one tries to encode a value outside the allowed portion of the message space (say if I try to encode a polynomial of degree 2k in a [n,k] GRS code).

Ok. I added a .. NOTE block to explain this. And suggested to use EncodingError if one tries to encode a word which is not in the message space.

Well, yes, the function won't explode if you give it c which is not a codeword and set nocheck = True, but that's not at all the intention of that function, and it's also not the intention of nocheck = True. The function is undefined - and thus useless - on this input! I think that the description of c should reflect the intended use of the function. Not the widest possible type for which the function won't explode.

Agreed. I changed the doc.

For comment 26, as keeping the "the" is not convention in Sage and as I stick to the Sage's way of doing things, I deleted it.

Best,

David

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 154cc6f to d0435eb

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d0435eb Replaced import of linear_code module by a lazy_import

johanrosenkilde commented 9 years ago

comment:39

I presume that lazy_import was to avoid that sage.coding.encoder got imported, but I'm pretty sure it still does. At least, it's present in sys.modules and it shows up when doing sage.coding.<tab>.

This might be due to the line in __init__.py which will cause an immediate dereference of LinearCode, which then causes loading encoder.py too. But I didn't test that this is sufficient.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from d0435eb to 5965150

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5965150 Moved line related to dictionary of encoders

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:41

I moved the line which fills the dictionary of encoders for LinearCode from __init__.py to linear_code.py. It will be useful to do that to avoid immediate dereference of LinearCodein __init.py__ as explained in the message above. Note that for now, encoder module is still imported when Sage starts. I'll fix this (alongside with several other imports) in the follow-up ticket #19315

1861b8a9-77f0-4f35-8431-8514a75b40d1 commented 9 years ago

comment:42

Mmh...

I just notice that patchbot stops complaining about startup_modules... Thing is, encoder module is still imported when Sage starts, as explained in my previous message. Anyone knows what this startup_modules line means?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 5965150 to 43ec640