Commit: 645e329
Branch pushed to git repo; I updated commit sha1. New commits:
6292df9 | window_size and number_errors are now class arguments |
e72a5ab | Rewrote `_repr_` and `_latex_` methods, fixed a typo |
f2c074c | decode_to_code now iterates over the whole range of provided errors |
0e1391d | Added *args to Decoder/Encoder-related methods in AbstractLinearCode |
Author: David Lucas
I made some changes:

- `window_size` and `number_errors` (previously `p` and `w`) are no longer parameters of `decode_to_code` but class arguments to provide at construction time. `number_errors` can be provided as a tuple/list, in which case the decoder will iterate over it when trying to decode a word.
- I completely reworked the documentation, explained how the algorithm works and added a lot of doctests.
- I completed input sanitization with some extra checks.
- I fixed a stupid bug: only `**kwargs` was considered in the Decoder/Encoder-related methods of `AbstractLinearCode`, like `decoder()`. So `C.decoder('InformationSet', 2, 2)` was not working...

This is now open for review.
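For illustration, construction now looks roughly like this (a hedged sketch based on the description above; the positional form follows the `C.decoder('InformationSet', 2, 2)` example, and `GolayCode` is just a convenient test code):

```python
# Sketch only: arguments are (number_errors, window_size), both given
# at construction time rather than to decode_to_code.
C = codes.GolayCode(GF(2))
D = C.decoder('InformationSet', 2, 2)
# number_errors may be a tuple/list; the decoder then iterates over it
# when trying to decode a word.
D2 = C.decoder('InformationSet', (1, 2), 2)
```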
Branch pushed to git repo; I updated commit sha1. New commits:
8192769 | Updated to 7.1beta6 and fixed conflicts |
I updated this ticket to the latest beta, fixed broken doctests, and made minor changes to the decoder (I refined its types and rewrote a line in the documentation). It's still open for review.
I updated this ticket to 7.3b2, and fixed a bug. Still open for review.
Best,
David
Here are my two cents:

- use `itertools` instead of `Subsets`
- don't enumerate all the possible `information_set` candidates but take them at random (otherwise you might get stuck a long time with bad choices)
- why `while True` loops instead of just `for`?

All in all, here is how I would do it (notations from the article you cite):
```python
import itertools

def LeeBrickell(r, C, w, p):
    G = C.generator_matrix()
    n = C.length()
    k = C.dimension()
    Fstar = list(C.base_ring())[1:]
    while True:
        # step 1.
        I = sample(range(n), k)
        Gi = G.matrix_from_columns(I)
        try:
            Gi_inv = Gi.inverse()
        except ZeroDivisionError:
            continue
        G = Gi_inv * G
        # step 2.
        y = r - vector([r[i] for i in I]) * G
        g = G.rows()
        # step 3.
        for A in itertools.combinations(range(k), p):
            for m in itertools.product(Fstar, repeat=p):
                e = y - sum(m[i]*g[A[i]] for i in range(p))
                if e.hamming_weight() == w:
                    return r - e
```
Reviewer: Yann Laigle-Chapuy
You should also check if the resulting word is in the code (it might not be the case if you try to decode beyond the minimal distance).

In case you want to optimize this a bit, you can replace the `itertools` enumerations with the `sage.combinat.gray_codes` stuff (and if the field is `GF(2)`, get rid of the `itertools.product`). This is probably better kept for another ticket.
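For illustration, here is roughly how the Gray-code iteration works (a sketch assuming `sage.combinat.gray_codes.product`, which yields `(position, increment)` pairs so that consecutive tuples differ in a single coordinate; adapting it to update `e` incrementally is left out):

```python
from sage.combinat.gray_codes import product as gray_product

# Visit every tuple of {0,1,2}^4, changing one coordinate at a time.
# In Lee-Brickell this would let the running sum e be updated with a
# single scaled row addition per candidate instead of being recomputed.
v = [0, 0, 0, 0]
for position, increment in gray_product([3, 3, 3, 3]):
    v[position] += increment
    # ... update e here using only the changed coordinate ...
```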
Changed branch from u/dlucas/information_set_decoding to u/jsrn/information_set_decoding
Changed author from David Lucas to David Lucas, Johan Rosenkilde
I've picked up this patch again: I've rebased it and polished it off but so far haven't changed the algorithm. This is not up for review yet!
Thanks, Yann, for your excellent suggestions, which are the next thing I'll look into. I'm sorry you didn't get any response for so long.
I've changed the following things so far:

- `window_size` is optional (I have yet to compute a sensible default value, though).
- Helpful error messages in `AbstractLinearCode.encode/decode` when not satisfying the decoder's constructor. This came up since the information set decoder still has 1 mandatory argument.
- Change the doc-tests to use a specific code (Golay) instead of random ones.
- Fixed various errors in `sage/coding` that came up: some bad printing in Golay codes, removed old deprecated method `LinearCode.decode`.

My next step is to look at Yann's suggestion for improving the algorithm and its efficiency.
New commits:
3b41dd5 | Rebase on 8.0.beta8. Roll back mods outside linear_code.py and thematic tutorial |
0f8bdb3 | Went over documentation and simplified calling convention |
57cf6b6 | Fixed Golay code Finite Field printing |
cf556ab | Fix doctests |
d021e1b | Remove `AbstractLinearCode.decode` which has been deprecated 18 months |
43c229c | Helpful error messages for encoder/decoder construction with args |
25b2b93 | decoder types: for new decoder, remove "unique", and fix some descriptions |
No problem for the delay, I'm far from following this closely :)

A side remark though: `_registered_decoders` is a `dict` and `decoders_available` returns its keys, so the ordering is not reliable for doctests. I would advocate always returning a sorted list, as it's not a time-critical method.
Replying to @sagetrac-ylchapuy:

> No problem for the delay, I'm far from following this closely :)
>
> A side remark though: `_registered_decoders` is a `dict` and `decoders_available` returns its keys, so the ordering is not reliable for doctests. I would advocate always returning a sorted list, as it's not a time-critical method.

Good point - I'll fix this while we're at it.
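A minimal sketch of that fix (assuming `decoders_available` currently returns `self._registered_decoders.keys()`):

```python
def decoders_available(self):
    # Return the decoder names sorted, so doctest output is deterministic.
    return sorted(self._registered_decoders)
```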
Branch pushed to git repo; I updated commit sha1. New commits:
7551c47 | Sort all doc-tests for decoders/encoders_available |
Hi Johan,
I had a quick look at this ticket. For the moment, I didn't review the algorithm but I have a few remarks on the rest of the code.
While running doctests, I obtained 3 errors:
- in golay_code.py, line 169, a bracket is missing in the doctest;
- in decoder.py, lines 105 and 113: that's because the type `unique` has been removed from the decoders' types of `LinearCodeSyndromeDecoder` and `LinearCodeNearestNeighborDecoder` (see linear_code.py) but not from the doctest.

In fact, you seem to have removed the type `unique` from every `decoder_type`; is this because you consider that a decoder is a unique decoder as long as it isn't claimed not to be?
Besides, I have a more structural remark. Starting from Prange's algorithm, there are many variants of the information set decoding algorithm (Lee-Brickell, birthday decoding, Stern-Dumer, MMT, BJMM), the last ones being quite recent. The name `InformationSetDecoder` implicitly assumes we're planning to implement only one of them (Lee-Brickell). The question is: if someone wants to implement another variant, what should they do? I think there are two solutions:

1. The class `InformationSetDecoder` contains all the implemented variants of Prange's algorithm (that is, for the moment, only Lee-Brickell). If another algorithm appears in Sage, it becomes callable through an optional argument of the constructor, e.g. `__init__(algo="LeeBrickell", **kwargs)`.
2. We define a class per algorithm, and `InformationSetDecoder` is a pointer to `LeeBrickellISD`.

In my humble opinion the second one is better (it is what we did for the various decoders of GRS codes), but I do not have time to rework the code during the following weeks.

Julien
Replying to @johanrosenkilde:

> - Change the doc-tests to use a specific code (Golay) instead of random ones.

It would be good to continue to test also non-binary codes (because we can!)

> - Fixed various errors in `sage/coding` that came up: some bad printing in Golay codes, removed old deprecated method `LinearCode.decode`.

I understand it's easier, but isn't it best practice to have self-contained tickets?
Hi Julien,
Thanks for the feedback.
Replying to @jlavauzelle:

> While running doctests, I obtained 3 errors:

Thanks, stupid mistakes. I'll fix them presently.

> In fact, you seem to have removed the type `unique` from every `decoder_type`; is this because you consider that a decoder is a unique decoder as long as it isn't claimed not to be?

Yes. This finally became clear in #20001 when the types were documented better, though it was forgotten in #20001 to actually go over all decoders again and fix their types.

> Besides, I have a more structural remark.

That's a good point. I would leave it to posterity to decide this. My own opinion is that a single class with an `algorithm` flag is sufficient, since all the decoders have exactly the same overall behaviour (same types) and only differ in speed, in some sense. Several of the decoders for GRS codes are of very different types. If someone comes along and implements, say, Stern-Dumer, the choice can easily be made at that point.
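For illustration, the dispatch that the `algorithm` flag implies could look roughly like this (a sketch only; the helper method `_decode_lee_brickell` is hypothetical, not the ticket's code):

```python
from sage.coding.decoder import Decoder

class LinearCodeInformationSetDecoder(Decoder):
    # Sketch: one class covering several ISD variants, selected by a flag.
    def __init__(self, code, number_errors, algorithm="LeeBrickell"):
        if algorithm != "LeeBrickell":
            raise NotImplementedError("unknown ISD algorithm: %s" % algorithm)
        self._number_errors = number_errors
        self._algorithm = algorithm

    def decode_to_code(self, r):
        # Only Lee-Brickell exists so far; a new variant would simply add
        # a branch here (helper method hypothetical).
        if self._algorithm == "LeeBrickell":
            return self._decode_lee_brickell(r)
```

This keeps a single user-facing name while leaving room for, e.g., Stern-Dumer later.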
Replying to ylchapuy:

> > Change the doc-tests to use a specific code (Golay) instead of random ones.
>
> It would be good to continue to test also non-binary codes (because we can!)

Good point!

> > Fixed various errors in `sage/coding` that came up: some bad printing in Golay codes, removed old deprecated method `LinearCode.decode`.
>
> I understand it's easier, but isn't it best practice to have self-contained tickets?

Yes, it's best practice. My experience with the trac ticket-and-review system, however, is that if best practice is followed completely, then trivial mistakes never get fixed. I'm tired of coming across small mistakes here and there while editing code and not being able to fix them, and then dealing with ticket dependencies etc. My impression is that many of the very active developers on SageMath do the same.

Removing the deprecated function is perhaps debatable, though.
Regarding the algorithm, it really should be corrected. Try e.g.:

```
sage: C = codes.random_linear_code(GF(3), 24, 10)
sage: c = C.random_element()
sage: Chan = channels.StaticErrorRateChannel(C.ambient_space(), 4)
sage: y = Chan(c)
sage: %time LeeBrickell(y, C, 4, 2) == c
CPU times: user 16 ms, sys: 4 ms, total: 20 ms
Wall time: 20.5 ms
True
sage: D = C.decoder("InformationSet", 4)
sage: %time D.decode_to_code(y)
<... still waiting ... >
```
Yep, the algorithm is broken, which I've now seen trying it on a `GF(3)` Golay code. I've pushed the example, but this obviously still needs work. As I mentioned earlier, going over the algorithm was the next order of business anyway.
Branch pushed to git repo; I updated commit sha1. New commits:
a81785a | Use ylchapuy's implementation of Lee-Brickell |
I've replaced the algorithm with Yann's implementation from comment 10, with only trivial modifications. This one works on the `GF(3)` doctest. I still want to do more testing and timing, so this is not up for review yet.
Note that I opened #23244 because I plan to implement Stern-Dumer.
That's awesome!
I'm currently working on calibrating the window parameter - I find this an integral part of the Lee-Brickell algorithm because it has a huge effect on the average running time (and a user would have no idea how to set it). Unfortunately, it's a bit hit and miss when I have time to work on this, but I'll try to get it wrapped up.
Replying to @johanrosenkilde:

> That's awesome!
>
> I'm currently working on calibrating the window parameter - I find this an integral part of the Lee-Brickell algorithm because it has a huge effect on the average running time (and a user would have no idea how to set it). Unfortunately, it's a bit hit and miss when I have time to work on this, but I'll try to get it wrapped up.
First, I don't understand why you changed `p` to `win`. There is no notion of window here to me. Every paper I've seen calls this parameter `p`, and I think it would be better to stick to it, as `w` is used for the weight of the expected error.

I'll use the notation `n`, `k` for the length and dimension, `w` for the weight of the error and `p` for the parameter of the algorithm.

Regarding the choice of this `p` parameter: first, the analysis is usually made for constant error weight correction (when `tau` is a singleton) because it's easier. I'll do that too. Moreover, the analysis is usually done with a fixed `p` (I mean no loop as in your line 5303), see e.g. http://fms2.cerimes.fr/vod/media/canalu/documents/fuscia/3.5.lee.and.brickell.algorithm_32885/c015im.w3.s5.v3.pdf for the binary case. But I think it's worth doing the loop.
The expected cost `C` is `C_I / P_I`, where `C_I` is the cost of one iteration (from line 5289) and `P_I` the probability of success of this iteration (the iterations are independent because we choose `I` at random, so `P_I` is a constant).

`C_I` is `G + x * V`, with `G` the cost of steps 1 and 2 (mostly a Gaussian elimination), `x` the number of error vectors `e` we compute, and `V` the cost of computing each of them (this depends on the associated `p` in the current implementation, but could be made constant if we later use e.g. Gray codes to compute the sum).
Each vector `e` we compute increases `P_I` by a small amount `E`, which depends on the current `p` (the bigger the `p`, the smaller the gain).

Overall the cost function is piecewise defined:

`C(x) = (G + x*V) / (P(p) + x*E(p))`

with `P(p)` the success probability if we stop before `p` [i.e. `P(p) = sum_{i=0}^{p-1} E(i) * binomial(k,i) * (q-1)**i`].

The sign of the derivative depends only on `G*E(p) - V*P(p)`.
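To spell out that claim (the quotient rule applied to the cost function above):

$$\frac{d}{dx}\,\frac{G + xV}{P(p) + x\,E(p)} \;=\; \frac{V\,P(p) - G\,E(p)}{\bigl(P(p) + x\,E(p)\bigr)^{2}},$$

so the sign is independent of `x` and determined by `G*E(p) - V*P(p)`.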
`G` and `V` are implementation dependent, but you can estimate them; `E` and `P` can be computed. You could do this computation when you construct the decoder to choose `p` (it will be fast because `p` will never get bigger than 3, I think).

I did my best but I'm not certain my explanations are crystal clear! Does this make sense to you?
Replying to @sagetrac-ylchapuy:

> First, I don't understand why you changed `p` to `win`. There is no notion of window here to me. Every paper I've seen calls this parameter `p`, and I think it would be better to stick to it, as `w` is used for the weight of the expected error.

I think that a name is more evocative than a letter, in spite of tradition in mathematical texts. "Window" was the word David used in the original implementation, and I couldn't come up with a substantially better word. It also has a nice, clear 3-letter abbreviation. Using `w` for error weight is not more traditional than e.g. `e`, `t` or `tau` in the greater decoding literature. I changed it since it visually collides with `win`, so I used `tau` in my implementation. I believe it crops up elsewhere in `sage/coding` since I commonly use it.
> I'll use the notation `n`, `k` for the length and dimension, `w` for the weight of the error and `p` for the parameter of the algorithm. ...

Thanks. I already implemented all of this a few days ago, exactly using the equations you write. Where I left off was at the estimates of `E` and `P`, which have to be devilishly precise: I've got `E` down well, but the code simplification I did when estimating `P` turned out to be too coarse (I believed it depended mostly on the cost of scaling and summing vectors, but apparently the `hamming_weight` and loop structures add something). Testing this takes time because I need to run the decoder many times on a non-trivial code to be sure things work. I'll post my test code here as well, which might be useful for you in #23244.

I'll look at it again now and push what I have in any case.
Replying to @johanrosenkilde:

> Replying to @sagetrac-ylchapuy:
> > First, I don't understand why you changed `p` to `win`. There is no notion of window here to me. Every paper I've seen calls this parameter `p`, and I think it would be better to stick to it, as `w` is used for the weight of the expected error.
>
> I think that a name is more evocative than a letter, in spite of tradition in mathematical texts.

As a parameter name, yes.

> "Window" was the word David used in the original implementation, and I couldn't come up with a substantially better word.

Can you explain to me what window this would be? I really don't get it. I propose `partial_weight`: this is the number of error positions we expect in our information set (and is abbreviated as `p`).

> It also has a nice, clear 3-letter abbreviation. Using `w` for error weight is not more traditional than e.g. `e`, `t` or `tau` in the greater decoding literature. I changed it since it visually collides with `win`, so I used `tau` in my implementation. I believe it crops up elsewhere in `sage/coding` since I commonly use it.

You cite a paper in the doctest. I always find it nicer to use the same notations in the implementation as in the paper; it makes life easier for anyone trying to understand what your code does. But in the end I guess it's your call, I won't fight about this.
Replying to @johanrosenkilde:

> Replying to @sagetrac-ylchapuy:
> > I'll use the notation `n`, `k` for the length and dimension, `w` for the weight of the error and `p` for the parameter of the algorithm. ...
>
> Thanks. I already implemented all of this a few days ago, exactly using the equations you write. Where I left off was at the estimates of `E` and `P`, which have to be devilishly precise: I've got `E` down well, but the code simplification I did when estimating `P` turned out to be too coarse (I believed it depended mostly on the cost of scaling and summing vectors, but apparently the `hamming_weight` and loop structures add something).

`E(p)` should be `binomial(n-k, w-p) / (binomial(n,w) * (q-1)**p)`, and I wrote the formula for `P(p)` above.

I guess you mean estimating `G` and `V`. They don't need to be that precise: if they're a bit wrong, the final cost estimate will only be a bit wrong; nothing catastrophic should happen.

> Testing this takes time because I need to run the decoder many times on a non-trivial code to be sure things work. I'll post my test code here as well, which might be useful for you in #23244.

I don't want to push you, I only like to write down my thoughts; you may ignore them. I know this takes time and we all have other things to do!
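For concreteness, these estimates translate into a few lines of Sage (a sketch combining the formulas in this thread; `G_cost` and `V_cost` stand for the measured per-iteration and per-candidate costs and are assumptions, not the ticket's API):

```python
def E(n, k, w, q, p):
    # Success probability contributed by each candidate vector at stage p.
    return binomial(n - k, w - p) / (binomial(n, w) * (q - 1)**p)

def P(n, k, w, q, p):
    # Success probability of one iteration if we stop before stage p:
    # P(p) = sum_{i=0}^{p-1} E(i) * binomial(k, i) * (q-1)^i.
    return sum(E(n, k, w, q, i) * binomial(k, i) * (q - 1)**i
               for i in range(p))

def expected_cost(n, k, w, q, G_cost, V_cost, p):
    # Expected running time (G + x*V) / P(p+1), where x counts the
    # candidate vectors examined in one full iteration up to stage p.
    x = sum(binomial(k, i) * (q - 1)**i for i in range(p + 1))
    return (G_cost + x * V_cost) / P(n, k, w, q, p + 1)

# Pick p by minimizing the estimate; p rarely needs to exceed 3:
# best_p = min(range(4), key=lambda p: expected_cost(n, k, w, q, Gc, Vc, p))
```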
Replying to @sagetrac-ylchapuy:

> I propose `partial_weight`: this is the number of error positions we expect in our information set (and is abbreviated as `p`). You cite a paper in the doctest. I always find it nicer to use the same notations in the implementation as in the paper; it makes life easier for anyone trying to understand what your code does.

That's a valid reason. The cited paper is not the original contribution and the notation looked ad hoc to me, so I didn't heed it in this instance. But I'm not well-versed in ISD literature, and if you say the parameter is often `p`, that's a stronger argument. Did you come up with `partial_weight` or did you see it somewhere? It sounds bland to me, but it does share the `p`.
Replying to @johanrosenkilde:

> Replying to @sagetrac-ylchapuy:
> > I propose `partial_weight`: this is the number of error positions we expect in our information set (and is abbreviated as `p`). You cite a paper in the doctest. I always find it nicer to use the same notations in the implementation as in the paper; it makes life easier for anyone trying to understand what your code does.
>
> That's a valid reason. The cited paper is not the original contribution and the notation looked ad hoc to me, so I didn't heed it in this instance. But I'm not well-versed in ISD literature, and if you say the parameter is often `p`, that's a stronger argument.

It's not used in the original publication (there it's `j`), but it's used at least since Stern's paper in 1989, in all the papers proposing improvements (and so is `w` for the expected weight of the error).

> Did you come up with `partial_weight` or did you see it somewhere? It sounds bland to me, but it does share the `p`.

I just came up with it, but I agree it's not particularly good. I don't know any wordy definition of this parameter.
I've pushed an attempt at writing the calibration of the search size parameter (which is what I call it now, abbreviated `p`). I monkey'd around for a very long while thinking it didn't work, but I must have confused myself terribly, because when I tried to clean everything up in order to write this report on my progress, it suddenly doesn't seem too bad. I believe one gotcha that confounded me was the inclusion of a `rho` factor: the fraction of size-`k` position sets that are information sets.
Anyway, here are some timings:

```
[24, 12, 8] Extended Golay code over GF(2)
        Estimated               Actual
p=0     0.00294093602195062     0.00370841026306
p=1     0.000733640004873450    0.00105549812317
p=2     0.000779351222057502    0.000777833461761
p=3     0.00202210000281411     0.00085108757019

[12, 6, 6] Extended Golay code over GF(3)
        Estimated               Actual
p=0     0.00107591199443414     0.00152409553528
p=1     0.000475508233346987    0.000670518875122
p=2     0.000837919998575671    0.000634233951569

[255, 231] BCH Code over GF(2) with designed distance 7
        Estimated               Actual
p=0     8.27393653065493        10.0903596711
p=1     0.303782262811033       0.38420484066
p=2     0.613038331151469       0.627587425709
p=3     13.5825719377659        6.28117632627

[255, 139] BCH Code over GF(2) with designed distance 31
        Estimated               Actual

[255, 47] BCH Code over GF(2) with designed distance 81
        Estimated               Actual
p=0     7.08824734720053        6.80084561825
p=1     0.730343868369564       0.871155004501
p=2     0.776726300346984       0.963895459175
p=3     3.16273106617629        4.46240173101
p=4     14.6264398618229        16.1132382083

[80, 56] BCH Code over GF(3) with designed distance 9
        Estimated               Actual
p=0     0.652945537606786       0.400583426952
p=1     0.104725869935881       0.0847134900093
p=2     0.651413542048857       0.519779400826
p=3     11.2044949381990        5.3165213418

[80, 15] BCH Code over GF(3) with designed distance 41
        Estimated               Actual
p=0     0.0524764488533255      0.0787803530693
p=1     0.0235392606723455      0.0252663660049
p=2     0.0786084509854239      0.0758757901192
p=3     0.340890763500404       0.248177850246
p=4     1.46166863051819        0.734001851082
p=5     5.69338143224234        1.27127592802
p=6     19.6762218536446        2.85639382839

[124, 94] BCH Code over GF(5) with designed distance 13
        Estimated               Actual
p=0     51.8959970988144        (25.518362093 last time, skipped here)
p=1     8.16363820890485        13.5431237435
```
Here follows some of my test code:

```python
import time

def test_decoder(D, reps=100):
    r"""Return the time it takes to run the decoder on average, with maximal
    number of errors."""
    C = D.code()
    tau = D.decoding_radius()
    Chan = channels.StaticErrorRateChannel(C.ambient_space(), tau)
    tsum = 0
    for i in range(reps):
        c = C.random_element()
        r = Chan(c)
        before = time.clock()
        cout = D.decode_to_code(r)
        after = time.clock()
        assert c == cout, "decoding error"
        tsum += after - before
    return tsum / reps

from sage.coding.linear_code import LinearCodeInformationSetDecoder

test_codes = [
    ( 3, codes.GolayCode(GF(2))),
    ( 2, codes.GolayCode(GF(3))),
    ( 3, codes.BCHCode(GF(2), 2^8-1, 7)),
    (15, codes.BCHCode(GF(2), 2^8-1, 31)),
    (40, codes.BCHCode(GF(2), 2^8-1, 81)),
    ( 4, codes.BCHCode(GF(3), 3^4-1, 9)),
    (20, codes.BCHCode(GF(3), 3^4-1, 41)),
    ( 6, codes.BCHCode(GF(5), 5^3-1, 13))
]

for (tau, C) in test_codes:
    print C
    D = LinearCodeInformationSetDecoder(C, tau)
    estimates = D._calibrate_search_size()
    print "\t\tEstimated\t\tActual"
    for p in range(0, tau+1):
        if estimates[p] < 30:
            D._search_size = p
            avg = test_decoder(D, reps=5)
        else:
            avg = "skipped"
        print "\tp=%s\t%s\t%s" % (p, estimates[p], avg)
    print

def test_iters(D):
    r"""Return the avg number of iterations taken by the decoder, including
    those where a non-information set is picked.

    This requires debug code inserted to count ``self._iters`` in the decoder.
    """
    N = 101
    iters = []
    for i in range(N):
        test_decoder(D, reps=1)
        iters.append(D._iters)
    print median(iters), 1. * sum(iters) / N

def infoset_density(C, reps=1000):
    r"""Estimate the density of information sets among all size-k position
    sets."""
    s = 0
    G = C.generator_matrix()
    n, k = C.length(), C.dimension()
    for N in range(reps):
        I = sample(range(n), k)
        Gi = G.matrix_from_columns(I)
        try:
            Gi.inverse()
            s += 1
        except ZeroDivisionError:
            pass
    return 1. * s / (N+1)

for _, C in test_codes:
    print "%s\t%s" % (C, infoset_density(C))
```
New commits:
d6f69b8 | Rename number_errors() method to decoding_interval() and decoding_radius() method |
a074919 | Calibrate window parameter, take 1 |
7ab7277 | Calibrate window parameter, take 2 |
476369c | Rename window to search_size, win -> p, w -> p or pi |
Great, I'll take a closer look later. Here are some preliminary remarks:

- you shouldn't worry about `rho`; consider as an iteration only those with a valid information set. It's already what you measure with `time_information_set_steps`, though I would take the mean and not the median here.
- beware, `_search_size` is initialized to `None`, which leads to trouble. You should just leave it uninitialized.
I updated the timings after a longer run with more iterations.
Replying to @sagetrac-ylchapuy:

> Great, I'll take a closer look later. Here are some preliminary remarks:
>
> - you shouldn't worry about `rho`; consider as an iteration only those with a valid information set. It's already what you measure with `time_information_set_steps`, though I would take the mean and not the median here.

I disagree: it changes the balance between the inversion step (done always) and the search step (done a `rho` fraction of the time). Perhaps it's not paramount, but I've already implemented it, and it makes debugging nice since the number of iterations matches extremely well.

> - beware, `_search_size` is initialized to `None`, which leads to trouble. You should just leave it uninitialized.

Thanks, that's a bug: I put the call to calibration in `search_size()`, which is only called if `_search_size` is unset.
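For reference, the lazy calibration described here could look like this (a minimal sketch with the method names from the discussion; the actual code is on the branch):

```python
def search_size(self):
    # Calibrate on first access; self._search_size stays unset (None)
    # until the user sets it or this accessor is called.
    if self._search_size is None:
        self._search_size = self._calibrate_search_size()
    return self._search_size
```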
This ticket proposes an implementation of the Lee-Brickell refinement of information set decoding. It depends on #19653 for the reimplementation of `C.random_element`.

Depends on #19653
CC: @jlavauzelle
Component: coding theory
Author: David Lucas, Johan Rosenkilde, Yann Laigle-Chapuy
Branch/Commit: eb2efb3
Reviewer: Yann Laigle-Chapuy
Issue created by migration from https://trac.sagemath.org/ticket/20138