Closed dsevero closed 3 years ago
Very nice work. I'm actually wondering if we shouldn't go a bit further with the API change, and allow push and pop to be any inverse pair, i.e. instead of
push: (ans_state, symbol, *context) |-> (ans_state, *context)
pop: (ans_state, *context) |-> (ans_state, symbol, *context)
we go for
push: before |-> after
pop: after |-> before
There's a trade-off here — instead of having to unpack results, as in
message, = codec.push(message, symbol)
the 'inverses' approach would require packing inputs, as in
message = codec.push((message, symbol))
because each push/pop must have exactly one argument.
An advantage of the 'inverses' approach is that the codec combinators (things like serial
and parallel
) might be simpler, and it might reveal some more general forms. For example, what we now call repeat
, might be replaced by an 'invertible reduce':
from functools import reduce
def ireduce(codec, length):
# Assumes that
# codec.push: (a, b) |-> a
# codec.pop: a |-> (a, b)
# and returns a codec for lists containing type b elements:
# push: (a, bs) |-> a
# pop: a |-> (a, bs)
def push(a, bs):
return reduce(codec.push, bs, a)
def pop(a):
bs = []
for _ in range(length):
a, b = codec.pop(a)
bs.append(b)
del b
return bs
return codec(push, pop)
This is a change we could make in a separate pr/series of prs.
Here's another sketch: https://gist.github.com/j-towns/d8deee8d2bbfa4bb5ab93bc2573ee81d
Maybe that could be a lower layer, and the codecs could be implemented on top? I'm worried that it might be a bit much for the regular user that just wants to do compression.
Adaptive codecs
As mentioned in the title, this would allow adaptive codecs to be implemented in Craystack.
Signatures of push and pop are extended to allow for a
*context
variable.Note that, since context is passed via unpacking (i.e. *context), then it is essentially optional. Therefore, previous codecs are still compliant with this new signature.
However, the return of
push
will be at least(ans_state,)
, so function calls have been adapted accordingly. This means that instead ofwe now have
Multiset codec
The multiset codec from https://github.com/facebookresearch/multiset-compression was ported to
craystack/codecs/multiset.py
Extra example
An extra example, showing how to use the multiset codec for BMNIST with BB-ANS, was added to
craystack/examples
.