j-towns / craystack

Compression tools for machine learning researchers
Other
82 stars 8 forks source link

Rare bug in non-dyadic push and pop #14

Closed j-towns closed 1 year ago

j-towns commented 2 years ago

We were a bit careless in our implementation of the non-dyadic push and pop functions, and there's a bug, which is very rare at 64 bit precision. A reproduction, with explanation:

from craystack import rans

import numpy as np

# The renormalization in rans.push_with_finer_prec will keep moving
# bits into the tail until the head is less than
# (rans_l // 3) << r_t      == 3074457342754947072

# However, the true upper bound on the head after rans.pop_with_finer_prec
# is less strict, it's equal to
# ((1 << r_s) - 1) // 3 + 1 == 3074457345618258603

# We can break the inverse property by constructing a message and
# pop such that the result is between the two values above. That means
# that push thinks it needs to undo a renorm when actually no renorm
# took place in the pop:

# Set s to the largest possible value
m = np.uint64([(1 << 63) - 1]), ()

# Use N = 3 as the denominator
N = 3

u, popfun = rans.pop_with_finer_prec(m, N)
m_ = popfun(u, 1)

# u  == array([1], dtype=uint64)
# m_ == (array([3074457345618258602], dtype=uint64), ())
# s is between the two values above, and that causes a problem...

m__ = rans.push_with_finer_prec(m_, u, 1, N)

assert rans.message_equal(m, m__), 'm = ' + str(m) + '\n m__ = ' + str(m__)

This isn't too difficult to fix. We need to alter what we call 'renormalization', to ensure that before we perform a pop the message head h is in an interval [a * N, a * b * N) for some natural numbers a, b. For us it is sensible to use b = (1 << 32). The value of a should be as large as possible, such that we don't get overflow, that is, such that a * b * N <= (1 << 64). If we're fixing b = (1 << 32) then that condition on a becomes a <= (1 << 32) / N. I think setting a = (1 << 32) // N is the sensible thing to do.

We also need to 'undo' the new renormalization step at the beginning of pop, with a corresponding step at the end of push.

Hopefully I get round to fixing this as soon as possible.

Credit to Fabian's blog post https://fgiesen.wordpress.com/2014/02/02/rans-notes/ for helping me understand this.