stsievert opened 6 years ago
For all coding schemes, we have a free parameter. That is, the coding for QSGD is something like

```
code(g)[i] = sign(g[i]) * x[i] * norm(g)
```

where `x[i]` is a Bernoulli random variable with probability `abs(g[i]) / norm(g)`.
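A minimal sketch of this baseline coding (the function name `qsgd_code` is illustrative, not from the repo):

```python
import numpy as np

def qsgd_code(g):
    """Baseline QSGD-style coding: keep each coordinate's sign with
    probability abs(g[i]) / norm(g), scaled by norm(g) so that the
    code is unbiased, i.e. E[code(g)] = g. Assumes g is nonzero."""
    norm = np.linalg.norm(g)
    prob = np.abs(g) / norm            # Bernoulli probability per coordinate
    x = np.random.binomial(1, prob)    # x[i] ~ Bernoulli(prob[i])
    return np.sign(g) * x * norm
```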
We will pass some parameter to train.py (maybe `--grad_frac`?) and then set

```
num_elem = np.abs(g).sum() / norm(g)
c = grad_frac / num_elem  # solves grad_frac = c * expected_elements
```
Then, the QSGD code is

```
code(g)[i] = sign(g[i]) * x[i] * norm(g) / c
```

where `x[i]` is Bernoulli with probability `abs(g[i]) * c / norm(g)`. The factor `c` cancels in expectation because of its definition, so the code stays unbiased, and the expected number of nonzero entries becomes `c * num_elem = grad_frac`.
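A hedged sketch of the scaled version (again, `qsgd_code_scaled` and the sanity check below are illustrative, not the repo's actual code):

```python
import numpy as np

def qsgd_code_scaled(g, grad_frac):
    """QSGD-style coding with a free sparsity parameter. grad_frac is
    (approximately) the expected number of nonzero entries in the code;
    c rescales the Bernoulli probabilities so sum(prob) == grad_frac
    while the code stays unbiased. Assumes g is nonzero."""
    norm = np.linalg.norm(g)
    num_elem = np.abs(g).sum() / norm      # E[#nonzeros] without scaling
    c = grad_frac / num_elem               # solves grad_frac = c * num_elem
    # clip guards against probabilities > 1 for large grad_frac; clipped
    # coordinates are then slightly biased
    prob = np.clip(np.abs(g) * c / norm, 0.0, 1.0)
    x = np.random.binomial(1, prob)        # x[i] ~ Bernoulli(prob[i])
    return np.sign(g) * x * norm / c       # c cancels in expectation

# quick sanity check: average sparsity should be close to grad_frac
g = np.random.randn(10_000)
codes = np.stack([qsgd_code_scaled(g, grad_frac=100) for _ in range(50)])
print((codes != 0).sum(axis=1).mean())     # should be near 100
```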
@hwang595 I think this would be an interesting graph, even if it's only over a couple of iterations with a bar graph.