stsievert / LeanSGD

Wide Residual Networks (WideResNets) in PyTorch

Fix the number of bytes communicated for all coding schemes #4

Open stsievert opened 6 years ago

stsievert commented 6 years ago

Every coding scheme has a free parameter. For example, the QSGD coding is something like

code(g)[i] = sign(g[i]) * x[i] * norm(g)

where x[i] is a Bernoulli random variable with probability abs(g[i]) / norm(g).
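A minimal sketch of this basic encoding, assuming `g` is a NumPy array and `norm` is the l2 norm (the function name `qsgd_encode` is a placeholder, not part of the repo):

```python
import numpy as np

def qsgd_encode(g, rng=np.random.default_rng()):
    """Hypothetical sketch of the QSGD coding described above.

    Each coordinate keeps its sign and the gradient norm, and is
    retained with probability abs(g[i]) / norm(g), else zeroed.
    This makes the code unbiased: E[code(g)[i]] = g[i].
    """
    norm = np.linalg.norm(g)
    if norm == 0:
        return np.zeros_like(g)
    probs = np.abs(g) / norm               # always <= 1 for the l2 norm
    x = rng.random(g.shape) < probs        # Bernoulli(abs(g[i]) / norm(g))
    return np.sign(g) * x * norm
```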

We will pass a parameter to train.py (maybe --grad_frac?) and then set

num_elem = np.abs(g).sum() / norm(g)
c = grad_frac / num_elem  # solves grad_frac = c * expected_elements

Then, QSGD code is

code(g)[i] = sign(g[i]) * x[i] * norm(g) / c

where x[i] is Bernoulli with prob abs(g[i]) * c / norm(g). The c in the probability cancels the 1/c in the coded value (by its definition), so the code stays unbiased while the expected number of nonzero elements becomes grad_frac.
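The rescaled coding above can be sketched as follows; this is only an illustration under the assumption that the rescaled probabilities stay at most 1 (otherwise they would need clipping, which breaks exact unbiasedness for the clipped coordinates). The function name is a placeholder:

```python
import numpy as np

def qsgd_encode_scaled(g, grad_frac, rng=np.random.default_rng()):
    """Hypothetical sketch of the rescaled QSGD coding.

    grad_frac sets the expected number of nonzero coded elements:
    c = grad_frac / num_elem scales the keep-probabilities so that
    E[#nonzeros] = grad_frac, and dividing the coded values by c
    keeps the code unbiased, since E[code(g)[i]]
    = sign(g[i]) * (abs(g[i]) * c / norm) * (norm / c) = g[i].
    """
    norm = np.linalg.norm(g)
    num_elem = np.abs(g).sum() / norm      # expected nonzeros without rescaling
    c = grad_frac / num_elem               # solves grad_frac = c * num_elem
    probs = np.clip(np.abs(g) * c / norm, 0.0, 1.0)
    x = rng.random(g.shape) < probs        # Bernoulli(abs(g[i]) * c / norm(g))
    return np.sign(g) * x * norm / c
```

With grad_frac small enough that no probability is clipped, the average code recovers g and the average nonzero count is grad_frac.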

stsievert commented 6 years ago

@hwang595

stsievert commented 6 years ago

I think this would be an interesting graph, even if it's only over a couple iterations with a bar graph.