**Closed** — XIe-Yibin closed this issue 3 years ago
Hi, thanks for asking!

> DSGD is the same as DGD_tracking

DSGD samples with replacement using `np.random.randint()`, so even if you set the batch size to the sample size, DSGD should not be the same as DGD_tracking. Could you please provide more info about your experiment settings?
> DGD_tracking should be faster than DSGD?

DGD_tracking converges deterministically, while DSGD converges in expectation, so the two can't be compared directly. One counterexample: when the dimension is very high and the sample size very large, computing the full gradient may be too expensive, e.g. training ResNet-50 on ImageNet.
I found that when I set the batch_size of DSGD to the full sample size, the convergence results are actually the same as DGD_tracking, but I don't know what causes this. In theory, shouldn't DGD_tracking be faster than DSGD?