liboyue / Network-Distributed-Algorithm

Experiments for distributed optimization algorithms

bugs #1

Closed XIe-Yibin closed 3 years ago

XIe-Yibin commented 3 years ago

I found that when setting the batch_size of the DSGD algorithm to the full sample size, the convergence results are the same as DGD_tracking, but I don't know what is causing this. In theory, shouldn't DGD_tracking be faster than DSGD?

liboyue commented 3 years ago

Hi, thanks for asking!

  1. **DSGD is the same as DGD_tracking.** DSGD samples with replacement using `np.random.randint()`, so even if you set the batch size to the full sample size, DSGD should not behave the same as DGD_tracking (see the minimal sketch after this list). Could you please provide more information about your experiment settings?

  2. **DGD_tracking should be faster than DSGD.** DGD_tracking converges deterministically, while DSGD converges in expectation, so the two cannot be compared directly. One counterexample: when the dimension is very high and the sample size very large, computing the full gradient may be too expensive, as in training ResNet-50 on ImageNet.
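
Below is a minimal sketch (not the repository's code) of why a batch whose size equals the sample size still differs from the full gradient when sampling with replacement via `np.random.randint()`. The least-squares setup and variable names are illustrative assumptions, not the package's API.

```python
import numpy as np

np.random.seed(0)
n, d = 100, 5
A = np.random.randn(n, d)   # illustrative data matrix
b = np.random.randn(n)      # illustrative targets
x = np.random.randn(d)      # current iterate

def grad(idx):
    """Average least-squares gradient over the rows indexed by idx."""
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

full_grad = grad(np.arange(n))                      # deterministic full gradient
batch_grad = grad(np.random.randint(0, n, size=n))  # sampled with replacement, size n

# With replacement, some rows repeat and others are missing, so even a
# "full-size" batch gives a noisy estimate rather than the exact gradient.
print(np.linalg.norm(full_grad - batch_grad))       # typically clearly > 0
```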