**Closed** — XIe-Yibin closed this issue 3 years ago
Hi, thanks for asking!

> DSGD is the same as DGD_tracking

DSGD samples with replacement using `np.random.randint()`, so even if you set the batch size to the sample size, DSGD should not be the same as DGD_tracking. Could you please provide more info about your experiment settings?
> DGD_tracking should be faster than DSGD?

DGD_tracking converges deterministically, while DSGD converges in expectation, so the two can't be compared directly. One counterexample: when the dimension is very high and the sample size very large, computing the full gradient may be too expensive, e.g. training ResNet-50 on ImageNet.
I found that when I set the batch_size of DSGD to the full sample size, the convergence results are actually the same as DGD_tracking, but I don't know what causes this. In theory, shouldn't DGD_tracking be faster than DSGD?