sands-lab / grace

GRACE - GRAdient ComprEssion for distributed deep learning
https://sands.kaust.edu.sa/project/grace/
BSD 2-Clause "Simplified" License

EFSignSGDCompressor does not seem to use memory #10

Closed: amitport closed this issue 4 years ago

amitport commented 4 years ago

Hi, maybe I'm misunderstanding something, but it seems that EFSignSGDCompressor (selected by setting the compressor parameter to efsignsgd) never uses its residual memory (error feedback).

(The TensorFlow version defines compensate and update in the compressor, but they are never called.)

Thank you

hangxu0304 commented 4 years ago

Hi,

Thanks for pointing this out. Yes, we forgot to create a new memory class for EFSignSGDCompressor. Now, if you want to use EFSignSGDCompressor with its memory, you can set up GRACE like this:

```python
# Horovod TensorFlow
import horovod.tensorflow as hvd
from grace_dl.tensorflow.communicator.allgather import Allgather
from grace_dl.tensorflow.compressor.efsignsgd import EFSignSGDCompressor
from grace_dl.tensorflow.memory.efsignsgd import EFSignSGDMemory

# Assumes hvd.init() has already been called earlier in the training script.
world_size = hvd.size()
grc = Allgather(EFSignSGDCompressor(lr=0.1), EFSignSGDMemory(lr=0.1), world_size)

# Or build grc with the helper instead
from grace_dl.tensorflow.helper import grace_from_params
params = {'compressor': 'efsignsgd', 'memory': 'efsignsgd', 'communicator': 'allgather'}
grc = grace_from_params(params)

# opt is the existing TensorFlow optimizer being wrapped.
opt = hvd.DistributedOptimizer(opt, grace=grc)
```
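
For context, the role of the residual memory (error feedback) discussed in this issue can be sketched roughly as follows. This is a minimal pure-Python illustration, not GRACE's actual code: `ResidualMemory`, `sign_compress`, `sign_decompress`, and `compress_step` are hypothetical names, the tensors are plain lists, and the learning-rate scaling that the `lr` argument above suggests is left out for brevity.

```python
# Illustrative sketch only: hypothetical names, not GRACE's actual classes.

def sign_compress(values):
    """Compress to one scale (mean absolute value) plus a sign per element."""
    scale = sum(abs(v) for v in values) / len(values)
    signs = [1.0 if v >= 0 else -1.0 for v in values]
    return scale, signs


def sign_decompress(scale, signs):
    """Rebuild the lossy approximation that a receiver would see."""
    return [scale * s for s in signs]


class ResidualMemory:
    """Keeps the compression error of each named tensor between steps."""

    def __init__(self):
        self.residuals = {}

    def compensate(self, values, name):
        # Add back the error left over from the previous step, if any.
        residual = self.residuals.get(name, [0.0] * len(values))
        return [v + r for v, r in zip(values, residual)]

    def update(self, compensated, name, decompressed):
        # Store what compression failed to transmit in this step.
        self.residuals[name] = [c - d for c, d in zip(compensated, decompressed)]


def compress_step(memory, grad, name):
    """One step: compensate, compress, then update the stored residual."""
    compensated = memory.compensate(grad, name)
    scale, signs = sign_compress(compensated)
    memory.update(compensated, name, sign_decompress(scale, signs))
    return scale, signs
```

In this sketch, if `compensate` and `update` are never called, the difference between the original tensor and its sign approximation is simply discarded at every step, which is the behaviour the issue describes.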