Closed amitport closed 4 years ago
Hi,
Thanks for pointing out this issue. Yes, we forgot to create a new memory class for EFSignSGDCompressor. Now, if you want to use EFSignSGDCompressor with its memory, you may call GRACE like this:
# Horovod TensorFlow
import horovod.tensorflow as hvd
from grace_dl.tensorflow.communicator.allgather import Allgather
from grace_dl.tensorflow.compressor.efsignsgd import EFSignSGDCompressor
from grace_dl.tensorflow.memory.efsignsgd import EFSignSGDMemory
world_size = hvd.size()
grc = Allgather(EFSignSGDCompressor(lr=0.1), EFSignSGDMemory(lr=0.1), world_size)
# or with helper
from grace_dl.tensorflow.helper import grace_from_params
params = {'compressor': 'efsignsgd', 'memory': 'efsignsgd', 'communicator': 'allgather'}
grc = grace_from_params(params)
opt = hvd.DistributedOptimizer(opt, grace=grc)
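For anyone reading along, here is a rough sketch of what the memory's compensate and update steps are meant to do around the compressor. It is not the GRACE source: ErrorFeedbackMemory, sign_compress, sign_decompress and compressed_step are hypothetical names made up for illustration, plain Python lists stand in for tensors, and scaling the signs by the mean magnitude is an assumption about how sign compression is usually paired with error feedback.

```python
# Illustrative sketch only: every name here is hypothetical, not GRACE's API.

def sign_compress(values):
    """Keep one sign per entry plus a single scale (the mean magnitude)."""
    scale = sum(abs(v) for v in values) / len(values)
    signs = [1.0 if v >= 0 else -1.0 for v in values]
    return signs, scale


def sign_decompress(signs, scale):
    """Rebuild an approximate tensor from the signs and the scale."""
    return [s * scale for s in signs]


class ErrorFeedbackMemory:
    """Stores, per tensor name, whatever the compressor lost last step."""

    def __init__(self, lr):
        self.lr = lr
        self.residuals = {}

    def compensate(self, grad, name):
        # Add the stored residual to the (lr-scaled) gradient before compressing.
        residual = self.residuals.get(name, [0.0] * len(grad))
        return [r + self.lr * g for r, g in zip(residual, grad)]

    def update(self, compensated, name, decompressed):
        # Remember the part of the compensated gradient that was not transmitted.
        self.residuals[name] = [c - d for c, d in zip(compensated, decompressed)]


def compressed_step(memory, grad, name):
    """One compensate -> compress -> decompress -> update round for one tensor."""
    compensated = memory.compensate(grad, name)
    signs, scale = sign_compress(compensated)
    decompressed = sign_decompress(signs, scale)
    memory.update(compensated, name, decompressed)
    return decompressed
```

The point of the sketch is that the residual only has an effect if both compensate and update are actually invoked on every step; a compressor that defines them but is never asked to call them behaves like plain sign compression.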
Hi,
Maybe I'm misunderstanding something, but it seems that EFSignSGDCompressor (parameter compressor set to efsignsgd) never uses its residual memory (error feedback). (The TensorFlow version has compensate and update in the compressor, but they are never called.)
Thank you