sands-lab / grace

GRACE - GRAdient ComprEssion for distributed deep learning
https://sands.kaust.edu.sa/project/grace/
BSD 2-Clause "Simplified" License

How should I use GRACE if I change the compression strategy during training? #4

Closed: KevvinHoo closed this issue 4 years ago

KevvinHoo commented 4 years ago

I want to change the compression strategy during the training process. In this case, how should I use GRACE? Pseudo-code is shown below:

compression = A()  # start with compression strategy A

optimizer = hvd.DistributedOptimizer(optimizer,
                                     compression,
                                     named_parameters=model.named_parameters())

for epoch in range(0, 100):
    if epoch > 50:
        compression = B()
        # How do I apply this new compression to the DistributedOptimizer?

    train()
    test()

    ...

Could anyone help me solve this problem? I'd appreciate your help!

mcanini commented 4 years ago

I would probably write a Proxy Compressor, which internally switches among different concrete compressors.
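
A minimal sketch of that idea (untested; it assumes the compressor interface is compress(tensor, name) / decompress(tensors, ctx) as in grace_dl.torch, and reuses A and B from your pseudo-code as placeholder concrete compressors):

class ProxyCompressor:
    """Delegates to an inner compressor that can be swapped at runtime."""

    def __init__(self, compressor):
        self._compressor = compressor  # the currently active compressor

    def switch_to(self, compressor):
        # Swap the active compressor, e.g. at an epoch boundary.
        self._compressor = compressor

    def compress(self, tensor, name):
        return self._compressor.compress(tensor, name)

    def decompress(self, tensors, ctx):
        return self._compressor.decompress(tensors, ctx)


# Hand the proxy to the optimizer once, then switch inside it later.
proxy = ProxyCompressor(A())
optimizer = hvd.DistributedOptimizer(optimizer,
                                     proxy,
                                     named_parameters=model.named_parameters())

for epoch in range(0, 100):
    if epoch == 50:
        proxy.switch_to(B())  # switch strategies once, halfway through

    train()
    test()

Because the optimizer keeps a reference to the same proxy object for the whole run, swapping the inner compressor takes effect immediately, with no need to rebuild the DistributedOptimizer. If GRACE requires the object to be a Compressor subclass, derive ProxyCompressor from that base class instead of duck-typing it.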