sands-lab / grace

GRACE - GRAdient ComprEssion for distributed deep learning
https://sands.kaust.edu.sa/project/grace/
BSD 2-Clause "Simplified" License

How should I use GRACE if I change the compression strategy during training? #4

Closed: KevvinHoo closed this issue 4 years ago

KevvinHoo commented 4 years ago

I want to change the compression strategy during the training process. In this case, how should I use GRACE? Pseudo-code is shown below:

compression = A()  # start with compression strategy A

optimizer = hvd.DistributedOptimizer(optimizer,
                                     compression,
                                     named_parameters=model.named_parameters())

for epoch in range(0, 100):
    if epoch > 50:
        compression = B()
        # How do I apply this new compression to the DistributedOptimizer?

    train()
    test()

    ...

Could anyone help me solve this problem? I'd appreciate your help!

mcanini commented 4 years ago

I would probably write a Proxy Compressor, which internally switches among different concrete compressors.
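
A minimal sketch of that idea (untested; it assumes the compressor interface is compress(tensor, name) / decompress(tensors, ctx) as in grace_dl.torch, and reuses A and B from your pseudo-code as placeholder concrete compressors):

class ProxyCompressor:
    """Delegates to an inner compressor that can be swapped at runtime."""

    def __init__(self, compressor):
        self._compressor = compressor  # the currently active compressor

    def switch_to(self, compressor):
        # Swap the active compressor, e.g. at an epoch boundary.
        self._compressor = compressor

    def compress(self, tensor, name):
        return self._compressor.compress(tensor, name)

    def decompress(self, tensors, ctx):
        return self._compressor.decompress(tensors, ctx)


# Hand the proxy to the optimizer once, then switch inside it later.
proxy = ProxyCompressor(A())
optimizer = hvd.DistributedOptimizer(optimizer,
                                     proxy,
                                     named_parameters=model.named_parameters())

for epoch in range(0, 100):
    if epoch == 50:
        proxy.switch_to(B())  # switch strategies once, halfway through

    train()
    test()

Because the optimizer keeps a reference to the same proxy object for the whole run, swapping the inner compressor takes effect immediately, with no need to rebuild the DistributedOptimizer. If GRACE requires the object to be a Compressor subclass, derive ProxyCompressor from that base class instead of duck-typing it.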