microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/

Problem when training CycleGAN #582

Open carefree0910 opened 3 years ago

carefree0910 commented 3 years ago

Sometimes (e.g., in CycleGAN) we need to optimize the parameters of two (or more) models together, because it is more efficient (e.g., when optimizing the cycle loss we definitely don't want to use retain_graph=True).

I was just wondering whether this is the right way to initialize an optimizer that is meant to optimize both net_a2b's and net_b2a's parameters:

import deepspeed

original_optimizer = ...  # one optimizer built over both models' parameters
# the SAME optimizer instance is passed to both initialize() calls
net_a2b, optimizer, _, _ = deepspeed.initialize(args, net_a2b, original_optimizer)
net_b2a, _, _, _ = deepspeed.initialize(args, net_b2a, original_optimizer)
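
Or, alternatively, would it be correct to wrap both networks in a single container module and call deepspeed.initialize only once, so that one engine owns all the parameters? A minimal sketch of what I mean (CycleModels and the Adam settings are just placeholders I made up, not anything from the DeepSpeed docs):

import torch
import torch.nn as nn
import deepspeed

class CycleModels(nn.Module):
    # hypothetical container so that a single engine owns both generators
    def __init__(self, net_a2b, net_b2a):
        super().__init__()
        self.net_a2b = net_a2b
        self.net_b2a = net_b2a

models = CycleModels(net_a2b, net_b2a)
# a single optimizer over the union of both parameter sets
optimizer = torch.optim.Adam(models.parameters(), lr=2e-4)
engine, optimizer, _, _ = deepspeed.initialize(args=args, model=models, optimizer=optimizer)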

Any help would be greatly appreciated, thanks in advance!

carefree0910 commented 3 years ago

I'm also quite confused about how to do the correct backward pass in this case. In the CycleGAN example, we calculate a cycle_loss that depends on both net_a2b and net_b2a:

cycle_loss = loss_fn(net_a2b, net_b2a)
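
Concretely, what I mean is roughly this (a toy sketch; real_a and the L1 criterion are placeholders of mine, not the actual project code):

import torch.nn.functional as F

fake_b = net_a2b(real_a)               # A -> B
rec_a = net_b2a(fake_b)                # B -> A reconstruction
cycle_loss = F.l1_loss(rec_a, real_a)  # gradients must flow into BOTH generators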

We want to do one backward pass instead of two, and we want to avoid retain_graph=True. However, if we stick to engine.backward in DeepSpeed, this seems impossible.
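
To illustrate (a sketch of my understanding; I may be wrong about what engine.backward allows):

# What I would like: a single backward pass through the shared graph.
cycle_loss.backward()            # plain PyTorch, but this bypasses the engines

# What two separate engines seem to force: two passes over the same graph,
# where the first one would need retain_graph=True (exactly what I want to
# avoid, and I'm not sure engine.backward even exposes it):
net_a2b.backward(cycle_loss)     # frees the graph here...
net_b2a.backward(cycle_loss)     # ...so this second pass would fail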