mshahbazi72 / transitional-cGAN

Official PyTorch Implementation of transitional-cGAN

Approach is ineffective, at least with certain datasets/domains #2

Open Kaoru8 opened 2 years ago

Kaoru8 commented 2 years ago

In my experience with this approach, it doesn't help prevent mode collapse (at least in certain scenarios like mine); it simply delays it and prolongs the training process through the gradual transition from the unconditional to the conditional regime.

From my testing, it does seem to keep training more stable and avoid mode collapse before and during the transition period. Soon after the transition is complete, however, outputs start deteriorating, and the model still ends up in a total or near-total mode collapse. I suspect the nature of the problem is conceptually similar to the vanishing gradient problem: as the transition moves closer to the conditional regime, the model is by definition farther from the global/unconditional data and benefits less from it as training goes on, until the transition is complete and any benefit stops entirely.

Building on your basic idea of training unconditionally at first before introducing labels, I found a far simpler and seemingly more stable approach that doesn't require modifying the vanilla StyleGAN architecture(s) or using a transition period. Simply put: train an unconditional model, then use one-shot weight transfer to instantly "convert" the unconditional model to a conditional one, and resume training from there. Steps:

  1. Create the conditional dataset (a minimal sketch of this step is given right after this list)
  2. Start training an unconditional model with the same configuration you would use for the conditional one, simply omitting the --cond=1 flag
  3. Train until the model has "learned enough" from the unconditional regime. I haven't done extensive testing, so I don't know what the "optimal" stopping point would be, but a crude yet effective heuristic is to stop once FID stops improving. That has the benefit of avoiding new hyperparameters, and should be effective regardless of domain/dataset
  4. Start training a randomly initialized conditional model with the same command used in step 2, but with --cond=1 this time. Stop training as soon as the initial model .pkl file is generated and written to disk
  5. Load both pickles (trained unconditional and untrained conditional), copy all layer weights with matching names and shapes from the unconditional model to the conditional one, and save the resulting "converted" pickle
  6. Resume conditional training with --cond=1 and --resume pointing to the converted conditional pickle from the previous step
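
For step 1, here's a minimal sketch of what creating the conditional dataset could look like, assuming the stylegan2-ada-pytorch dataset convention this repo builds on (a dataset.json at the dataset root that maps each relative image path to an integer class index). The folder layout, paths, and class-from-subfolder logic below are just illustrative assumptions:

import json
from pathlib import Path

# Hypothetical layout: datasets/mydata/<class_name>/<image>.png
dataset_root = Path("datasets/mydata")
class_names = sorted(d.name for d in dataset_root.iterdir() if d.is_dir())
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# Build the "labels" list: one [relative_image_path, integer_class_index] entry per image.
labels = []
for img in sorted(dataset_root.rglob("*.png")):
    rel_path = img.relative_to(dataset_root).as_posix()
    labels.append([rel_path, class_to_idx[img.parent.name]])

# dataset.json lives at the root of the dataset folder/zip; the labels are only used
# when training with --cond=1.
with open(dataset_root / "dataset.json", "w") as f:
    json.dump({"labels": labels}, f)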

Since the transition is instantaneous and conditioning starts with the model having access to 100% of the high-level data learned during unconditional training, it avoids the data loss that seems to occur during a gradual transition. I can't say that it entirely fixes the mode collapse problem yet, since I'm still in the middle of training the model and it may still collapse to one degree or another before convergence, but I can definitely say it's been significantly more stable. There's no sign of mode collapse in any of the classes 3000 kImg after the transition, whereas the gradual transition approach experienced near-total mode collapse across all classes only about 500 kImg after the transition ended, on the same dataset.

Hoping the info helps someone. This was one of the very few implementations/papers that deal with stability problems during conditional StyleGAN training, and with problems in complex domains in general, but it didn't help me, so I'm sharing something that did. Also somewhat relevant to you might be "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" (https://arxiv.org/abs/2202.00273). Their re-introduction of progressive growing, and disabling of things like style mixing and path length regularization, might be useful in training complex conditional models. However, their model is way more complex than vanilla StyleGAN, depends on external models, and still has issues.

I'm currently playing around with the idea of progressively growing vanilla StyleGAN architectures via the weight-transfer method, switching to a conditional regime at different stages of model scaling. So I guess we'll see in time; it might be an effective way to train on complex conditional domains while avoiding mode collapse and architecture changes.

49xxy commented 2 years ago

Thank you for sharing. I have the same problem as you; I will try your method on my own dataset!

49xxy commented 2 years ago

Hi! How do I implement the weight transfer in step 5? I sincerely hope to get your help!

Kaoru8 commented 2 years ago

@49xxy Sorry for the extremely late response, I'm not that active on GitHub in the first place and life has been extra chaotic for the past few months... which I guess is why I somehow managed to completely forget, when writing my original post, that weight transfer isn't supported in vanilla StyleGAN. Here's the code:

import pickle

# Note: run this from the training repo's root (or have it on PYTHONPATH) so the pickled
# network classes (dnnlib, torch_utils, training.*) can be resolved during unpickling.

def percent(current, total):
    # Return `current` as a percentage of `total`, rounded to two decimals.
    if current == 0 or total == 0: return 0
    else:
        perc = current/(total/100.0)
        return float("%.2f"%perc)

class WeightTransfer():
    def __init__(self, srcPath, destPath, useSubnets=["G","G_ema","D"], transferMapping=True):
        self.srcPath, self.destPath = srcPath, destPath
        self.useSubnets, self.transferMapping = useSubnets, transferMapping
        self.allSubnets = ["D","G","G_ema"]
        self.pickles = None

    def transfer(self, outPath):
        # Copy every parameter whose name and shape match from the source (unconditional)
        # networks into the destination (conditional) networks, then save the result.
        if self.pickles is None: self.loadPickles()
        mapping = self.findMapping(self.pickles)
        count, mappedCount, saveData = 0, 0, {}
        for subNet in mapping:
            srcState, destState = self.pickles["src"][subNet].state_dict(), self.pickles["dest"][subNet].state_dict()
            for param in mapping[subNet]:
                m = mapping[subNet][param]
                count += 1
                if m is not None:
                    destState[m] = srcState[param]
                    mappedCount += 1
            self.pickles["dest"][subNet].load_state_dict(destState)
        print("Transferred",mappedCount,"/",count,"parameters (",percent(mappedCount,count),"%)")
        for subNet in self.allSubnets: saveData[subNet] = self.pickles["dest"][subNet]
        with open(outPath, 'wb') as f: pickle.dump(saveData, f)

    def findMapping(self, pickles):
        # For each requested sub-network, map every source parameter name to the destination
        # parameter with the same name, or to None if the name or shape doesn't match.
        result = {}
        for subNet in self.useSubnets:
            srcParams, destParams = pickles["src"][subNet+"_params"], pickles["dest"][subNet+"_params"]
            srcState, destState = pickles["src"][subNet].state_dict(), pickles["dest"][subNet].state_dict()
            result[subNet] = {}
            for paramName in srcParams:
                if paramName in destParams and srcState[paramName].shape == destState[paramName].shape: result[subNet][paramName] = paramName
                else: result[subNet][paramName] = None
        return result

    def loadPickles(self):
        # Load the source and destination snapshots and record their parameter names.
        if self.pickles is None:
            self.pickles = {}
            for key, pklFile in [("src",self.srcPath),("dest",self.destPath)]:
                self.pickles[key] = {"picklePath":pklFile}
                with open(pklFile,"rb") as f:
                    pkl = pickle.load(f)
                    self.pickles[key]["res"] = pkl["G"].__dict__["img_resolution"]
                    for subNet in self.allSubnets:
                        self.pickles[key][subNet] = pkl[subNet]
                        self.pickles[key][subNet+"_params"] = list(name for name,weight in pkl[subNet].named_parameters())

wt = WeightTransfer("unCond-lastSnapshot.pkl", "cond-randomInitSnapshot.pkl")
wt.transfer("cond-weightTransferResume.pkl")

It's unnecessarily verbose and wrapped in a class because I plan on eventually expanding it for more complex weight-transfer scenarios (upscaling/downscaling, weight blending, etc.), but that's all you need for this scenario, and you can streamline the code yourself.
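
If it helps, here's an optional sanity check (my own addition, not part of the snippet above) to confirm the converted pickle actually carries the unconditional weights before you resume training. It reuses the filenames from the usage example above and, like the transfer script, assumes it's run from the training repo's root so the pickled network classes can be resolved. Since only parameters (not buffers) are copied by the transfer, a few buffer tensors are expected to differ:

import pickle
import torch

with open("unCond-lastSnapshot.pkl", "rb") as f:          # trained unconditional snapshot
    src = pickle.load(f)
with open("cond-weightTransferResume.pkl", "rb") as f:    # output of WeightTransfer.transfer()
    converted = pickle.load(f)

# Compare the G_ema tensors that share a name and shape between the two snapshots.
src_state = src["G_ema"].state_dict()
conv_state = converted["G_ema"].state_dict()
shared = [n for n in src_state if n in conv_state and src_state[n].shape == conv_state[n].shape]
identical = sum(torch.equal(src_state[n], conv_state[n]) for n in shared)
print(f"{identical}/{len(shared)} shared G_ema tensors are identical after the transfer")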

Listen-lei commented 6 months ago

@Kaoru8 hello, I have just started training GAN networks, and I would like to ask two questions:

I want to restrict the output of my GAN with some conditions. Should I use a cGAN, or the code from this repository? Can stylegan-ada accomplish my purpose?

How do I make a dataset for a cGAN / conditional GAN? By that I mean: how do I associate the labels I want to add with the images, and train on them? An example if possible, or a link to a study?

Best wishes. l1sten