Closed shoutOutYangJie closed 6 months ago
The loss can vary depending on how you train. For instance, if you choose to also learn the timestep-embedding by setting the parameter `learn_embedding: true`, your loss will still start high, regardless of the zero convolutions. In this case, the model might take a few thousand steps to start producing proper results again.
cheers
> The loss can vary depending on how you train. For instance, if you choose to also learn the timestep-embedding by setting the parameter `learn_embedding: true`, your loss will still start high, regardless of the zero convolutions. In this case, the model might take a few thousand steps to start producing proper results again. cheers
Could you give some logs for reference?
@Sipirius I have one last question. If the control branch is trained from scratch, with randomly initialized weights, why does ControlNet-XS need "zero conv" blocks for feature fusion from the base branch to the control branch? They seem unnecessary since the branch is trained from scratch. Have you run any experiments testing the model without "zero conv" blocks in the control branch?
It is beneficial to use zero convolutions so that the generation capabilities of the network are not negatively impacted from the start. This way, the controlling model can focus right away on enhancing the generation, rather than first having to learn not to destroy it.
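To illustrate the point, here is a minimal sketch (not the authors' exact code) of a zero convolution in PyTorch: a regular conv layer whose weights and bias are initialized to zero, so the control branch contributes nothing at step 0 and its influence is faded in by gradient updates. The function name `zero_conv` and the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    """A 1x1 conv whose weights and bias start at zero (illustrative sketch)."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

# At initialization the fused feature equals the base feature, so the
# pretrained generator's output is untouched by the untrained control branch.
base_feat = torch.randn(1, 4, 8, 8)
control_feat = torch.randn(1, 4, 8, 8)
fused = base_feat + zero_conv(4)(control_feat)
print(torch.allclose(fused, base_feat))  # True
```

Because the added term is exactly zero at the start, the initial loss should match that of the frozen base model alone, which is why a noticeably higher starting loss usually points to something else being trained (e.g. the timestep embedding) rather than the zero convolutions.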
I have trained the original ControlNet, and the loss is lower than 0.15 at the beginning of training, thanks to the zero conv module. As for ControlNet-XS, the model still uses the "zero conv" module, but the initial loss is about 0.20. Is that normal? Could you describe your training loss, or show it to me, please?