Hello. I am trying to train and run inference with a model according to your article. However, some things are not clear from the article:
1) "In the training process, we randomly replace 50% text prompts ct with empty strings". OK, you replace 50% of the text prompts fed to ControlNet with empty strings, but do you do the same for the base model (for the same elements in the batch)?
2) In the section "Classifier-free guidance resolution weighting" you state that you reweight the ControlNet residuals by wi = (64 / {resolution of residual}). Am I right that the lowest-resolution residual (the middle block) gets a multiplier of 8? In my case this completely breaks generation (the diffusers approach of using np.logspace(-1, 0, 13) works slightly better).
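To make question 2 concrete, here is a small sketch of the two weighting schemes as I understand them (my assumptions about the residual resolutions for a 512x512 image, not the authors' actual code):

```python
import numpy as np

# For a 512x512 image the latent is 64x64, and in a standard SD 1.x
# UNet the 12 down-block residuals plus the middle-block residual have
# these spatial resolutions (assumed layout):
residual_resolutions = [64, 64, 64, 32, 32, 32, 16, 16, 16, 8, 8, 8, 8]

# The paper's rule as I read it: w_i = 64 / h_i, so the 8x8
# middle-block residual gets a multiplier of 64 / 8 = 8.0.
paper_weights = [64 / h for h in residual_resolutions]

# The diffusers guess-mode alternative: 13 scales spaced
# logarithmically from 0.1 to 1.0, with the smallest scale applied to
# the highest-resolution (first) residual.
diffusers_weights = np.logspace(-1, 0, 13)

print(paper_weights)          # 1.0 ... 8.0
print(diffusers_weights)      # 0.1 ... 1.0
```

Note that the first scheme amplifies the coarsest residual 8x, while the second never scales any residual above 1.0, which may explain the difference in behavior I'm seeing.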