lllyasviel / ControlNet

Let us control diffusion models!
Apache License 2.0

Clarification of some details from article #582

Open GLivshits opened 7 months ago

GLivshits commented 7 months ago

Hello. I am trying to train and run inference with a model as described in your article. However, some details in the article are not clear to me:

1) "In the training process, we randomly replace 50% text prompts c_t with empty strings". OK, you replace 50% of the text prompts fed to ControlNet with empty strings, but do you do the same for the base model (for the same elements in the batch)?

2) In the section "Classifier-free guidance resolution weighting" you state that you reweight the ControlNet residuals by w_i = 64 / {resolution of residual}. Am I right that the lowest-resolution residual (the middle block) gets a multiplier of 8? In my case this completely breaks the generation (the diffusers approach of using np.linspace(-1, 0, 13) works slightly better).
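To make both questions concrete, here is a minimal sketch of how I am reading the article; it is my interpretation, not the authors' code. `drop_prompts` is a hypothetical helper that blanks prompts once per batch, so the same (possibly empty) prompt could be passed to both ControlNet and the base model. `RESIDUAL_SIZES` assumes a 64x64 latent and the usual 13 residuals (12 encoder outputs plus the middle block); the log-spaced ramp from 0.1 to 1.0 is included only as a smoother schedule for comparison.

```python
import random
import numpy as np

# Assumption: spatial sizes of the 13 ControlNet residuals for a 64x64 latent
# (12 encoder outputs followed by the middle block).
RESIDUAL_SIZES = [64, 64, 64, 32, 32, 32, 16, 16, 16, 8, 8, 8, 8]


def drop_prompts(prompts, p=0.5, seed=None):
    """Replace each prompt with "" with probability p (hypothetical helper).

    Calling this once per batch and reusing the result for both networks is
    one possible reading of the article, not a confirmed detail.
    """
    rng = random.Random(seed)
    return ["" if rng.random() < p else text for text in prompts]


def resolution_weights(sizes, base=64):
    """Weights w_i = base / size_i, as I read the article's formula."""
    return [base / s for s in sizes]


def logspace_weights(n):
    """Log-spaced ramp from 0.1 to 1.0, for comparison only."""
    return np.logspace(-1, 0, n)


w_paper = resolution_weights(RESIDUAL_SIZES)
w_log = logspace_weights(len(RESIDUAL_SIZES))

if __name__ == "__main__":
    print(drop_prompts(["a cat", "a dog", "a house", "a tree"], seed=0))
    print(w_paper)
    print(np.round(w_log, 3))
```

Under these assumptions the middle block is scaled by 64 / 8 = 8 while the highest-resolution residuals keep a weight of 1, which is the opposite trend from a ramp that grows from 0.1 at the first residual to 1.0 at the middle block.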