lllyasviel / ControlNet

Let us control diffusion models!
Apache License 2.0

different setup of input_hint_block compared to paper? #698

Open liren-jin opened 3 months ago

liren-jin commented 3 months ago

Hi, I noticed that the implementation of the tiny network that converts control images into feature space differs from the structure mentioned in the paper: "In particular, we use a tiny network E(·) of four convolution layers with 4 × 4 kernels and 2 × 2 strides (activated by ReLU, using 16, 32, 64, 128, channels respectively". The corresponding implementation should be here, right? (Correct me if I am wrong.) https://github.com/lllyasviel/ControlNet/blob/ed85cd1e25a5ed592f7d8178495b4483de0331bf/cldm/cldm.py#L147-L163
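
To make the comparison concrete, here is a rough sketch of the shapes implied by the description quoted from the paper. It only traces tensor shapes through four 4 × 4, stride-2 convolutions with 16, 32, 64, and 128 output channels. The padding of 1, the 3-channel input, and the 512 × 512 input size are assumptions for illustration (the quote does not state them), and the helper names `conv_out_size` and `trace_paper_encoder` are hypothetical, not taken from the repository.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length along one spatial axis for a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_paper_encoder(size=512, in_channels=3, padding=1):
    """Trace (channels, height, width) through the network as quoted from the paper.

    Assumed, not stated in the quote: padding=1, a square 512x512 input,
    and 3 input channels.
    """
    channels = (16, 32, 64, 128)  # output channels per layer, from the quote
    shapes = [(in_channels, size, size)]
    for out_channels in channels:
        # Each layer: 4x4 kernel, 2x2 stride (from the quote).
        size = conv_out_size(size, kernel=4, stride=2, padding=padding)
        shapes.append((out_channels, size, size))
    return shapes


if __name__ == "__main__":
    for shape in trace_paper_encoder():
        print(shape)
```

Under these assumptions each layer halves the spatial resolution, so the four layers together downsample by a factor of 16. The same trace can be run against the layer settings in the linked `cldm.py` lines to see where the two setups diverge.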