lllyasviel / ControlNet-v1-1-nightly

Nightly release of ControlNet 1.1

What's the difference between 1-control and control in gradio_lineart.py and gradio_canny.py? #135

Closed by deepmayuot 7 months ago

deepmayuot commented 7 months ago

Thanks for sharing the code!

I noticed that the control in cldm.py is prepared as follows:

    def get_input(self, batch, k, bs=None, *args, **kwargs):
        x, c = super().get_input(batch, self.first_stage_key, *args, **kwargs)
        control = batch[self.control_key]
        if bs is not None:
            control = control[:bs]
        control = control.to(self.device)
        control = einops.rearrange(control, 'b h w c -> b c h w')
        control = control.to(memory_format=torch.contiguous_format).float()

And gradio_canny.py builds the control like this to generate images:

        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)

        control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
        control = torch.stack([control for _ in range(num_samples)], dim=0)
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()

However, in gradio_lineart.py, the control is as follows:

        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)

        control = 1.0 - torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
        control = torch.stack([control for _ in range(num_samples)], dim=0)
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()

I am confused about this. Can anyone give some suggestions?

geroldmeisinger commented 7 months ago

1 - v on a floating-point grayscale image inverts it, turning a black image with white lines into a white image with black lines, or vice versa. I think the assumption here is that canny images are usually generated with a Canny edge detector, which outputs black images with white lines, whereas lineart images are usually scanned from real paper (black lines on a white background). It's for convenience.
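
To make the inversion concrete, here is a minimal sketch. It uses NumPy instead of torch so it runs without a GPU, and the helper name to_control and the tiny 2x2 input are made up for illustration; it mirrors the scaling, stacking and rearranging steps from the gradio scripts but is not the repository's code.

```python
import numpy as np


def to_control(detected_map: np.ndarray, invert: bool, num_samples: int = 1) -> np.ndarray:
    """Hypothetical helper: uint8 HWC map -> float32 batch in BCHW layout."""
    # Scale 0..255 to 0.0..1.0, as both gradio scripts do.
    control = detected_map.astype(np.float32) / 255.0
    if invert:
        # 1 - v swaps dark and light: black lines on white become white lines on black.
        control = 1.0 - control
    # Repeat the map once per sample, then move channels first ('b h w c -> b c h w').
    batch = np.stack([control for _ in range(num_samples)], axis=0)
    return batch.transpose(0, 3, 1, 2)


# Made-up 2x2 "scan": black line pixels (0) on white paper (255), 3 channels.
scan = np.array([[0, 255], [255, 0]], dtype=np.uint8)[..., None].repeat(3, axis=2)

as_is = to_control(scan, invert=False, num_samples=2)
inverted = to_control(scan, invert=True, num_samples=2)

print(as_is[0, 0])     # line pixels are 0.0, paper is 1.0
print(inverted[0, 0])  # line pixels are 1.0, paper is 0.0
```

With invert=True the line pixels end up at 1.0 on a 0.0 background, the same polarity a Canny edge map already has.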

deepmayuot commented 7 months ago

Thanks for your quick reply!