huggingface / controlnet_aux

Apache License 2.0
398 stars 86 forks source link

Consistent resizing parameters and behavior #49

Closed pdoane closed 1 year ago

pdoane commented 1 year ago
patrickvonplaten commented 1 year ago

Hey @pdoane,

Thanks for the PR.

Breaking change: Detectors that did not have return_pil parameters will now by default return PIL images when passed numpy arrays

=> do we really have to do this? I'm not too keen on breaking backwards comp

pdoane commented 1 year ago

=> do we really have to do this? I'm not too keen on breaking backwards comp

Two other options I can think of:

  1. Don't add the parameter - the detectors do not have a uniform interface making abstractions like the Processor class harder. Processor can have metadata to make the interface uniform.
  2. Add a new parameter output_type and support np, pil, and None. The None value would preserve existing behavior.
pdoane commented 1 year ago

The output_type parameter would work like this.

For detectors without return_pil that inferred type:

def __call__(self, input_image, output_type=None):
    if not isinstance(input_image, np.ndarray):
        input_image = np.array(input_image, dtype=np.uint8)
        output_type = output_type or "pil"
    else:
        output_type = output_type or "np"

For detectors with return_pil:

def __call__(self, input_image, return_pil=None, output_type="pil"):
    if return_pil is not None:
        warnings.warn("return_pil is deprecated. Use output_type instead.", DeprecationWarning)
        output_type = "pil" if return_pil else "np"

For detectors that always returned PIL:

def __call__(self, input_image, output_type="pil"):
pdoane commented 1 year ago

PR is updated with a new output_type parameter following the strategy outlined above. @patrickvonplaten - let me know if you want an alternate approach.

pdoane commented 1 year ago

@ChristiaensBert - if we merge this approach it's worth revisiting the Processor return_pil parameter (which is another variant with the .jpeg output).

pdoane commented 1 year ago

@patrickvonplaten - we put tests in different locations (tests vs src/controlnet_aux/tests). Which way do you want?

ChristiaensBert commented 1 year ago

@ChristiaensBert - if we merge this approach it's worth revisiting the Processor return_pil parameter (which is another variant with the .jpeg output).

Yes indeed! So you want to have an output_type that supports different types, such as PIL.Image.Image, np.ndarray, bytes, ..?

And nice that the resize isn't needed anymore!

ChristiaensBert commented 1 year ago

For one of my use cases, it's also useful to return the image as bytes

output_bytes = io.BytesIO()
processed_image.save(output_bytes, format='JPEG')
return output_bytes.getvalue()

Do you think this is valuable to support too?

Edit: useful formats could be

Union[bytes, Image.Image, np.ndarray, torch.Tensor]
ChristiaensBert commented 1 year ago

@pdoane I was thinking, maybe instead of implementing the same resizing and converting image logic (with the output_type etc), we could move this logic into a general helper method that takes in the image and output_type? If we make changes, e.g. to support new output types etc, we only have to change it once.

Like

def format_image(image: Union[bytes, Image.Image, np.ndarray, torch.Tensor], output_type: str, return_pil: Optional[bool] = None) -> Union[bytes, Image.Image, np.ndarray, torch.Tensor]:
    if return_pil:
        # give a warning etc

    # check which image type and convert it
    return image

Then this

    def __call__(self, input_image, detect_resolution=512, image_resolution=512, include_body=True, include_hand=False, include_face=False, hand_and_face=None, return_pil=None, output_type="pil"):
       # logic
        if return_pil is not None:
            warnings.warn("return_pil is deprecated. Use output_type instead.", DeprecationWarning)
            output_type = "pil" if return_pil else "np"

        if not isinstance(input_image, np.ndarray):
            input_image = np.array(input_image, dtype=np.uint8)

        # machine learning magic

        if output_type == "pil":
            detected_map = Image.fromarray(detected_map)

        return detected_map

can be refactored to

    def __call__(self, input_image, detect_resolution=512, image_resolution=512, include_body=True, include_hand=False, include_face=False, hand_and_face=None, return_pil=None, output_type="pil"):
        # logic
        # machine learning magic

        detected_map = format_image(detected_map, output_type, return_pil)

        return detected_map

and the format_image can be reused for different scenarios (such as converting the input image into a compatible type for the ML model)

pdoane commented 1 year ago

I wanted to resolve the implementation differences in place first to spot any issues. I believe most of those were bugs and oversights, but several differences remain coming from @patrickvonplaten's request to avoid backward compatibility breaks.

With the package versioned as 0.0.x, my expectation is that major changes to the API are still expected, but that isn't my call. Personally, I would move all of this logic from the Detector classes to the Processor class.

pdoane commented 1 year ago

The parameter rename to input_image is now detected as well as raising a ValueError if input_image is None. I am not following how removing return_pil=None helps for backwards compatibility?

pdoane commented 1 year ago

@patrickvonplaten - changes for return_pil have been made