Closed pdoane closed 1 year ago
Hey @pdoane,
Thanks for the PR.
Breaking change: Detectors that did not have return_pil parameters will now by default return PIL images when passed numpy arrays
=> do we really have to do this? I'm not too keen on breaking backwards comp
=> do we really have to do this? I'm not too keen on breaking backwards comp
Two other options I can think of:
output_type
and support np
, pil
, and None
. The None
value would preserve existing behavior.The output_type parameter would work like this.
For detectors without return_pil that inferred type:
def __call__(self, input_image, output_type=None):
if not isinstance(input_image, np.ndarray):
input_image = np.array(input_image, dtype=np.uint8)
output_type = output_type or "pil"
else:
output_type = output_type or "np"
For detectors with return_pil:
def __call__(self, input_image, return_pil=None, output_type="pil"):
if return_pil is not None:
warnings.warn("return_pil is deprecated. Use output_type instead.", DeprecationWarning)
output_type = "pil" if return_pil else "np"
For detectors that always returned PIL:
def __call__(self, input_image, output_type="pil"):
PR is updated with a new output_type parameter following the strategy outlined above. @patrickvonplaten - let me know if you want an alternate approach.
@ChristiaensBert - if we merge this approach it's worth revisiting the Processor return_pil parameter (which is another variant with the .jpeg output).
@patrickvonplaten - we put tests in different locations (tests vs src/controlnet_aux/tests). Which way do you want?
@ChristiaensBert - if we merge this approach it's worth revisiting the Processor return_pil parameter (which is another variant with the .jpeg output).
Yes indeed! So you want to have an output_type
that supports different types, such as PIL.Image.Image, np.ndarray, bytes, ..?
And nice that the resize isn't needed anymore!
For one of my use cases, it's also useful to return the image as bytes
output_bytes = io.BytesIO()
processed_image.save(output_bytes, format='JPEG')
return output_bytes.getvalue()
Do you think this is valuable to support too?
Edit: useful formats could be
Union[bytes, Image.Image, np.ndarray, torch.Tensor]
@pdoane
I was thinking, maybe instead of implementing the same resizing and converting image logic (with the output_type etc), we could move this logic into a general helper method that takes in the image
and output_type
? If we make changes, e.g. to support new output types etc, we only have to change it once.
Like
def format_image(image: Union[bytes, Image.Image, np.ndarray, torch.Tensor], output_type: str, return_pil: Optional[bool] = None) -> Union[bytes, Image.Image, np.ndarray, torch.Tensor]:
if return_pil:
# give a warning etc
# check which image type and convert it
return image
Then this
def __call__(self, input_image, detect_resolution=512, image_resolution=512, include_body=True, include_hand=False, include_face=False, hand_and_face=None, return_pil=None, output_type="pil"):
# logic
if return_pil is not None:
warnings.warn("return_pil is deprecated. Use output_type instead.", DeprecationWarning)
output_type = "pil" if return_pil else "np"
if not isinstance(input_image, np.ndarray):
input_image = np.array(input_image, dtype=np.uint8)
# machine learning magic
if output_type == "pil":
detected_map = Image.fromarray(detected_map)
return detected_map
can be refactored to
def __call__(self, input_image, detect_resolution=512, image_resolution=512, include_body=True, include_hand=False, include_face=False, hand_and_face=None, return_pil=None, output_type="pil"):
# logic
# machine learning magic
detected_map = format_image(detected_map, output_type, return_pil)
return detected_map
and the format_image
can be reused for different scenarios (such as converting the input image into a compatible type for the ML model)
I wanted to resolve the implementation differences in place first to spot any issues. I believe most of those were bugs and oversights, but several differences remain coming from @patrickvonplaten's request to avoid backward compatibility breaks.
With the package versioned as 0.0.x, my expectation is that major changes to the API are still expected, but that isn't my call. Personally, I would move all of this logic from the Detector classes to the Processor class.
The parameter rename to input_image
is now detected as well as raising a ValueError if input_image
is None. I am not following how removing return_pil=None
helps for backwards compatibility?
@patrickvonplaten - changes for return_pil have been made