open-mmlab / mmagic

OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration/enhancement, etc.
https://mmagic.readthedocs.io/en/latest/
Apache License 2.0

[Bug] MMagic Inpainting not working with np.ndarray input type for `img` and `mask` #1945

Open · ayushjain-ow opened this issue 1 year ago

ayushjain-ow commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmagic

Environment

sys.platform: linux
Python: 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0]
CUDA available: False
numpy_random_seed: 2022
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.1+cu118
PyTorch compiling details: PyTorch built with:
TorchVision: 0.15.2+cu118
OpenCV: 4.7.0
MMEngine: 0.8.2
MMCV: 2.0.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.8
MMagic: 1.0.1+

Reproduces the problem - code sample

Sample Colab Notebook - MMagic-Issue.ipynb

1. Relevant imports:
from io import BytesIO

import requests
import numpy as np
from PIL import Image
from matplotlib import pyplot as plt
2. Load the global_local model for image inpainting (also tried with partial_conv; see the model reference):
from mmagic.apis import MMagicInferencer
inp_inferencer = MMagicInferencer(model_name='global_local', model_setting=1)
# inp_inferencer = MMagicInferencer(model_name='partial_conv', model_setting=1)
3. Load an image and create a sample mask:
def read_img(img_url):
    response = requests.get(img_url)
    img = Image.open(BytesIO(response.content))
    img = np.array(img)
    return img

img = read_img("https://as2.ftcdn.net/v2/jpg/02/73/70/37/1000_F_273703719_IZpqbdeCtPkOSCKSznjoqUWkKWTp9WEI.jpg")
mask = np.zeros_like(img[:,:,0])
mask[425:650, 10:35] = 255
mask[380:440, 500:540] = 255
4. Run inference with the inpainting model (method source: MMagicInferencer.infer). As per the method definition, np.ndarray should be supported for both img and mask:
res = inp_inferencer.infer(
    img=img[:,:,::-1], # RGB -> BGR
    mask=mask
)

I suspect this is caused by this piece of code: mmagic.apis.inferencers.inpainting_inferencer.InpaintingInferencer.preprocess, which unconditionally treats img and mask as file paths.
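For reference, the suspected spot is visible in the traceback below; a paraphrased sketch (the call itself is copied from the traceback, the comments are mine):

# Inside InpaintingInferencer.preprocess (as shown in the traceback):
# the inputs are wrapped under *_path keys unconditionally, so the
# LoadImageFromFile transform ends up calling open(img, 'rb') on the
# string repr of the ndarray, producing "File name too long".
_data = infer_pipeline(dict(gt_path=img, mask_path=mask))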

Reproduces the problem - command or script

NIL

Reproduces the problem - error message

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-11-5df187c0e479> in <cell line: 1>()
----> 1 res = inp_inferencer_pc.infer(
      2     img=img[:,:,::-1], # RGB -> BGR
      3     mask=mask
      4 )

9 frames
/usr/local/lib/python3.10/dist-packages/mmagic/apis/mmagic_inferencer.py in infer(self, img, video, label, trimap, mask, result_out_dir, **kwargs)
    212             each image or video.
    213         """
--> 214         return self.inferencer(
    215             img=img,
    216             video=video,

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/__init__.py in __call__(self, **kwargs)
    104             Union[Dict, List[Dict]]: Results of inference pipeline.
    105         """
--> 106         return self.inferencer(**kwargs)
    107 
    108     def get_extra_parameters(self) -> List[str]:

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py in __call__(self, **kwargs)
    139         postprocess_kwargs.update(params[3])
    140 
--> 141         data = self.preprocess(**preprocess_kwargs)
    142         preds = self.forward(data, **forward_kwargs)
    143         imgs = self.visualize(preds, **visualize_kwargs)

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/inpainting_inferencer.py in preprocess(self, img, mask)
     49 
     50         # prepare data
---> 51         _data = infer_pipeline(dict(gt_path=img, mask_path=mask))
     52         data = dict()
     53         data['inputs'] = [_data['inputs']]

/usr/local/lib/python3.10/dist-packages/mmengine/dataset/base_dataset.py in __call__(self, data)
     56         """
     57         for t in self.transforms:
---> 58             data = t(data)
     59             # The transform will return None when it failed to load images or
     60             # cannot find suitable augmentation parameters to augment the data.

/usr/local/lib/python3.10/dist-packages/mmcv/transforms/base.py in __call__(self, results)
     10                  results: Dict) -> Optional[Union[Dict, Tuple[List, List]]]:
     11 
---> 12         return self.transform(results)
     13 
     14     @abstractmethod

/usr/local/lib/python3.10/dist-packages/mmagic/datasets/transforms/loading.py in transform(self, results)
    105 
    106         for filename in filenames:
--> 107             img = self._load_image(filename)
    108             img = self._convert(img)
    109             images.append(img)

/usr/local/lib/python3.10/dist-packages/mmagic/datasets/transforms/loading.py in _load_image(self, filename)
    146             img_bytes = self.cache[filename]
    147         else:
--> 148             img_bytes = self.file_backend.get(filename)
    149             if self.use_cache:
    150                 self.cache[filename] = img_bytes

/usr/local/lib/python3.10/dist-packages/mmengine/fileio/backends/local_backend.py in get(self, filepath)
     31             b'hello world'
     32         """
---> 33         with open(filepath, 'rb') as f:
     34             value = f.read()
     35         return value

OSError: [Errno 36] File name too long: '[[[ 20  69  55]\n  [ 20  69  55]\n  [ 23  72  58]\n  ...\n  [ 12  12  12]\n  [ 12  12  12]\n  [ 13  13  13]]\n\n [[ 12  44  27]\n  [ 15  62  46]\n  [ 25  79  66]\n  ...\n  [ 14  14  14]\n  [ 13  13  13]\n  [ 12  12  12]]\n\n [[ 16  81  65]\n  [ 33 107  89]\n  [ 15  79  60]\n  ...\n  [ 15  15  15]\n  [ 13  13  13]\n  [ 11  11  11]]\n\n ...\n\n [[  8  27  10]\n  [ 10  18  11]\n  [ 13  13  13]\n  ...\n  [ 10  75  50]\n  [  0  67  36]\n  [  7  81  39]]\n\n [[ 11  22  12]\n  [ 11  11  11]\n  [ 11   9   9]\n  ...\n  [  1  54  41]\n  [  6  54  28]\n  [ 11  67  26]]\n\n [[ 10  18  11]\n  [ 15  10  12]\n  [ 13  14  12]\n  ...\n  [  2  40  40]\n  [ 12  58  36]\n  [  6  70  28]]]'

Additional information

I also tried saving the img and mask to the local filesystem and passing the file paths to the .infer function, but still encountered an error.

Code to reproduce:

img_f = "/tmp/img.png"
Image.fromarray(img).save(img_f)

mask_f = "/tmp/mask.png"
Image.fromarray(mask).save(mask_f)

res = inp_inferencer.infer(
    img=img_f,
    mask=mask_f
)

Exception traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-9b32f8498177> in <cell line: 1>()
----> 1 res = inp_inferencer.infer(
      2     img=img_f,
      3     mask=mask_f
      4 )

8 frames
/usr/local/lib/python3.10/dist-packages/mmagic/apis/mmagic_inferencer.py in infer(self, img, video, label, trimap, mask, result_out_dir, **kwargs)
    212             each image or video.
    213         """
--> 214         return self.inferencer(
    215             img=img,
    216             video=video,

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/__init__.py in __call__(self, **kwargs)
    104             Union[Dict, List[Dict]]: Results of inference pipeline.
    105         """
--> 106         return self.inferencer(**kwargs)
    107 
    108     def get_extra_parameters(self) -> List[str]:

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py in __call__(self, **kwargs)
    140 
    141         data = self.preprocess(**preprocess_kwargs)
--> 142         preds = self.forward(data, **forward_kwargs)
    143         imgs = self.visualize(preds, **visualize_kwargs)
    144         results = self.postprocess(preds, imgs, **postprocess_kwargs)

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/inpainting_inferencer.py in forward(self, inputs)
     59         inputs = self.model.data_preprocessor(inputs)
     60         with torch.no_grad():
---> 61             result = self.model(mode='predict', **inputs)
     62         return result
     63 

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1499                 or _global_backward_pre_hooks or _global_backward_hooks
   1500                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501             return forward_call(*args, **kwargs)
   1502         # Do not call functions when jit is used
   1503         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.10/dist-packages/mmagic/models/base_models/one_stage.py in forward(self, inputs, data_samples, mode)
    143         elif mode == 'predict':
    144             # Pre-process runs in BaseModel.val_step / test_step
--> 145             predictions = self.forward_test(inputs, data_samples)
    146             predictions = self.convert_to_datasample(predictions, data_samples,
    147                                                      inputs)

/usr/local/lib/python3.10/dist-packages/mmagic/models/base_models/one_stage.py in forward_test(self, inputs, data_samples)
    395                 DataSample.
    396         """
--> 397         fake_reses, fake_imgs = self.forward_tensor(inputs, data_samples)
    398 
    399         predictions = []

/usr/local/lib/python3.10/dist-packages/mmagic/models/base_models/one_stage.py in forward_tensor(self, inputs, data_samples)
    380         input_xs = torch.cat([masked_imgs, masks], dim=1)  # N,4,H,W
    381         fake_reses = self.generator(input_xs)
--> 382         fake_imgs = fake_reses * masks + masked_imgs * (1. - masks)
    383         return fake_reses, fake_imgs
    384 

RuntimeError: The size of tensor a (664) must match the size of tensor b (662) at non-singleton dimension 2
zengyh1900 commented 1 year ago

Hi @ayushjain-ow,

I think you're right! Would you like to make a pull request to fix this issue? I think we can solve it in two ways.

First, we can remove LoadImageFromFile and LoadMask from the infer_pipeline_cfg if the types of img and mask are np.ndarray. In short, we can check their types when building the infer_pipeline_cfg.
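A minimal sketch of the first option, assuming infer_pipeline_cfg is a list of transform dicts (the transform names LoadImageFromFile and LoadMask come from this thread; the helper itself is illustrative, not existing MMagic code):

import numpy as np

def build_infer_pipeline_cfg(img, mask, base_cfg):
    """Drop the file-loading transforms when the inputs are already arrays.

    base_cfg is assumed to be a list of transform dicts, e.g.
    [dict(type='LoadImageFromFile', ...), dict(type='LoadMask', ...), ...].
    """
    if isinstance(img, np.ndarray) and isinstance(mask, np.ndarray):
        skipped = {'LoadImageFromFile', 'LoadMask'}
        return [t for t in base_cfg if t.get('type') not in skipped]
    return base_cfg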

Second, when you pass an image by filename, you need to make sure the input image size is divisible by 4. This is because global_local downsamples the input image with strided convolutions and then upsamples it by a factor of 4. Your example image has shape (662, 1000, 3), so the generator's output height becomes ceil(662/4)*4 = 664, which no longer matches the 662-pixel mask. I think we need to add an assertion before inference to remind users of this issue.
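To make the size constraint concrete, here is a minimal sketch of a padding workaround; pad_to_multiple is a hypothetical helper (not part of MMagic), and the factor of 4 comes from the explanation above:

import numpy as np

def pad_to_multiple(img, factor=4):
    # Pad H and W up to the next multiple of `factor`
    # (662 -> ceil(662/4)*4 = 664), so the generator's
    # downsample-then-upsample round trip preserves the size.
    h, w = img.shape[:2]
    pad_h = (-h) % factor
    pad_w = (-w) % factor
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad, mode='reflect')

Padding both img and mask this way before saving them (or before calling infer) should keep the generator's output aligned with the mask.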

Looking forward to your pull request!