open-mmlab / mmagic

OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration/enhancement, etc.
https://mmagic.readthedocs.io/en/latest/
Apache License 2.0

[Bug] MMagic Inpainting not working with np.ndarray input type for `img` and `mask` #1945

Open · ayushjain-ow opened this issue 1 year ago

ayushjain-ow commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmagic

Environment

sys.platform: linux
Python: 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0]
CUDA available: False
numpy_random_seed: 2022
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.1+cu118
PyTorch compiling details: PyTorch built with:
TorchVision: 0.15.2+cu118
OpenCV: 4.7.0
MMEngine: 0.8.2
MMCV: 2.0.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.8
MMagic: 1.0.1+

Reproduces the problem - code sample

Sample Colab Notebook - MMagic-Issue.ipynb

1. Relevant imports:
from io import BytesIO

import requests
import numpy as np
from PIL import Image
from matplotlib import pyplot as plt
2. Load the global_local model for image inpainting (also tried with partial_conv; see the model reference):
from mmagic.apis import MMagicInferencer
inp_inferencer = MMagicInferencer(model_name='global_local', model_setting=1)
# inp_inferencer = MMagicInferencer(model_name='partial_conv', model_setting=1)
3. Load an image and create a sample mask:
def read_img(img_url):
    response = requests.get(img_url)
    img = Image.open(BytesIO(response.content))
    img = np.array(img)
    return img

img = read_img("https://as2.ftcdn.net/v2/jpg/02/73/70/37/1000_F_273703719_IZpqbdeCtPkOSCKSznjoqUWkKWTp9WEI.jpg")
mask = np.zeros_like(img[:,:,0])
mask[425:650, 10:35] = 255
mask[380:440, 500:540] = 255
4. Run inference with the inpainting model (method source: MMagicInferencer.infer). As per the method definition, np.ndarray should be supported for both img and mask:
res = inp_inferencer.infer(
    img=img[:,:,::-1], # RGB -> BGR
    mask=mask
)

I suspect this is caused by this piece of code: mmagic.apis.inferencers.inpainting_inferencer.InpaintingInferencer.preprocess, which unconditionally treats img and mask as file paths.
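For reference, the suspected spot is visible in the traceback below; a paraphrased sketch (the call itself is copied from the traceback, the comments are mine):

# Inside InpaintingInferencer.preprocess (as shown in the traceback):
# the inputs are wrapped under *_path keys unconditionally, so the
# LoadImageFromFile transform ends up calling open(img, 'rb') on the
# string repr of the ndarray, producing "File name too long".
_data = infer_pipeline(dict(gt_path=img, mask_path=mask))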

Reproduces the problem - command or script

NIL

Reproduces the problem - error message

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-11-5df187c0e479> in <cell line: 1>()
----> 1 res = inp_inferencer_pc.infer(
      2     img=img[:,:,::-1], # RGB -> BGR
      3     mask=mask
      4 )

9 frames
/usr/local/lib/python3.10/dist-packages/mmagic/apis/mmagic_inferencer.py in infer(self, img, video, label, trimap, mask, result_out_dir, **kwargs)
    212             each image or video.
    213         """
--> 214         return self.inferencer(
    215             img=img,
    216             video=video,

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/__init__.py in __call__(self, **kwargs)
    104             Union[Dict, List[Dict]]: Results of inference pipeline.
    105         """
--> 106         return self.inferencer(**kwargs)
    107 
    108     def get_extra_parameters(self) -> List[str]:

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py in __call__(self, **kwargs)
    139         postprocess_kwargs.update(params[3])
    140 
--> 141         data = self.preprocess(**preprocess_kwargs)
    142         preds = self.forward(data, **forward_kwargs)
    143         imgs = self.visualize(preds, **visualize_kwargs)

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/inpainting_inferencer.py in preprocess(self, img, mask)
     49 
     50         # prepare data
---> 51         _data = infer_pipeline(dict(gt_path=img, mask_path=mask))
     52         data = dict()
     53         data['inputs'] = [_data['inputs']]

/usr/local/lib/python3.10/dist-packages/mmengine/dataset/base_dataset.py in __call__(self, data)
     56         """
     57         for t in self.transforms:
---> 58             data = t(data)
     59             # The transform will return None when it failed to load images or
     60             # cannot find suitable augmentation parameters to augment the data.

/usr/local/lib/python3.10/dist-packages/mmcv/transforms/base.py in __call__(self, results)
     10                  results: Dict) -> Optional[Union[Dict, Tuple[List, List]]]:
     11 
---> 12         return self.transform(results)
     13 
     14     @abstractmethod

/usr/local/lib/python3.10/dist-packages/mmagic/datasets/transforms/loading.py in transform(self, results)
    105 
    106         for filename in filenames:
--> 107             img = self._load_image(filename)
    108             img = self._convert(img)
    109             images.append(img)

/usr/local/lib/python3.10/dist-packages/mmagic/datasets/transforms/loading.py in _load_image(self, filename)
    146             img_bytes = self.cache[filename]
    147         else:
--> 148             img_bytes = self.file_backend.get(filename)
    149             if self.use_cache:
    150                 self.cache[filename] = img_bytes

/usr/local/lib/python3.10/dist-packages/mmengine/fileio/backends/local_backend.py in get(self, filepath)
     31             b'hello world'
     32         """
---> 33         with open(filepath, 'rb') as f:
     34             value = f.read()
     35         return value

OSError: [Errno 36] File name too long: '[[[ 20  69  55]\n  [ 20  69  55]\n  [ 23  72  58]\n  ...\n  [ 12  12  12]\n  [ 12  12  12]\n  [ 13  13  13]]\n\n [[ 12  44  27]\n  [ 15  62  46]\n  [ 25  79  66]\n  ...\n  [ 14  14  14]\n  [ 13  13  13]\n  [ 12  12  12]]\n\n [[ 16  81  65]\n  [ 33 107  89]\n  [ 15  79  60]\n  ...\n  [ 15  15  15]\n  [ 13  13  13]\n  [ 11  11  11]]\n\n ...\n\n [[  8  27  10]\n  [ 10  18  11]\n  [ 13  13  13]\n  ...\n  [ 10  75  50]\n  [  0  67  36]\n  [  7  81  39]]\n\n [[ 11  22  12]\n  [ 11  11  11]\n  [ 11   9   9]\n  ...\n  [  1  54  41]\n  [  6  54  28]\n  [ 11  67  26]]\n\n [[ 10  18  11]\n  [ 15  10  12]\n  [ 13  14  12]\n  ...\n  [  2  40  40]\n  [ 12  58  36]\n  [  6  70  28]]]'

Additional information

I also tried saving the img and mask to the local filesystem and passing the file paths to the .infer function, but still encountered an error.

Code to reproduce:

img_f = "/tmp/img.png"
Image.fromarray(img).save(img_f)

mask_f = "/tmp/mask.png"
Image.fromarray(mask).save(mask_f)

res = inp_inferencer.infer(
    img=img_f,
    mask=mask_f
)

Exception traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-9b32f8498177> in <cell line: 1>()
----> 1 res = inp_inferencer.infer(
      2     img=img_f,
      3     mask=mask_f
      4 )

8 frames
/usr/local/lib/python3.10/dist-packages/mmagic/apis/mmagic_inferencer.py in infer(self, img, video, label, trimap, mask, result_out_dir, **kwargs)
    212             each image or video.
    213         """
--> 214         return self.inferencer(
    215             img=img,
    216             video=video,

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/__init__.py in __call__(self, **kwargs)
    104             Union[Dict, List[Dict]]: Results of inference pipeline.
    105         """
--> 106         return self.inferencer(**kwargs)
    107 
    108     def get_extra_parameters(self) -> List[str]:

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py in __call__(self, **kwargs)
    140 
    141         data = self.preprocess(**preprocess_kwargs)
--> 142         preds = self.forward(data, **forward_kwargs)
    143         imgs = self.visualize(preds, **visualize_kwargs)
    144         results = self.postprocess(preds, imgs, **postprocess_kwargs)

/usr/local/lib/python3.10/dist-packages/mmagic/apis/inferencers/inpainting_inferencer.py in forward(self, inputs)
     59         inputs = self.model.data_preprocessor(inputs)
     60         with torch.no_grad():
---> 61             result = self.model(mode='predict', **inputs)
     62         return result
     63 

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1499                 or _global_backward_pre_hooks or _global_backward_hooks
   1500                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501             return forward_call(*args, **kwargs)
   1502         # Do not call functions when jit is used
   1503         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.10/dist-packages/mmagic/models/base_models/one_stage.py in forward(self, inputs, data_samples, mode)
    143         elif mode == 'predict':
    144             # Pre-process runs in BaseModel.val_step / test_step
--> 145             predictions = self.forward_test(inputs, data_samples)
    146             predictions = self.convert_to_datasample(predictions, data_samples,
    147                                                      inputs)

/usr/local/lib/python3.10/dist-packages/mmagic/models/base_models/one_stage.py in forward_test(self, inputs, data_samples)
    395                 DataSample.
    396         """
--> 397         fake_reses, fake_imgs = self.forward_tensor(inputs, data_samples)
    398 
    399         predictions = []

/usr/local/lib/python3.10/dist-packages/mmagic/models/base_models/one_stage.py in forward_tensor(self, inputs, data_samples)
    380         input_xs = torch.cat([masked_imgs, masks], dim=1)  # N,4,H,W
    381         fake_reses = self.generator(input_xs)
--> 382         fake_imgs = fake_reses * masks + masked_imgs * (1. - masks)
    383         return fake_reses, fake_imgs
    384 

RuntimeError: The size of tensor a (664) must match the size of tensor b (662) at non-singleton dimension 2
zengyh1900 commented 1 year ago

Hi @ayushjain-ow,

I think you're right! Would you like to make a pull request to fix this issue? I think we can solve it in two ways.

First, we can remove LoadImageFromFile and LoadMask from the infer_pipeline_cfg if the types of img and mask are np.ndarray. In short, we can check their types when building the infer_pipeline_cfg.
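A minimal sketch of the first option, assuming infer_pipeline_cfg is a list of transform dicts (the transform names LoadImageFromFile and LoadMask come from this thread; the helper itself is illustrative, not existing MMagic code):

import numpy as np

def build_infer_pipeline_cfg(img, mask, base_cfg):
    """Drop the file-loading transforms when the inputs are already arrays.

    base_cfg is assumed to be a list of transform dicts, e.g.
    [dict(type='LoadImageFromFile', ...), dict(type='LoadMask', ...), ...].
    """
    if isinstance(img, np.ndarray) and isinstance(mask, np.ndarray):
        skipped = {'LoadImageFromFile', 'LoadMask'}
        return [t for t in base_cfg if t.get('type') not in skipped]
    return base_cfg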

Second, when you pass an image by filename, you need to make sure the input image size is divisible by 4. This is because global_local downsamples the input image with strided convolutions and then upsamples it by a factor of 4. Your example image has shape (662, 1000, 3), so the generator's output height becomes ceil(662/4)*4 = 664, which no longer matches the 662-pixel mask. I think we need to add an assertion before inference to remind users of this issue.
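To make the size constraint concrete, here is a minimal sketch of a padding workaround; pad_to_multiple is a hypothetical helper (not part of MMagic), and the factor of 4 comes from the explanation above:

import numpy as np

def pad_to_multiple(img, factor=4):
    # Pad H and W up to the next multiple of `factor`
    # (662 -> ceil(662/4)*4 = 664), so the generator's
    # downsample-then-upsample round trip preserves the size.
    h, w = img.shape[:2]
    pad_h = (-h) % factor
    pad_w = (-w) % factor
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad, mode='reflect')

Padding both img and mask this way before saving them (or before calling infer) should keep the generator's output aligned with the mask.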

Looking forward to your pull request!