pipinstallyp / minigpt4-batch

Use MiniGPT-4 Batch to generate captions for large batches of images! You should finally be able to create the captions you've always wanted!
BSD 3-Clause "New" or "Revised" License

RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224] #2

Closed · NicholasHarvey91 closed this issue 1 year ago

NicholasHarvey91 commented 1 year ago

I've encountered this a couple of times now and always have to remove the offending image(s), then try again. I'm guessing it's a color-space issue? Is it possible to address this in the code, or does some image prep need to happen beforehand? I'm just ripping subreddits here. I'm also running it in Colab, which otherwise seems to work fine, so I don't think that's the problem.

Traceback (most recent call last):
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/backup_app.py", line 278, in <module>
    main()
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/backup_app.py", line 250, in main
    caption = describe_image(image_path, chat, chat_state, img_list)
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/backup_app.py", line 77, in describe_image
    llm_message = chat.upload_img(gr_img, chat_state, img_list)
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/minigpt4/conversation/conversation.py", line 171, in upload_img
    image = self.vis_processor(raw_image).unsqueeze(0).to(self.device)
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/minigpt4/processors/blip_processors.py", line 129, in __call__
    return self.transform(item)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py", line 277, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/functional.py", line 363, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_functional_tensor.py", line 928, in normalize
    return tensor.sub_(mean).div_(std)
RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
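The shapes in the error message point at the cause: the final Normalize transform uses 3-channel mean/std values, and the in-place sub_/div_ at _functional_tensor.py:928 can't broadcast them against a 1-channel (greyscale) image tensor. A minimal standalone repro, assuming the usual CLIP/BLIP normalization constants (the exact values don't matter for triggering the error):

    import torch
    from torchvision import transforms

    # 3-channel normalization constants, as used by the BLIP image processor
    normalize = transforms.Normalize(
        mean=(0.48145466, 0.4578275, 0.40821073),
        std=(0.26862954, 0.26130258, 0.27577711),
    )

    rgb = torch.rand(3, 224, 224)   # tensor from an RGB image
    grey = torch.rand(1, 224, 224)  # tensor from a greyscale image

    normalize(rgb)   # fine
    normalize(grey)  # RuntimeError: output with shape [1, 224, 224]
                     # doesn't match the broadcast shape [3, 224, 224]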

NicholasHarvey91 commented 1 year ago

Specifically, I think it's choking on greyscale images. I've also seen some captions describing the image as black and white when it wasn't. Maybe that's related?
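That hypothesis fits the traceback: greyscale files open in PIL mode "L", so the processor emits a 1-channel tensor that the 3-channel Normalize can't handle. A likely one-line workaround, sketched against the describe_image function shown in the traceback (an assumption, not an official patch):

    # backup_app.py, in describe_image(), line 76 of the traceback. Before:
    gr_img = Image.open(image_path)
    # After: force 3 channels; .convert("RGB") is a no-op for images already in RGB.
    gr_img = Image.open(image_path).convert("RGB")

Note that .convert("RGB") just replicates the single channel for greyscale images, so truly black-and-white photos would still be captioned as such; it only prevents the crash.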

pipinstallyp commented 1 year ago

I believe this is breaking because of the development work we're doing right now. For now we're going with backup_app.py instead of app.py; I'll update the instructions soon.

Please try:

  1. Running setup.bat
  2. Opening a command prompt in the same directory
  3. Activating the virtual environment: venv\Scripts\activate.bat
  4. Running the following command:

python backup_app.py --image-folder your_image_directory --beam-search-numbers 2 --model-dir models/wd14_tagger --undesired-tags '1girl,1boy,solo'

For now we've integrated MiniGPT-4 and the WD14 tagger. Working to make that optional in a few hours!
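For anyone who would rather prep the images beforehand than patch backup_app.py, here is a minimal sketch that rewrites every non-RGB image in the input folder in place (assumes Pillow is installed; the folder name is a placeholder for whatever you pass to --image-folder):

    import os
    from PIL import Image

    image_dir = "your_image_directory"  # same folder passed to --image-folder

    for name in os.listdir(image_dir):
        path = os.path.join(image_dir, name)
        try:
            with Image.open(path) as img:
                if img.mode != "RGB":
                    # Convert greyscale/palette/RGBA images to 3-channel RGB
                    img.convert("RGB").save(path)
        except OSError:
            pass  # skip files Pillow can't read (non-images, corrupt files)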

pipinstallyp commented 1 year ago

Please let me know if there are any issues or if it works. Happy to help out.