Closed NicholasHarvey91 closed 1 year ago
Specifically, I think it's choking on greyscale images. I've also seen some captions describing an image as black and white when it wasn't. Maybe that's related?
I believe this is breaking due to the development work we're doing right now. For now, we're going with backup_app.py instead of app.py. I'll update the instructions soon.
Please try:
python backup_app.py --image-folder your_image_directory --beam-search-numbers 2 --model-dir models/wd14_tagger --undesired-tags '1girl,1boy,solo'
For now we've integrated MiniGPT-4 and the WD14 tagger. Working to make that optional in a few hours!
Please let me know if there are any issues or if it works. Happy to help out.
I've encountered this a couple of times now and always have to remove the offending image(s), then try again. I guess it's a color space issue? Is it possible to address this in the code, or does some image prep need to happen beforehand? I'm just ripping subreddits here. I'm also running it in a Colab, which otherwise seems to work fine, so I don't think that's the problem.
Traceback (most recent call last):
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/backup_app.py", line 278, in <module>
    main()
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/backup_app.py", line 250, in main
    caption = describe_image(image_path, chat, chat_state, img_lis…
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/backup_app.py", line 77, in describe_image
    llm_message = chat.upload_img(gr_img, chat_state, img_list)
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/minigpt4/conversation/conversation.py", line 171, in upload_img
    image = self.vis_processor(raw_image).unsqueeze(0).to(self…
  File "/content/minigpt4-batch/minigpt4-batch/minigpt4-batch/minigpt4/processors/blip_processors.py", line 129, in __call__
    return self.transform(item)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py", line 277, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/functional.py", line 363, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_functional_tensor.py", line 928, in normalize
    return tensor.sub_(mean).div_(std)
RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
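The final frame points at torchvision's Normalize: its mean/std have three elements, but the image tensor has only one channel, so a greyscale (single-channel) source image does look like the trigger. Assuming describe_image() in backup_app.py opens the image the way the traceback shows (gr_img = Image.open(image_path)), a minimal sketch of a workaround is to force every image into 3-channel RGB before handing it to the processor; the helper name below is just illustrative:

```python
from PIL import Image

def load_image_as_rgb(image_path):
    """Open an image and force it into 3-channel RGB.

    Greyscale ("L"), palette ("P"), and RGBA images all come out with a
    channel count other than 3, which is what makes Normalize's
    3-element mean/std fail to broadcast against a [1, 224, 224] tensor.
    """
    img = Image.open(image_path)
    if img.mode != "RGB":
        img = img.convert("RGB")
    return img

# In describe_image() this would replace the plain Image.open() call, e.g.:
# gr_img = load_image_as_rgb(image_path)
```

Pre-converting the whole folder to RGB with the same convert("RGB") call before running the script would be the "image prep beforehand" alternative; either way, no images should need to be removed.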