theovercomer8 / captionr

GIT/BLIP/CLIP Caption tool
MIT License

BLIP error with config #9

Closed · mediocreatmybest closed this 1 year ago

mediocreatmybest commented 1 year ago

Hi,

I'm getting the following error on Ubuntu Jammy (Linux). It happens with both Python 3.8 and 3.10 inside a virtual environment (venv).

INFO:root:PREVIEW MODE ENABLED. No caption files will be written.
  0%|          | 0/3 [00:00<?, ?it/s]
ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
  File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 139, in process_img
    new_caption = config._blip.caption(img)
  File "/home/machinelearning/tools/captionr/captionr/blip_cap.py", line 48, in caption
    size = self.config.blip_image_eval_size
AttributeError: 'BLIP' object has no attribute 'config'

Other models seem to be working fine, including CoCa, GIT, and CLIP.
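
From the traceback, it looks like the BLIP wrapper never stores the config it's given, so caption() has nothing to read. A rough sketch of what I mean (the class and attribute names are guessed from the traceback, not the real captionr source):

# Hypothetical sketch inferred from the traceback above, not the actual captionr code.
# caption() reads self.config.blip_image_eval_size, so __init__ has to keep a
# reference to the config object it receives; skipping that assignment raises
# exactly this AttributeError.
from dataclasses import dataclass

@dataclass
class Config:
    blip_image_eval_size: int = 384  # assumed default; BLIP commonly evaluates at 384px

class BLIP:
    def __init__(self, config: Config):
        self.config = config  # the assignment the error suggests is missing

    def caption(self, img):
        size = self.config.blip_image_eval_size
        # ...resize img to (size, size) and run the BLIP model here...
        return f"caption generated at {size}px"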

theovercomer8 commented 1 year ago

Can you verify this is resolved?

mediocreatmybest commented 1 year ago

Looks like that part is, but I'm now getting CUDA out-of-memory errors with BLIP. I've got a GPU with 12GB of VRAM, and I've had similar issues with BLIP on other projects, where I needed to offload to CPU. That said, the CLIP model ViT-H with laion2b_s32b_b79k seems to be working fine.

Might be worth adding CPU offloading and CLIP chunk-size options in the future? Or otherwise tweaking the defaults to lower the VRAM requirements?
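
Something like this rough offloading pattern is what I mean, just a sketch against plain PyTorch rather than captionr's actual internals (the function and argument names here are placeholders):

# Sketch of the CPU-offloading idea: keep BLIP on the CPU and move it to the GPU
# only for generation, then move it back so the other captioners (CLIP/CoCa/GIT)
# have the VRAM. Assumes a standard PyTorch model whose generate() takes
# preprocessed pixel values.
import torch

def caption_with_offload(model, pixel_values, device="cuda"):
    model.to(device)
    try:
        with torch.no_grad():
            output_ids = model.generate(pixel_values.to(device), num_beams=4, max_length=30)
    finally:
        model.to("cpu")            # free VRAM for the next model in the pipeline
        torch.cuda.empty_cache()
    return output_ids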

theovercomer8 commented 1 year ago

I can work on that. For now, maybe try dropping your beam count and BLIP max length.
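
Roughly, beam search keeps num_beams candidate captions alive at once, so memory use grows with both the beam count and the max length. If you want to sanity-check those two knobs outside captionr, they map onto generate() arguments like this (Hugging Face's BLIP captioning model shown purely as an illustration; captionr wires BLIP up differently):

# Illustration of the two settings being suggested, using the Hugging Face BLIP
# captioning model rather than captionr's own BLIP wrapper.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to("cuda")

image = Image.open("example.jpg").convert("RGB")   # any local test image
inputs = processor(images=image, return_tensors="pt").to("cuda")

# Lower num_beams and max_length to trade caption quality for VRAM.
out = model.generate(**inputs, num_beams=4, max_length=30)
print(processor.decode(out[0], skip_special_tokens=True))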

mediocreatmybest commented 1 year ago

Awesome, thanks! Just tried that now. Looks like I'm getting another issue with tensor sizes if I drop the BLIP beams to anything under 40. At 40 and above I'm getting out-of-memory errors; below 40, I'm getting this:


attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (39) must match the size of tensor b (1521) at non-singleton dimension 0

I can get it to run successfully with BLIP beams set to 1, though captions are obviously a little wonky at that setting. So I'm unsure whether it's all just a memory issue or something else.
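
If it helps narrow it down, one way to separate the two failure modes is a standalone loop outside captionr (again using Hugging Face BLIP as a stand-in, and the image path is a placeholder): an out-of-memory error depends on the device and beam count, while a genuine shape bug should reproduce even on CPU.

# Standalone check (not captionr): run generation on CPU across several beam
# counts. CUDA OOM can't happen on CPU, so if the tensor-size error still
# appears it points at a code path rather than a memory limit.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")  # stays on CPU

image = Image.open("example.jpg").convert("RGB")   # placeholder test image
inputs = processor(images=image, return_tensors="pt")

for beams in (1, 8, 16, 39, 40):
    try:
        with torch.no_grad():
            out = model.generate(**inputs, num_beams=beams, max_length=48)
        print(beams, processor.decode(out[0], skip_special_tokens=True))
    except RuntimeError as err:
        print(beams, "failed:", err)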

mediocreatmybest commented 1 year ago

As the original issue is fixed, I'm closing this. I'll have another look at the other issue with the number of BLIP beams once I'm able to get BLIP working within my memory constraints.