vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

Flash attention 2 not working, "Moondream does not support Flash Attention 2.0 yet." #63

Open ProGamerGov opened 6 months ago

ProGamerGov commented 6 months ago
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/user/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/home/user/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3456, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/home/user/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1302, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/home/user/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1382, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: Moondream does not support Flash Attention 2.0 yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
vikhyat commented 6 months ago

Can you check if you're using the 2024-03-06 revision? Previous ones don't support it.
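
A minimal sketch of pinning that revision at load time. The helper name and structure here are assumptions, not part of moondream; `attn_implementation` and `revision` are standard `from_pretrained` keywords, and the heavy model download is kept out of the pure-Python part:

```python
# Sketch: pin the moondream2 revision so Flash Attention 2 support is present
# (per the thread, revisions before 2024-03-06 lack it). loader_kwargs() only
# assembles keyword arguments; the actual download happens in load_model().

MODEL_ID = "vikhyatk/moondream2"
REVISION = "2024-03-06"

def loader_kwargs(use_flash_attn: bool = True) -> dict:
    """Keyword arguments for from_pretrained, factored out for inspection."""
    kwargs = {"revision": REVISION, "trust_remote_code": True}
    if use_flash_attn:
        # Standard transformers keyword; also requires a compatible GPU and
        # the flash-attn package installed.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs

def load_model():
    # Imported lazily so the kwargs helper stays dependency-free.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(MODEL_ID, **loader_kwargs())
```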

ProGamerGov commented 6 months ago

@vikhyat Batch processing code written against the 2024-03-05 revision seems to be broken on 2024-03-06, but Flash Attention does seem to work in 2024-03-06.

ProGamerGov commented 6 months ago

Using revision 2024-03-06 results in this:

Traceback (most recent call last):
  File "moondream_batch.py", line 201, in run_model
    captions = model.answer_question(enc_images, [query_prompt] * len(batch_images), tokenizer)
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/moondream.py", line 95, in answer_question
    answer = self.generate(
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/moondream.py", line 78, in generate
    inputs_embeds = self.input_embeds(prompt, image_embeds, tokenizer)
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/moondream.py", line 56, in input_embeds
    return torch.cat(embeds, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 24 for tensor number 2 in the list.
vikhyat commented 6 months ago

answer_question only supports a single image; there's a new batch_answer API for batch generation:

answers = moondream.batch_answer(
    images=[Image.open('<IMAGE_PATH_1>'), Image.open('<IMAGE_PATH_2>')],
    prompts=["Describe this image.", "Are there people in this image?"],
    tokenizer=tokenizer,
)
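
Since the original script repeated one prompt across the whole batch, a small helper (hypothetical name, not part of moondream) shows the shape batch_answer expects: two parallel lists of equal length, pairing images[i] with prompts[i].

```python
# Hypothetical helper: batch_answer pairs images[i] with prompts[i], so a
# single shared prompt has to be repeated once per image.

def make_batch_args(images, prompt):
    """Return the parallel images/prompts lists batch_answer expects."""
    prompts = [prompt] * len(images)
    assert len(prompts) == len(images)
    return {"images": images, "prompts": prompts}
```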
ProGamerGov commented 6 months ago

Thanks! I'm trying the new function now, and it seems to result in a new error message:

    captions = model.batch_answer(enc_images, [query_prompt] * len(batch_images), tokenizer)
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/moondream.py", line 120, in batch_answer
    image_embeds = self.encode_image(images)
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/moondream.py", line 31, in encode_image
    return self.vision_encoder(image)
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/vision_encoder.py", line 130, in __call__
    [self.preprocess(image.convert("RGB")) for image in images]
  File "/home/user/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/c064b4c517bbdbbad6351b1959c90b79858822be/vision_encoder.py", line 130, in <listcomp>
    [self.preprocess(image.convert("RGB")) for image in images]
AttributeError: 'Tensor' object has no attribute 'convert'

Edit: Wait, I see what's happening now: you merged encode_image into batch_answer.
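
That failure mode can be caught early with a small guard (hypothetical, not part of moondream): batch_answer runs encode_image internally, so it must be given raw PIL Images (anything with a .convert method), not tensors that were already encoded.

```python
def check_batch_answer_inputs(images):
    """Fail fast with a clear message instead of the opaque AttributeError."""
    for i, img in enumerate(images):
        # PIL Images expose .convert; already-encoded tensors do not.
        if not hasattr(img, "convert"):
            raise TypeError(
                f"images[{i}] is a {type(img).__name__}, not a PIL Image; "
                "pass raw images, since batch_answer encodes them itself"
            )
    return images
```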

vikhyat commented 6 months ago

Is enc_images a list of PIL Images? It looks like you might have already run them through the image encoder?

ProGamerGov commented 6 months ago

@vikhyat that was the output from model.encode_image, but I see that functionality is built into batch_answer now. Everything is working now, thanks for the help!

vikhyat commented 6 months ago

Nice! This is good feedback, I'm going to improve the error messages.

zephirusgit commented 5 months ago

I don't know what Flash Attention is, although it runs very fast either way. In version 2, I have added it like this (using the revision mentioned above). I don't know if the way I'm adding it is correct.

model_id = "vikhyatk/moondream2"
revision = "2024-03-13"
tokenizer = Tokenizer.from_pretrained(model_id, revision=revision)
moondream = Moondream.from_pretrained(
    model_id,
    revision=revision,
    attn_implementation="flash_attention_2",
).to(device=device, dtype=dtype)
moondream.eval()

I can't paste the entire code because it would only add confusion. I use moondream to look at an image, translate the answer into Spanish, and then use it for other things.