Oh strange, they must enable it by default... I'll take a look soon and see if it can be disabled.
Thanks for the report!
Thanks so much!!! I'm very happy to be useful :-)
@matatonic Hi, I see a new release, have you found a solution for FlashAttention? Thanks so much
Not yet sorry, been busy.
Yeah, I can understand! Don't worry :-) Keep in mind, I would like to make a video on my YT channel to present your solution, because I think it's great :-)
Thanks so much
I don't have a good way to test this, but based on the config.json file for OpenGVLab/InternVL2-1B you may be able to disable flash_attn there.
{
  ...
  "vision_config": {
    ...
    "use_flash_attn": false
  }
}
Now, this is not really advisable, and it essentially corrupts the cached Hugging Face data, but this may work:
edit hf_home/hub/models--OpenGVLab--InternVL2-1B/snapshots/b631bf72a9a7aaf1329d3c523ea00df2854e2163/config.json
(or the latest snapshot folder)
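If it helps, here is a rough sketch of that same edit as a script (my own illustration, untested, not part of the project): it globs whatever snapshot folders are in the local cache for this model and flips use_flash_attn in the vision_config. The cache root and model name are assumptions based on the path above; adjust them if your setup differs.

import glob
import json
import os

# Locate every cached snapshot of the model in the Hugging Face hub cache.
# HF_HOME defaults to ~/.cache/huggingface when the env var is not set.
hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
pattern = os.path.join(
    hf_home, "hub", "models--OpenGVLab--InternVL2-1B", "snapshots", "*", "config.json"
)

for config_path in glob.glob(pattern):
    with open(config_path) as f:
        config = json.load(f)
    # Disable flash attention in the vision tower; this rewrites the cached
    # file in place, which is exactly the "not really advisable" part.
    config.setdefault("vision_config", {})["use_flash_attn"] = False
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    print("patched", config_path)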
Just to add an update: I've tried changing the config.json to disable flash_attn and also to disable bfloat16, but it didn't work. Without changing their code, it looks like InternVL2 requires Ampere (compute capability 8.0) or greater.
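For anyone else landing here, a quick way to confirm which side of that cutoff your card is on (just a sketch, nothing specific to this repo): PyTorch can report the device's compute capability, and Turing parts like the RTX 8000 report 7.5, below the 8.0 (Ampere) threshold mentioned above.

import torch

# Print the compute capability of the first CUDA device and compare it
# against the 8.0 (Ampere) requirement discussed in this thread.
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
print("meets Ampere (8.0) requirement:", (major, minor) >= (8, 0))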
Dear DevTeam, thanks so much for this great tool! During my tests I found a big showstopper: the "FlashAttention" option... In my setup I have two Nvidia RTX 8000 boards; these boards are from the Turing family (TU102GL) and they do not support "FlashAttention". Would it be possible to run the Vision models with this library?
I will add some more details. Model and command used: "python vision.py -m OpenGVLab/InternVL2-1B --device-map cuda:0"
Logs:
I did not use the "FlashAttention" flag, but I still receive the error.
Thanks so much