Enabling Flash Attention completely breaks prompt following, so there appears to be a bug in the flash attention implementation.
I reproduced this with the simplest possible setup: two otherwise identical, locally compiled builds of stablediffusion.cpp, one compiled with
cmake .. -DSD_FLASH_ATTN=OFF
and one compiled with
cmake .. -DSD_FLASH_ATTN=ON
The exact commands I use are
./sd.exe -m S:\Downloads\v1-5-pruned-emaonly.safetensors -p "cat" --steps 10
and
./sd.exe -m S:\Downloads\v1-5-pruned-emaonly.safetensors -p "cat" --steps 10
so every run uses the exact same seed.
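For anyone who wants to repeat the comparison, here is a minimal sketch that builds the full set of command lines (two builds times two prompts) using the same flags as above. The build directory paths in BUILDS are hypothetical placeholders, not paths from my setup, and the helper names build_commands and run_all are made up for illustration.

```python
import subprocess
from itertools import product

# Hypothetical locations of the two builds; adjust to where each binary actually lives.
BUILDS = {
    "flash_attn_off": "build-off/bin/sd.exe",
    "flash_attn_on": "build-on/bin/sd.exe",
}
MODEL = r"S:\Downloads\v1-5-pruned-emaonly.safetensors"
PROMPTS = ["cat", "dog"]


def build_commands(builds=BUILDS, model=MODEL, prompts=PROMPTS, steps=10):
    """Return {(build_name, prompt): argv} for every build/prompt pair."""
    commands = {}
    for (name, exe), prompt in product(builds.items(), prompts):
        # Same flags as in the report; no seed is passed on the command line.
        commands[(name, prompt)] = [exe, "-m", model, "-p", prompt, "--steps", str(steps)]
    return commands


def run_all(commands):
    """Run each command in turn and report its exit code."""
    for (name, prompt), argv in commands.items():
        result = subprocess.run(argv)
        print(f"{name} / {prompt!r}: exit code {result.returncode}")


if __name__ == "__main__":
    # Dry run: only print the commands. Call run_all(build_commands()) to execute them.
    for key, argv in build_commands().items():
        print(key, " ".join(argv))
```

Since these commands do not set an output path, each run may overwrite the previous image, so it is probably safest to copy the result aside after every run before comparing.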
These are the results for the prompt "cat" and "dog" with flash attention disabled:
And these are the results for the prompt "cat" and "dog" with flash attention enabled:
With flash attention enabled, the prompt appears to be ignored completely, and the model generates what it would produce if no prompt were supplied.
Apart from this issue, flash attention works well: it reduces memory usage by ~400 MB and shortens generation time by ~1 second, so it would be very useful if it also generated correct images.