showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0

Can Flash Attention be used? #7

Open wusize opened 3 months ago

wusize commented 3 months ago

Hi!

Many thanks for your impressive work! Since the model uses both full attention and causal attention, I am curious how such mixed attention masks could be implemented if Flash Attention is used.

Best regards

Sierkinhane commented 3 months ago

Hi, we will try to implement it. A potential solution to your question is discussed in https://github.com/showlab/Show-o/issues/8.
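To make the question concrete, below is a minimal sketch of what such a mixed ("omni") attention mask could look like, assuming a layout where text tokens come first and attend causally while the image tokens that follow attend bidirectionally over the whole sequence. The function name and layout are illustrative assumptions, not the repository's actual implementation.

```python
def build_omni_mask(num_text: int, num_image: int) -> list[list[bool]]:
    """Build a boolean attention mask (entry [i][j] is True when
    query token i may attend to key token j).

    Assumed layout for illustration (not Show-o's exact code):
    - positions 0..num_text-1 are text tokens with causal attention;
    - the remaining num_image positions are image tokens with full
      (bidirectional) attention over the entire sequence.
    """
    n = num_text + num_image
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < num_text:
                # causal: a text query sees only itself and earlier tokens
                mask[i][j] = j <= i
            else:
                # full attention: an image query sees the whole sequence
                mask[i][j] = True
    return mask


# Example: 3 text tokens followed by 2 image tokens.
m = build_omni_mask(3, 2)
```

A mask like this can be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`; note, however, that the standard FlashAttention kernels only expose a built-in causal flag, so with an arbitrary mask PyTorch typically falls back to a non-flash attention backend.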