Open JonathanLi19 opened 9 months ago
In the `Attention` module, we provide two different forward functions, namely `xformers_forward` and `torch_forward`. In our early work, we intended to use only `xformers_forward`. However, we found that xformers sometimes doesn't work, for example when the dimension of the tensors is not a power of two. Therefore, we use `scaled_dot_product` from torch instead of xformers. `scaled_dot_product` is another implementation of flash attention, which is only supported in torch>=2.0. This implementation is fast and stable enough. If you are interested, you can see this document for more information.
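Roughly, the fallback logic looks like the sketch below. This is only a simplified illustration, not the exact code in the repo: the function and argument names (`attention_forward`, `use_xformers`) are made up for this example, and reshapes/masking details are omitted.

```python
import torch
import torch.nn.functional as F

try:
    import xformers.ops
    XFORMERS_AVAILABLE = True
except ImportError:
    XFORMERS_AVAILABLE = False


def attention_forward(q, k, v, use_xformers=True):
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    if use_xformers and XFORMERS_AVAILABLE:
        try:
            # xformers expects (batch, seq_len, num_heads, head_dim)
            out = xformers.ops.memory_efficient_attention(
                q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
            )
            return out.transpose(1, 2)
        except Exception:
            # e.g. an unsupported head dim (not a power of two): fall back to torch
            pass
    # torch>=2.0 fused attention, another flash-attention implementation
    return F.scaled_dot_product_attention(q, k, v)
```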
Thank you very much!
Hi, sorry to bother you again. I have two more questions.
Hi, great work! I have one question: in the paper, you said that "we adopt flash attention [6] in all attention layers, including the text encoder, UNet, VAE, ControlNet models, and motion modules". I found the `xformers_forward()` function in the `Attention` module. However, this function is never called during the whole process of `diffutoon_toon_shading.py`. It is very strange, since it can still generate high-resolution videos. I am very confused about how this works. Thanks!
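For reference, a quick way to check which forward actually runs is to wrap both methods with a small logger before launching the script. This is just a debugging sketch; the import path of the `Attention` class below is a guess and may differ in the actual repo.

```python
import functools

# Assumption: adjust this import to wherever the Attention class is defined.
from diffsynth.models.attention import Attention


def _log_calls(method, name):
    @functools.wraps(method)
    def wrapped(*args, **kwargs):
        print(f"[debug] Attention.{name} was called")
        return method(*args, **kwargs)
    return wrapped


Attention.xformers_forward = _log_calls(Attention.xformers_forward, "xformers_forward")
Attention.torch_forward = _log_calls(Attention.torch_forward, "torch_forward")
```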