wtomin closed this 5 months ago
Close #291 then, to avoid merging it since it has passed reviews?
Good suggestion!
Is the generated image on SD1.5 + 910B good?
Yes, the quality is good.
Good analysis of FA! But I find it amusing that FA can cause OOM, even though it is supposed to be more memory-efficient than vanilla attention.
The OOM caused by FA is not like the OOM caused by a large batch size. As indicated by the flash attention implementation in the MindSpore library, on 910A the head dimension is restricted to less than 304; otherwise it causes UB OOM. Although not 100% sure, I guess UB refers to some ultra-high-speed NPU memory that is used for exchanging data.
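To make the restriction concrete, here is a minimal sketch of the fallback logic it implies, assuming a configurable cap like the `fa_max_head_dim` knob mentioned below; the function name and the limit of 304 come from this discussion, not verified MindSpore docs:

```python
# Minimal sketch (assumption: FA only fits in UB up to some NPU-specific
# head dimension; ~304 on 910A is taken from this thread).
def should_use_flash_attention(head_dim: int, fa_max_head_dim: int = 304) -> bool:
    """True when FA is expected to fit in UB; otherwise fall back to vanilla attention."""
    return head_dim <= fa_max_head_dim

# With the tighter cap used on 910A later in this thread:
assert should_use_flash_attention(128, fa_max_head_dim=128)
assert not should_use_flash_attention(160, fa_max_head_dim=128)
```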
Another solution to fix the compatibility problem of the flash-attention layer. It might be a better solution compared with #291, because using `nn.layers.flash_attention` supports a head_dim that is not divisible by 16 by padding it to 16*N (see the sketch after the list below). I have tested `text_to_image.py` on:

1. 910A: `fa_max_head_dim` needs to be set to 128 instead of 256, otherwise there will be an OOM error. This value can be passed through `v1-inference.yaml`.
2. 910B (ms 2.2.10.2023.1124):
   - 2.1 sdv1.5: :heavy_check_mark: graph mode, :heavy_check_mark: pynative mode
   - 2.2 sdv2.0: :heavy_check_mark: graph mode, :x: pynative mode
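For reference, a minimal sketch of the 16*N padding trick, assuming a FlashAttention callable that takes q/k/v with the head dimension last; `pad_head_dim`, `attention_with_fa`, and the callable itself are illustrative names, not the PR's or MindSpore's actual code:

```python
import mindspore as ms
from mindspore import ops

def pad_head_dim(x: ms.Tensor, align: int = 16) -> ms.Tensor:
    """Zero-pad the last (head) dimension up to the next multiple of `align`."""
    pad_len = (-x.shape[-1]) % align  # 0 when already aligned
    if pad_len == 0:
        return x
    # ops.pad takes (left, right) padding for the last dimension
    return ops.pad(x, (0, pad_len))

def attention_with_fa(flash_attention, q, k, v):
    """Run FA on padded q/k/v, then slice the output back to the true head_dim.

    Zero-padding head_dim is exact: the extra q/k columns add 0 to every
    dot-product score, and the extra v columns only produce zero output
    columns, which are sliced off below.
    """
    head_dim = q.shape[-1]
    q, k, v = (pad_head_dim(t) for t in (q, k, v))
    # Caveat: if the kernel derives its softmax scale from the (padded)
    # head_dim, the scale for the original head_dim must be passed explicitly.
    out = flash_attention(q, k, v)
    return out[..., :head_dim]
```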
I think it is a weird bug, because graph mode has no such error.