Closed · cuttle-fish-my closed this issue 1 year ago

Hi! Thanks for this fantastic work!

I am a little confused about the `forward` function of `AttentionBlock` in `unet.py`. The corresponding code is shown below:
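(A sketch of the code being asked about, reconstructed from the description in this issue; the `checkpoint` helper signature and the `_forward` name are assumptions about this file's layout:)

```python
def forward(self, x):
    # The final positional argument is the `flag` parameter of the
    # checkpoint() helper; it is hardcoded to True rather than
    # passing self.use_checkpoint.
    return checkpoint(self._forward, (x,), self.parameters(), True)
```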
I just wonder why we hardcode the `flag` parameter to `True` instead of using `self.use_checkpoint`?

Thanks!

---

I think it is because this is a pixel-wise self-attention implementation, which treats each pixel as one token. This implementation becomes extremely memory-consuming when the input resolution is relatively high (e.g. 256^2), so checkpointing is presumably meant to be always on here.
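As a rough back-of-the-envelope illustration of that claim (illustrative arithmetic only, not code from the repo):

```python
# Pixel-wise self-attention turns an H x W feature map into H*W tokens,
# so the attention score matrix alone has (H*W)^2 entries.
H = W = 256
tokens = H * W                   # 65,536 tokens, one per pixel
entries = tokens ** 2            # ~4.3e9 score-matrix entries per head
gigabytes = entries * 4 / 1e9    # fp32: roughly 17 GB for one attention map
print(f"tokens={tokens}, attention matrix ~ {gigabytes:.1f} GB (fp32, per head)")
```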