lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

('full', 'axial_row', 'axial_col', 'conv_like') works, but uses more memory than just 'full' #213

Closed: afiaka87 closed this issue 3 years ago

afiaka87 commented 3 years ago

# Original post by @kobiso https://github.com/lucidrains/DALLE-pytorch/discussions/131#discussioncomment-640446

Go give them a rocket emoji!

Attention type ('full', 'axial_row', 'axial_col', 'conv_like') works

Experimental setting

Computational cost

Training log

[training log image]

Results

[results image]

Originally posted by @kobiso in https://github.com/lucidrains/DALLE-pytorch/discussions/131#discussioncomment-640446
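For context, the attention types under discussion are selected through the `attn_types` keyword of the `DALLE` constructor, with the tuple cycled across the transformer's depth. A minimal sketch, roughly following the repo README (hyperparameters here are placeholders and the exact arguments may vary between library versions):

```python
import torch
from dalle_pytorch import DiscreteVAE, DALLE

# Train (or load) a discrete VAE first; values below are placeholders.
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64
)

# Mix sparse and full attention per layer; the tuple is cycled over the depth.
dalle = DALLE(
    dim = 512,
    vae = vae,
    num_text_tokens = 10000,
    text_seq_len = 256,
    depth = 16,
    heads = 8,
    attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')
)

text = torch.randint(0, 10000, (1, 256))
images = torch.randn(1, 3, 256, 256)

loss = dalle(text, images, return_loss = True)
loss.backward()
```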

lucidrains commented 3 years ago

@afiaka87 nice, glad to hear it is working! :)

afiaka87 commented 3 years ago

@kobiso @lucidrains @janEbert can any of you speak to why the axial and conv-like attention types seem to require so much more memory than using 'full' on its own?

My understanding was that these layers operate on smaller, more efficient attention maps, but I may need to revisit the topic (a rough sketch of the expected sizes is below).
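For intuition only (this is not the library's implementation), a back-of-envelope count of attention-matrix entries per layer shows why the sparse variants are expected to be lighter, assuming a 32x32 grid of image tokens and ignoring the text tokens:

```python
# Rough count of attention-matrix entries per layer (illustrative only).
# Assumes a 32x32 = 1024-token image grid; text tokens are ignored.
image_side = 32
n = image_side * image_side                 # 1024 image tokens

full_entries  = n * n                       # every token attends to every token
axial_entries = n * image_side              # each token attends only within its row (or column)

print(f"full attention entries : {full_entries:,}")    # 1,048,576
print(f"axial attention entries: {axial_entries:,}")   # 32,768
```

If that expectation holds, the extra memory observed above would point to overhead in the implementation rather than the attention pattern itself, which is consistent with the fix promised in the next comment.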

lucidrains commented 3 years ago

ohh I understand why, I'll get it fixed by Friday!

TheodoreGalanos commented 3 years ago

> ohh I understand why, I'll get it fixed by Friday!

Does that mean I should wait before training a new model (i.e., will there be breaking changes?), or is it safe to go ahead? :)

lucidrains commented 3 years ago

@TheodoreGalanos it won't be breaking! It'll simply make the attention more memory efficient, so train away :)

afiaka87 commented 3 years ago

Great work @lucidrains