Closed — afiaka87 closed this issue 3 years ago
Yep, it's following that first bad run almost verbatim.
@afiaka87 so interestingly enough, OpenAI's paper says a conv-like attention layer is needed at the very last layer of DALL-E, but for now you can just remove it altogether
cool. will do. keep me posted on a fix.
@afiaka87 yup, that pattern is a leak in the masking, where positions accidentally attend to future tokens. it comes up again and again in autoregressive transformer training
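For anyone hitting the same bug, here's a minimal numpy sketch (not the actual DALLE-pytorch code) of what a correct causal mask guarantees — every attention weight above the diagonal must be exactly zero, so no position can see the future:

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular boolean mask: position i may attend to positions <= i only.
    return np.tril(np.ones((n, n), dtype=bool))

def masked_attention_weights(scores, mask):
    # Set disallowed (future) positions to -inf before the softmax,
    # so they receive exactly zero attention weight.
    scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n = 4
scores = np.random.randn(n, n)
weights = masked_attention_weights(scores, causal_mask(n))

# No leakage: everything above the diagonal is zero, rows still sum to 1.
assert np.allclose(np.triu(weights, k=1), 0.0)
assert np.allclose(weights.sum(axis=-1), 1.0)
```

If that upper-triangular check fails for any layer's attention weights, you have exactly the "past attends to the future" leak described above.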
til then, i'm gonna train the hell out of this thing.
will do!
Is there a suggested combination of attention we should use?
Also, off topic, but does anyone have a good intuition about the values? I don't totally get how dim size for DALL-E relates to performance / compute. Does it have to match the size of the feature map?
@TheodoreGalanos In my opinion, you can't go wrong with full attention in every layer if you can afford it, especially when the image sequence length is low. For dimensions, keep it at 1024 if you can, 512 on a budget. Heads at 8 minimum, more if possible
Nope, it's completely independent of the feature map dimensions of the VAE, if that is what you are referring to
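To make those numbers concrete, here is a hedged sketch of how the advice above maps onto DALLE-pytorch's constructor arguments (argument names follow the repo's README of that era; the specific depth, token counts, and the `vae` variable are illustrative assumptions, not values from this thread):

```python
from dalle_pytorch import DALLE

# `vae` is assumed to be an already-trained discrete VAE / VQGAN wrapper.
dalle = DALLE(
    dim = 1024,               # 512 on a budget
    vae = vae,
    num_text_tokens = 10000,  # illustrative vocabulary size
    text_seq_len = 256,       # illustrative text length
    depth = 12,               # illustrative depth
    heads = 8,                # 8 minimum, more if possible
    dim_head = 64,
    attn_types = ('full',),   # full attention in every layer
)
```

The point being: `dim` here is the transformer's model width, set independently of whatever feature-map size the VAE uses internally.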
Thank you @lucidrains !
@lucidrains so full attention is generally going to be better, just more compute-heavy?
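For context on the compute trade-off: full attention materializes an n × n score matrix per head, so cost grows quadratically with the total sequence length (text tokens + image tokens). A quick back-of-the-envelope (the function below is just illustrative arithmetic, not a repo API):

```python
def attn_entries(text_len, image_side):
    # Total sequence length: text tokens plus a square grid of image tokens.
    n = text_len + image_side * image_side
    # Full attention computes one score per (query, key) pair.
    return n * n

small = attn_entries(256, 16)  # 16x16 image grid -> n = 512
large = attn_entries(256, 32)  # 32x32 image grid -> n = 1280
print(small, large)            # 262144 1638400 — ~6.25x more work
```

Doubling the image side length here makes full attention roughly 6x more expensive, which is why sparse attention types get attractive as the image sequence grows.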
@afiaka87 One last aside. Since you've experimented with the VQGAN, is it amenable to transfer learning? I don't necessarily want to train one today (I have a VQVAE already) but would fine-tune one if it were possible.
I just ran it for the first time like 3 hours ago ha. I've actually not successfully done transfer learning on any of these options. I just start over each time trying to make it better from scratch. No clue, unfortunately.
Oh okay! thanks, will try and train one then. I'm not entirely sure how easy it would be, I'll take a look.
If you were late to the party or (like me) need a refresher on what happened here, here is a graph from the live session we were all viewing. The relevant runs are 7 through 10.
@lucidrains I believe the relevant line is here:
https://github.com/lucidrains/DALLE-pytorch/blob/2268864941d8eef2ba73a4488fe05673d447d493/dalle_pytorch/dalle_pytorch.py#L306
I tried adding it in myself, but it needs the taming-transformers imports and I'm not familiar with those.