CDitzel opened this issue 3 years ago
@CDitzel ohh got it, even if this were an error, that would just mean you could compensate by adjusting kl_div_loss_weight by some factor, on the order of a magnitude?
@CDitzel but yes, I have noticed that with this loss present, the network doesn't learn that well :(
@CDitzel rumor has it that the author of DALL-E was asked about this loss, but didn't give any straight answers
I have a headache, Phil. The math demands this term be there, but when it is present, the results are actually worse. Hate it...
@CDitzel ohh got it, even if this were an error, that would just mean you could compensate by adjusting kl_div_loss_weight by some factor, on the order of a magnitude?
I'm not sure. I keep getting KL losses below zero oO
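For what it's worth, a true KL divergence can never be negative, so values below zero usually mean one of the two arguments isn't actually normalized over the dimension being summed. A minimal sketch of that failure mode (the codebook size, grid shape, and the wrong-axis mistake are all made up for illustration, not taken from the repo):

```python
import math
import torch
import torch.nn.functional as F

num_tokens = 8
logits = torch.zeros(2, num_tokens, 4, 16)  # 'b n h w', dummy dVAE encoder output
log_uniform = math.log(1.0 / num_tokens)    # log-prob of the uniform prior

# Correct: normalize over the codebook dimension n -> KL(q || uniform) >= 0
log_qy = F.log_softmax(logits, dim=1)
good = (log_qy.exp() * (log_qy - log_uniform)).sum(dim=1).mean()

# Wrong: normalize over a spatial axis instead; q no longer sums to 1 over n,
# and the same "KL" formula can come out negative
log_bad = F.log_softmax(logits, dim=-1)
bad = (log_bad.exp() * (log_bad - log_uniform)).sum(dim=1).mean()
```

With uniform logits, `good` is exactly zero while `bad` is negative, even though nothing else in the computation changed.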
Maybe we could write the Open-AI team and ask for a straight answer? Maybe they disclose the information within a secure two-person email conversation?
@CDitzel rumor has it that the author of DALL-E was asked about this loss, but didn't give any straight answers
Yeah i've seen the video of it. He's somewhat dodgy the moment he's asked about it. I can't attest to the rigor of the math, however.
KL must have been used, as they mention an increasing weight parameter in the paper.
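The ramp-up itself could be as simple as a linear warm-up. This is only a hypothetical sketch: 6.6 is the final KL weight the paper reports, but the warm-up length and the linear shape here are placeholders, not the published schedule.

```python
def kl_weight(step: int, max_weight: float = 6.6, warmup_steps: int = 5000) -> float:
    """Linearly anneal the KL loss weight from 0 up to max_weight.

    max_weight=6.6 is the final value reported in the DALL-E paper;
    warmup_steps is a made-up placeholder, not the published schedule.
    """
    return max_weight * min(1.0, step / warmup_steps)
```

In a training loop this would be used as something like `loss = recon_loss + kl_weight(step) * kl_div`.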
Still, I am trying, but I can't seem to figure out his email address. In the paper it says
Aditya Ramesh <_@adityaramesh.com
so I tried Aditya_Ramesh@adityaramesh.com and Aditya.Ramesh@adityaramesh.com,
but they don't exist...
Even without including the KL term, I am wondering if anyone else has observed the following:
If I train the dVAE circumventing the Gumbel-Softmax, i.e. the output of the last 1x1 conv encoder layer is directly multiplied with the codebook, then the reconstruction is almost immediately very good.
Whereas when Gumbel is used between those two steps, the output becomes very blurry and not at all comparable in quality.
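For reference, the two paths being compared might look roughly like this (the sizes and the soft-assignment details are my guesses for illustration, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

num_tokens, codebook_dim = 8, 16                 # made-up sizes
codebook = torch.randn(num_tokens, codebook_dim)
logits = torch.randn(2, num_tokens, 4, 4)        # last 1x1 conv output, 'b n h w'

# Path A, "circumventing" Gumbel: soft-assign the encoder output directly
# against the codebook (here via a softmax over the codebook dimension).
soft = F.softmax(logits, dim=1)
z_direct = torch.einsum('bnhw,nd->bdhw', soft, codebook)

# Path B: draw relaxed one-hot samples with Gumbel-Softmax first, then look
# up the codebook. The injected Gumbel noise is one plausible culprit for
# the blurrier reconstructions early in training.
one_hot = F.gumbel_softmax(logits, tau=1.0, dim=1)
z_gumbel = torch.einsum('bnhw,nd->bdhw', one_hot, codebook)
```

Both paths produce latents of the same shape; only the stochastic relaxation in between differs.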
so did anyone find out his email address? I composed an email but don't know where to send it
:shrug: nope. You really think he'll talk if OpenAI didn't want him to in the first place? Isn't it sort of their MO to keep things just vague enough?
I have no idea what's going on with OpenAI, but so far I don't think they are as open to transparent research as their name would suggest...
https://github.com/lucidrains/DALLE-pytorch/blob/995bfe1789243cbc838943cdc748daab406aae3e/dalle_pytorch/dalle_pytorch.py#L195
I am fairly certain that this should instead read
logits = rearrange(logits, 'b n h w -> (b h w) n')
since we are summing over the latent dimension (the probs/encoder outputs) and averaging over the observations, i.e. over every spatial position separately and for every sample in the batch. The docs are a little messy on this, but from what I understand, batchmean requires a reshaping such that all examples are condensed into the batch dimension.