kuprel / min-dalle

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch
MIT License

YouTube video walk-through of this codebase #88

Open gordicaleksa opened 1 year ago

gordicaleksa commented 1 year ago

Hi @kuprel!

First of all awesome work, you made my job that much easier. :)

I created a YouTube video where I do a deep dive/walk-through of this repo.

Maybe someone finds it useful: https://youtu.be/x_8uHX5KngE

Hopefully it's OK to share it here in the form of an issue; do let me know!

kuprel commented 1 year ago

Wow, this is great! I just added your video to the readme. You're right, the clamping is unnecessary. It originally served to avoid a cryptic CUDA runtime error. Later I implemented a more precise solution that limits the BART decoder's output to 2**14 tokens to match the VQGAN vocabulary. I'm not sure why there's a mismatch in the vocabulary counts. Also, I didn't realize those were shared weights. There's probably a simpler solution here. Great video!
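For anyone following along, here is a minimal sketch of the idea in that comment: rather than clamping sampled token ids after the fact, the decoder's logits can be truncated to the first 2**14 entries (the size of the VQGAN codebook) before sampling. The names, shapes, and function below are illustrative assumptions, not the repo's actual code.

```python
import torch

IMAGE_VOCAB_COUNT = 2 ** 14  # assumed VQGAN codebook size: 16384 image tokens

def sample_image_token(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # logits: (batch, vocab), where vocab may exceed the VQGAN codebook --
    # the mismatch the earlier clamping workaround papered over.
    logits = logits[:, :IMAGE_VOCAB_COUNT]  # keep only tokens the VQGAN can decode
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # (batch, 1) sampled token ids
```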

kuprel commented 1 year ago

I checked whether the embedding weights in the BART decoder are the same as the embedding weights in the VQGAN detokenizer. It turns out they are different: the BART decoder in DALL·E Mega embeds to 2048 dimensions, while the VQGAN embeds to 256 dimensions.

[Screenshot 2022-07-31 at 12:25 PM]
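In case it helps anyone reproduce the check, here is a rough sketch of what it amounts to, assuming you have already pulled the two embedding weight tensors out of the loaded BART decoder and VQGAN detokenizer (the exact attribute paths in min-dalle are not shown here):

```python
import torch

def check_shared_embeddings(decoder_embed: torch.Tensor, vqgan_embed: torch.Tensor) -> None:
    # For DALL·E Mega these come out roughly as (vocab, 2048) vs (2**14, 256).
    print("BART decoder embedding:", tuple(decoder_embed.shape))
    print("VQGAN codebook embedding:", tuple(vqgan_embed.shape))

    if decoder_embed.shape != vqgan_embed.shape:
        print("Shapes differ, so the weights cannot be shared.")
        return

    # Same storage means literally the same tensor; merely equal values would
    # still allow de-duplicating them in the checkpoint.
    same_storage = decoder_embed.data_ptr() == vqgan_embed.data_ptr()
    same_values = torch.equal(decoder_embed, vqgan_embed)
    print("same storage:", same_storage, "| same values:", same_values)
```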