lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License

Problem training the diffusion prior #246

Closed · mikeogezi closed this issue 2 years ago

mikeogezi commented 2 years ago

While attempting to train the diffusion prior (with train_diffusion_prior.py), I run into the following exception:

Traceback (most recent call last):
  File "train_diffusion_prior.py", line 770, in <module>
    main()
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train_diffusion_prior.py", line 766, in main
    initialize_training(config_file, accelerator)
  File "train_diffusion_prior.py", line 668, in initialize_training
    trainer: DiffusionPriorTrainer = make_model(
  File "train_diffusion_prior.py", line 48, in make_model
    diffusion_prior = prior_config.create()
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/dalle2_pytorch/train_configs.py", line 181, in create
    return DiffusionPrior(net = diffusion_prior_network, clip = clip, **kwargs)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/dalle2_pytorch/dalle2_pytorch.py", line 1174, in __init__
    assert not exists(clip) or clip.dim_latent == self.image_embed_dim, f'you passed in a CLIP to the diffusion prior with latent dimensions of {clip.dim_latent}, but your image embedding dimension (keyword image_embed_dim) for the DiffusionPrior was set to {self.image_embed_dim}'
AssertionError: you passed in a CLIP to the diffusion prior with latent dimensions of 512, but your image embedding dimension (keyword image_embed_dim) for the DiffusionPrior was set to 768
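
For context, a minimal sketch of how the two numbers in that message relate: the assert requires the prior's image_embed_dim to equal the latent dimension reported by the CLIP adapter. Constructor arguments below follow the repo README; the hyperparameters are placeholders.

from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, OpenAIClipAdapter

# illustrative sketch of the constraint the assertion enforces
clip = OpenAIClipAdapter('ViT-L/14')

prior_network = DiffusionPriorNetwork(
    dim = 768,        # prior network width, sized for ViT-L/14's 768-dim embeddings
    depth = 6,
    dim_head = 64,
    heads = 8
)

diffusion_prior = DiffusionPrior(
    net = prior_network,
    clip = clip,
    image_embed_dim = 768,   # the assert requires this to equal clip.dim_latent
    timesteps = 100,
    cond_drop_prob = 0.2
)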

I've also run the following snippet to check if any CLIP model has 768-dimensional latents:

import clip
from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, OpenAIClipAdapter

# report the latent dimension the adapter exposes for each available OpenAI CLIP model
[(m, OpenAIClipAdapter(m).dim_latent) for m in clip.available_models()]

The result is:

[('RN50', 512), ('RN101', 512), ('RN50x4', 512), ('RN50x16', 512), ('RN50x64', 512), ('ViT-B/32', 512), ('ViT-B/16', 512), ('ViT-L/14', 512), ('ViT-L/14@336px', 512)]

So, it looks like the available models are all 512-dimensional. It's important that my prior generates latents based on OpenAI CLIP. How do I get past this?

Versions:
- dalle2_pytorch: 1.10.6
- clip: git+https://github.com/openai/CLIP.git@d50d76daa670286dd6cacf3bcd80b5e4823fc8e1

rom1504 commented 2 years ago

L/14 output dimension is 768
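
A quick way to confirm this directly against the openai/clip package, bypassing the dalle2_pytorch adapter (a minimal check; the caption and zero image are just dummy inputs):

import torch
import clip

# encode dummy inputs with the raw OpenAI CLIP model and inspect the embedding sizes
model, _ = clip.load('ViT-L/14', device = 'cpu')
with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(['a test caption']))
    image_emb = model.encode_image(torch.zeros(1, 3, 224, 224))
print(text_emb.shape[-1], image_emb.shape[-1])  # both 768 for ViT-L/14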


mikeogezi commented 2 years ago

@rom1504 ViT-L/14 is one of the models I tested and its dim_latent is 512. It says so in the snippet.

rom1504 commented 2 years ago

L/14 output dimension is definitely 768. dim_latent is probably returning the wrong thing.

mikeogezi commented 2 years ago

Right: https://github.com/lucidrains/DALLE2-pytorch/blob/d0c11b30b081a26dc22fb7cdcb2c6750316acc27/dalle2_pytorch/dalle2_pytorch.py#L336
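
For reference, the joint embedding size also lives directly on the underlying CLIP model, so it can be read off without a forward pass (illustrative only; not necessarily how the adapter should compute dim_latent):

import clip

# the text projection maps transformer width -> joint embedding dim, and the visual
# tower records its own output dim; both give the shared latent size
model, _ = clip.load('ViT-L/14', device = 'cpu')
print(model.text_projection.shape[-1])  # 768
print(model.visual.output_dim)          # 768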

lucidrains commented 2 years ago

@mikeogezi Hi Michael! Thanks for surfacing this issue

Should be resolved at https://github.com/lucidrains/DALLE2-pytorch/commit/c18c0801283d30384912df0e35f225f3df1566a3
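
After upgrading dalle2_pytorch past that commit, the check from the original post should report 768 for ViT-L/14 (expected output, not re-run here):

from dalle2_pytorch import OpenAIClipAdapter

# sanity check after upgrading: the adapter should now report ViT-L/14's joint embedding dim
print(OpenAIClipAdapter('ViT-L/14').dim_latent)  # expected: 768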

lucidrains commented 2 years ago

@mikeogezi also, plugging @rom1504's new open clip model!

from dalle2_pytorch import OpenClipAdapter

clip = OpenClipAdapter('ViT-H/14')
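
The same matching rule applies with this adapter (a hedged note: open_clip's ViT-H/14 produces 1024-dimensional embeddings, so the prior's image_embed_dim would need to match):

print(clip.dim_latent)  # expected: 1024 for ViT-H/14; set image_embed_dim in the prior config to match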