Closed: mikeogezi closed this issue 2 years ago
L/14 output dimension is 768
On Thu, Sep 29, 2022, 09:04 Michael Ogezi wrote:
While attempting to train the diffusion prior (with train_diffusion_prior.py), I run into the following exception:
Traceback (most recent call last):
  File "train_diffusion_prior.py", line 770, in <module>
    main()
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train_diffusion_prior.py", line 766, in main
    initialize_training(config_file, accelerator)
  File "train_diffusion_prior.py", line 668, in initialize_training
    trainer: DiffusionPriorTrainer = make_model(
  File "train_diffusion_prior.py", line 48, in make_model
    diffusion_prior = prior_config.create()
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/dalle2_pytorch/train_configs.py", line 181, in create
    return DiffusionPrior(net = diffusion_prior_network, clip = clip, **kwargs)
  File "/home/ogezi/miniconda3/envs/playground/lib/python3.8/site-packages/dalle2_pytorch/dalle2_pytorch.py", line 1174, in __init__
    assert not exists(clip) or clip.dim_latent == self.image_embed_dim, f'you passed in a CLIP to the diffusion prior with latent dimensions of {clip.dim_latent}, but your image embedding dimension (keyword image_embed_dim) for the DiffusionPrior was set to {self.image_embed_dim}'
AssertionError: you passed in a CLIP to the diffusion prior with latent dimensions of 512, but your image embedding dimension (keyword image_embed_dim) for the DiffusionPrior was set to 768

I've also run the following snippet to check if any CLIP model has 768-dimensional latents:
import clip
from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, OpenAIClipAdapter

[(m, OpenAIClipAdapter(m).dim_latent) for m in clip.available_models()]
The result is:
[('RN50', 512), ('RN101', 512), ('RN50x4', 512), ('RN50x16', 512), ('RN50x64', 512), ('ViT-B/32', 512), ('ViT-B/16', 512), ('ViT-L/14', 512), ('ViT-L/14@336px', 512)]
So, it looks like the models available are all 512-dimensional. It's important that my prior generates latents based on OpenAI CLIP. How do I get past this?
Versions: dalle2_pytorch: 1.10.6, clip: git+https://github.com/openai/CLIP.git@d50d76daa670286dd6cacf3bcd80b5e4823fc8e1
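For context, the assertion comes from DiffusionPrior checking that its image_embed_dim keyword equals clip.dim_latent on the adapter you pass in. A minimal sketch of sizing everything from the adapter, assuming the DiffusionPriorNetwork / DiffusionPrior constructors shown in the project README (the depth, heads, timesteps, and cond_drop_prob values here are illustrative):

from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, OpenAIClipAdapter

clip = OpenAIClipAdapter('ViT-L/14')
latent_dim = clip.dim_latent  # whatever the adapter reports; image_embed_dim must equal this

# build the prior network and prior so every width comes from the adapter
prior_network = DiffusionPriorNetwork(dim = latent_dim, depth = 6, dim_head = 64, heads = 8)
diffusion_prior = DiffusionPrior(net = prior_network, clip = clip, image_embed_dim = latent_dim, timesteps = 100, cond_drop_prob = 0.2)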
@rom1504 ViT-L/14 is one of the models I tested and its dim_latent is 512. It says so in the snippet.
L/14 output dimension is definitely 768. dim_latent is probably returning the wrong thing.
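For reference, a quick way to check the raw model's output width independently of the dalle2_pytorch adapter, assuming the openai/CLIP package is installed:

import clip

# load the raw OpenAI CLIP model on CPU and read the image tower's projection width
model, _ = clip.load('ViT-L/14', device = 'cpu')
print(model.visual.output_dim)  # 768 for ViT-L/14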
@mikeogezi Hi Michael! Thanks for surfacing this issue.
Should be resolved at https://github.com/lucidrains/DALLE2-pytorch/commit/c18c0801283d30384912df0e35f225f3df1566a3
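After updating dalle2_pytorch past that commit, re-running the earlier check should report the corrected value (a quick sanity check, assuming the same OpenAIClipAdapter usage as above):

from dalle2_pytorch import OpenAIClipAdapter

# dim_latent should now reflect the 768-dim image embedding rather than 512
print(OpenAIClipAdapter('ViT-L/14').dim_latent)  # expected: 768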
@mikeogezi also, plugging @rom1504's new open clip model!
from dalle2_pytorch import OpenClipAdapter
clip = OpenClipAdapter('ViT-H/14')
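ViT-H/14 produces 1024-dimensional embeddings, so the same matching rule applies. A short check, assuming OpenClipAdapter exposes the same dim_latent property as OpenAIClipAdapter:

from dalle2_pytorch import OpenClipAdapter

clip = OpenClipAdapter('ViT-H/14')
print(clip.dim_latent)  # expected: 1024; set image_embed_dim (and the prior network's dim) to match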