lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License

AttributeError: 'XClipAdapter' object has no attribute 'max_text_len' #99

Closed etali closed 2 years ago

etali commented 2 years ago

When I run the code in README.md, it eventually fails with "AttributeError: 'XClipAdapter' object has no attribute 'max_text_len'". Any hint?
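For context, this class of error typically arises when an adapter class forwards some attributes of the wrapped model but not others. The sketch below is a minimal pure-Python illustration of the failure mode and the usual fix; all names are hypothetical and this is not the actual dalle2_pytorch code.

```python
class MockClip:
    """Stands in for a wrapped CLIP-like model (hypothetical)."""
    text_seq_len = 256


class ClipAdapter:
    """Adapter exposing the wrapped model under a common interface."""
    def __init__(self, clip):
        self.clip = clip

    @property
    def max_text_len(self):
        # Forwarding the underlying attribute avoids the AttributeError;
        # an adapter that omits this property reproduces the crash when
        # downstream code accesses `adapter.max_text_len`.
        return self.clip.text_seq_len


adapter = ClipAdapter(MockClip())
print(adapter.max_text_len)  # 256
```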

lucidrains commented 2 years ago

@etali hey! which script in the readme are you running?

lucidrains commented 2 years ago

seems to be working for me, but feel free to reopen the issue with the script that reproduces the error

etali commented 2 years ago

I ran the script described here:

etali commented 2 years ago

```python
import torch
from dalle2_pytorch import DALLE2, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder, CLIP

clip = CLIP(
    dim_text = 512,
    dim_image = 512,
    dim_latent = 512,
    num_text_tokens = 49408,
    text_enc_depth = 6,
    text_seq_len = 256,
    text_heads = 8,
    visual_enc_depth = 6,
    visual_image_size = 256,
    visual_patch_size = 32,
    visual_heads = 8
).cuda()

# mock data

text = torch.randint(0, 49408, (4, 256)).cuda()
images = torch.randn(4, 3, 256, 256).cuda()

# train

loss = clip(
    text,
    images,
    return_loss = True
)

loss.backward()

# do above for many steps ...

# prior networks (with transformer)

prior_network = DiffusionPriorNetwork(
    dim = 512,
    depth = 6,
    dim_head = 64,
    heads = 8
).cuda()

diffusion_prior = DiffusionPrior(
    net = prior_network,
    clip = clip,
    timesteps = 100,
    cond_drop_prob = 0.2
).cuda()

loss = diffusion_prior(text, images)
loss.backward()

# do above for many steps ...

# decoder (with unet)

unet1 = Unet(
    dim = 128,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8)
).cuda()

unet2 = Unet(
    dim = 16,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8, 16)
).cuda()

decoder = Decoder(
    unet = (unet1, unet2),
    image_sizes = (128, 256),
    clip = clip,
    timesteps = 100,
    image_cond_drop_prob = 0.1,
    text_cond_drop_prob = 0.5,
    condition_on_text_encodings = False  # set this to True if you wish to condition on text during training and sampling
).cuda()

for unet_number in (1, 2):
    loss = decoder(images, unet_number = unet_number)  # this can optionally be decoder(images, text) if you wish to condition on the text encodings as well, though it was hinted in the paper it didn't do much
    loss.backward()

# do above for many steps

dalle2 = DALLE2(
    prior = diffusion_prior,
    decoder = decoder
)

images = dalle2(
    ['cute puppy chasing after a squirrel'],
    cond_scale = 2.  # classifier free guidance strength (> 1 would strengthen the condition)
)

# save your image (in this example, of size 256x256)
```

1073521013 commented 2 years ago

Same error in the same place.