lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
MIT License

loss value #215

Open BIG-PIE-MILK-COW opened 2 years ago

BIG-PIE-MILK-COW commented 2 years ago

Does anyone have a trained text-conditional model? What loss value did you reach? I trained the model on the laion-art dataset, and the loss eventually decreased to around 0.1. Is that normal? Here are the sampled pictures.

[Screenshot: sampled images, 2022-09-08]
TheFusion21 commented 2 years ago

I've trained a model on about 13k pairs for 10k steps per Unet, with a final loss of about 0.009, and I get quite good text-to-image alignment

deepglugs commented 2 years ago

What cond_scale are you producing your samples with?

BIG-PIE-MILK-COW commented 2 years ago

What cond_scale are you producing your samples with?

3.

deepglugs commented 2 years ago

Try lower values... That said, text conditioning has seemed finicky in my experience.
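For context, cond_scale is the classifier-free guidance scale: at sampling time the prediction is pushed from the unconditional output toward the text-conditioned one, so large values (like 3) can over-saturate or distort images. A minimal numpy sketch of that interpolation (this mirrors the standard guidance formula, not the library's exact code path):

```python
import numpy as np

def guided_prediction(uncond, cond, cond_scale):
    # Classifier-free guidance: start from the unconditional
    # prediction and move along the direction of the conditional one.
    # cond_scale = 1 recovers the plain conditional prediction;
    # larger values exaggerate the text conditioning.
    return uncond + cond_scale * (cond - uncond)

uncond = np.array([0.0, 1.0])
cond = np.array([1.0, 1.0])

print(guided_prediction(uncond, cond, 1.0))  # -> [1. 1.]
print(guided_prediction(uncond, cond, 3.0))  # -> [3. 1.]
```

This is why lowering cond_scale toward 1 is worth trying when samples look off.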

BIG-PIE-MILK-COW commented 2 years ago

Try lower values... That said, text conditioning has seemed finicky in my experience.

Ok, thanks for your suggestion.

BIG-PIE-MILK-COW commented 2 years ago

Which dataset did you use? I suspect laion-art is too large, so I want to train on a smaller dataset.

deepglugs commented 2 years ago

I'm training on danbooru-figures which is almost 900k images.

QinSY123 commented 1 year ago

Do you have a trained model now? I would like to ask about the specific Unet parameter values. The images I've gotten so far are rather blurry. @TheFusion21

TheFusion21 commented 1 year ago

Do you have a trained model now? I would like to ask about the specific Unet parameter values. The images I've gotten so far are rather blurry. @TheFusion21

from imagen_pytorch import ImagenTrainer
from imagen_pytorch.configs import ImagenConfig

unet1 = dict(
    dim = 384,
    cond_dim = 384,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = 3,
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = (False, True, True, True),
    layer_cross_attns = (False, True, True, True),
    memory_efficient = False,
)
unet2 = dict(
    dim = 128,
    cond_dim = 128, 
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = (2, 4, 8, 8),
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = (False, False, False, True),
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)
unet3 = dict(
    dim = 128,
    cond_dim = 128,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = (2, 4, 8, 8),
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = False,
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)

imagen = ImagenConfig(
    unets = [unet1, unet2, unet3],
    image_sizes = (64, 256, 1024),
    timesteps = 256,
    condition_on_text = True,
    cond_drop_prob = 0.1,
    random_crop_sizes = (None, 64, 256)
).create()

trainer = ImagenTrainer(
    imagen = imagen,
    lr = 1e-4,
    cosine_decay_max_steps = 1500000,
    warmup_steps = 7500
)

This is my configuration. Unet 1 should optimally have dim = 512, but I had to reduce it for memory reasons.
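As an aside, the trainer options above describe a warmup-then-cosine learning-rate schedule. A sketch of that shape under the stated hyperparameters (lr_at is a hypothetical helper for illustration; the library composes PyTorch's scheduler with a warmup wrapper, so the exact curve may differ slightly):

```python
import math

def lr_at(step, base_lr=1e-4, warmup_steps=7500, decay_max_steps=1_500_000):
    # Linear warmup to base_lr, then cosine decay toward zero.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = min((step - warmup_steps) / (decay_max_steps - warmup_steps), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))            # -> 0.0
print(lr_at(7500))         # -> 0.0001 (warmup complete)
print(lr_at(1_500_000))    # -> 0.0 (fully decayed)
```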

QinSY123 commented 1 year ago

Thanks!

QinSY123 commented 1 year ago

This is my configuration. Unet 1 should optimally have dim = 512, but I had to reduce it for memory reasons.

I still have a question: should I train Unet1, Unet2, and Unet3 separately, or update the parameters of all three networks in one step?

QinSY123 commented 1 year ago

@TheFusion21 I still have a question: should I train Unet1, Unet2, and Unet3 separately, or update the parameters of all three networks in one step?

TheFusion21 commented 1 year ago

You can't train them together in one step.
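The unets in the cascade are independent diffusion models, so each gets its own forward/backward passes; in imagen-pytorch you select one with the trainer's unet_number argument. A framework-free sketch of the one-unet-at-a-time schedule (training_schedule is a hypothetical helper for illustration):

```python
def training_schedule(steps_per_unet, num_unets=3):
    """Yield (unet_number, step) pairs: each unet is trained to
    completion before the next one begins; losses are never
    combined across unets."""
    for unet_number in range(1, num_unets + 1):
        for step in range(steps_per_unet):
            yield unet_number, step

# In imagen-pytorch, the body of the loop would be roughly:
#   loss = trainer(images, texts = texts, unet_number = unet_number)
#   trainer.update(unet_number = unet_number)
print(list(training_schedule(2, num_unets=2)))
# -> [(1, 0), (1, 1), (2, 0), (2, 1)]
```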

QinSY123 commented 1 year ago

You can't train them together in one step.

All right. Thanks!

QinSY123 commented 1 year ago

@TheFusion21 Sorry to bother you again, but could you send me the checkpoint.pt of your training? I have sent an email to your email address.

TheFusion21 commented 1 year ago

@TheFusion21 Sorry to bother you again, but could you send me the checkpoint.pt of your training? I have sent an email to your email address.

Can't do that; it's trained on private data

QinSY123 commented 1 year ago

@TheFusion21 Sorry to bother you again, but could you send me the checkpoint.pt of your training? I have sent an email to your email address.

Can't do that; it's trained on private data

All right, thanks!

QinSY123 commented 1 year ago

@TheFusion21 When I use the command "imagen --model" to generate an image, it gives me the error "Command 'imagen' not found". Have you encountered the same problem?