lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
MIT License
8.09k stars 768 forks

ElucidatedImagen no text conditioning #329

Closed axel588 closed 1 year ago

axel588 commented 1 year ago

Hello !

There is no text conditioning with ElucidatedImagen: I get completely random results when training with ElucidatedImagen, whereas the text-conditioned Imagen works with no problem. Is there any solution?

    imagen = ElucidatedImagen(
        unets = (unet),
        image_sizes = (16),
        cond_drop_prob = 0.1,
        text_encoder_name = 't5-base',
        channels=4,
        num_sample_steps = (64), # number of sample steps - 64 for base unet, 32 for upsampler (just an example, have no clue what the optimal values are)
        sigma_min = 0.002,           # min noise level
        sigma_max = (80),       # max noise level, @crowsonkb recommends double the max noise level for upsampler
        sigma_data = 0.5,            # standard deviation of data distribution
        rho = 7,                     # controls the sampling schedule
        P_mean = -1.2,               # mean of log-normal distribution from which noise is drawn for training
        P_std = 1.2,                 # standard deviation of log-normal distribution from which noise is drawn for training
        S_churn = 80,                # parameters for stochastic sampling - depends on dataset, Table 5 in paper
        S_tmin = 0.05,
        S_tmax = 50,
        S_noise = 1.003,
    ).cuda()
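
For readers unfamiliar with the noise arguments in the config above, here is a minimal standalone sketch of what `sigma_min`, `sigma_max`, `rho`, `num_sample_steps`, `P_mean` and `P_std` control. It assumes the standard formulas from the Karras et al. "Elucidating" paper and is not the imagen-pytorch code itself; the function names `karras_sigmas` and `sample_training_sigma` are hypothetical and exist only for illustration.

```python
import math
import random


def karras_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Sampling noise levels, decreasing from sigma_max to sigma_min.

    Assumed formula (Karras et al.): interpolate linearly between
    sigma_max**(1/rho) and sigma_min**(1/rho), then raise to the power rho.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    inv_rho = 1.0 / rho
    hi = sigma_max ** inv_rho
    lo = sigma_min ** inv_rho
    return [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]


def sample_training_sigma(p_mean=-1.2, p_std=1.2, rng=random):
    """Draw one training noise level so that log(sigma) ~ Normal(p_mean, p_std)."""
    return math.exp(rng.gauss(p_mean, p_std))


if __name__ == "__main__":
    sigmas = karras_sigmas(64)  # 64 steps, as in num_sample_steps above
    print("first and last sampling sigma:", sigmas[0], sigmas[-1])
    print("one training sigma:", sample_training_sigma())
```

With `rho = 7` the steps are not evenly spaced: most of them land at low noise levels, near `sigma_min`. None of these arguments touches text conditioning directly, so they are unlikely to be the only thing to check here.
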
lucidrains commented 1 year ago

@axel588 it should work exactly the same as Imagen

just train it like you trained your text-conditioned imagen

axel588 commented 1 year ago

@axel588 it's exactly what I did: I just replaced the Imagen code. But ElucidatedImagen has very, very poor text conditioning, whereas Imagen has much stronger conditioning (even though, as I said in the CLIP issue, T5 is not well suited to text-to-image generation).