Size different from 128

lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

MIT License


Size different from 128 #66

Open ethancohen123 opened 2 years ago

ethancohen123 commented 2 years ago

Hi, I am trying to pass image_size=300 as an argument, but I'm getting this error (my images are indeed of size 300):

assert h == img_size and w == img_size, f'height and width of image must be {img_size}'
AssertionError: height and width of image must be 300

When I set image_size=128 everything works just fine. Is there a way to run the model and training with a size other than 128? Thank you

lucidrains commented 2 years ago

@ethancohen123 Hi Ethan, can you paste your script?

ethancohen123 commented 2 years ago

Hi, here is my script (I followed the instructions in the readme). When using image_size=128 it works fine, otherwise it does not.

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
).cuda()

diffusion = GaussianDiffusion(
    model,
    image_size = 300,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
).cuda()

trainer = Trainer(
    diffusion,
    '/projects/synsight/ethan/corona/corona_png/',
    train_batch_size = 32,
    train_lr = 2e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True                        # turn on mixed precision
)

trainer.train()

lucidrains commented 2 years ago

@ethancohen123 are you sure you are using the latest version?

ethancohen123 commented 2 years ago

Sorry about that, I was using an older version, thanks. I can now generate larger sizes, but 300 doesn't seem to work. That's okay, size 256 is fine.

Another, unrelated question: is conditional generation supported? Thanks a lot

lucidrains commented 2 years ago

@ethancohen123 yup, you'll want to use 384 rather than 300
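
A plausible reason 300 fails while 256 and 384 work is that the U-Net repeatedly halves the spatial resolution, so the image size has to be divisible by the total downsampling factor. The sketch below checks this in plain Python. It assumes one 2x downsample per entry of dim_mults except the last (a factor of 8 for (1, 2, 4, 8)); that is an assumption about the architecture, not something stated in this thread, and the helper names are made up for illustration. It only tests divisibility and does not explain why 384 in particular was suggested.

```python
def downsample_factor(dim_mults):
    # Assumption: the U-Net halves the resolution once per stage,
    # except at the last stage.
    return 2 ** (len(dim_mults) - 1)


def is_size_compatible(image_size, dim_mults=(1, 2, 4, 8)):
    # True if the size survives every halving without a remainder.
    return image_size % downsample_factor(dim_mults) == 0


def next_compatible_size(image_size, dim_mults=(1, 2, 4, 8)):
    # Round up to the nearest multiple of the downsampling factor.
    factor = downsample_factor(dim_mults)
    return -(-image_size // factor) * factor


for size in (128, 256, 300, 384):
    print(size, is_size_compatible(size))
```

Under this assumption, 128, 256 and 384 pass the check and 300 does not, so resizing or padding the images to a compatible size before training would be one way around the problem.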

do you wish to condition on image or text? if you want text, i recommend https://github.com/lucidrains/imagen-pytorch

edit: actually imagen also supports image conditioning now

ethancohen123 commented 2 years ago

Actually, I want something like High-Resolution Image Synthesis with Latent Diffusion Models, since I want to be able to encode from any source of data. I guess denoising diffusion is not the way to go then, haha

lucidrains commented 2 years ago

@ethancohen123 no you can still use this library

afaik, latent diffusion is just denoising diffusion with predict x0 objective

ethancohen123 commented 2 years ago

Oh, that's awesome! Any idea how I can do that? If I have a condition x and I create an encoder T(x), how can I generate images from it? Thanks a lot!

lucidrains commented 2 years ago

@ethancohen123 i'd recommend https://github.com/lucidrains/dalle2-pytorch i actually have latent diffusion built in as an option, though it is untested

most of the code is there though

ethancohen123 commented 2 years ago

Yes, I saw this repo (which is awesome, btw), but the only thing is that I cannot do any CLIP training, as I don't have text conditioning but feature conditioning. Is there any way to use it without doing anything related to CLIP and just feed the condition to the diffusion model? Thanks again for your time