lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch
MIT License

Trouble Getting Module To Work #179

Open WilliamJudge94 opened 1 year ago

WilliamJudge94 commented 1 year ago

Hello,

I am trying to use this Python module to either denoise or recreate the input data. As a basic sanity check, I wrote the following code to train your model to denoise a few images from the CIFAR10 dataset, but I am not getting the results I expected.

import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset, DataLoader
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

# location of the CIFAR10 dataset
data = iter(torchvision.datasets.CIFAR10("/local/data/williamjudge/images/",))

# Setting the device to use
device=torch.device('cuda:0')

# Just collecting 2 images from the dataset, normalizing from 0-1
d = []
for i in range(2):
    images, labels = next(data)
    images = np.asarray(images)
    images = np.transpose(images, (2, 0, 1))  # HWC -> CHW (swapaxes(0, -1) would also swap H and W)
    images = images - images.min()
    images = images / images.max()
    d.append(list(images))

# Creating a model
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
).to(device)

# Using GaussianDiffusion
diffusion = GaussianDiffusion(
    model,
    image_size = 32,
    timesteps = 20000,   # number of steps
    loss_type = 'l1',    # L1 or L2
    objective='pred_v',
).to(device)

# Placing the data within a torch.Tensor
training_images = np.asarray(d)
training_images = torch.Tensor(training_images).to(device)

loss = diffusion(training_images.cuda())
loss.backward()
# after a lot of training

# Confusion
sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 32, 32), since image_size = 32

# Viewing Data
plt.imshow(training_images[0][0].cpu())
plt.show()

plt.imshow(sampled_images[0][0].cpu())
plt.show()

Since the training_images and the sampled_images do not come close to matching, even after setting timesteps to 20,000, this has led to confusion about a few points:

  1. Is diffusion.sample() training the model and returning predictions on the training data?
  2. If I want a model to recreate the input data, I am assuming that the objective should be 'pred_x0'. Is this assumption correct?
  3. If I want a model to predict the noise in the data, I am assuming that the objective should be 'pred_noise'. Is this assumption correct?
  4. If diffusion.sample() is training the model and returning predictions, how can I use the trained model to just output predictions without the training step?

Adrian744 commented 1 year ago

  1. No. diffusion.sample(X) simply generates X outputs from the trained model; it does not train anything.
  2. No. The objective is only the learning strategy used during training. The output itself is always an image that should look like an image from your dataset.
  3. See 2.
  4. See 1. What you need to do is use the Trainer class as described in the README. It will make your life easier.
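
To make points 1 and 4 concrete, here is a minimal sketch of how training and sampling are separate steps. It is not the library's Trainer, only a hand-written loop under some assumptions: the helper names `to_chw_unit_range` and `train_then_sample` are hypothetical, and the Adam optimizer, the learning rate, and the step count are illustrative choices rather than recommended settings. The only library calls used are the ones already shown in the question (`Unet`, `GaussianDiffusion`, calling the diffusion object to get a loss, and `diffusion.sample`).

```python
import numpy as np


def to_chw_unit_range(img_hwc):
    """Convert an H x W x C uint8 image to a C x H x W float32 array in [0, 1]."""
    arr = np.asarray(img_hwc, dtype=np.float32) / 255.0
    return np.transpose(arr, (2, 0, 1))


def train_then_sample(images_chw, train_steps=1000, num_samples=4, device="cpu"):
    """Hypothetical helper: run a plain training loop, then generate new images."""
    # Imported here so the preprocessing helper above works without these packages.
    import torch
    from denoising_diffusion_pytorch import Unet, GaussianDiffusion

    model = Unet(dim=64, dim_mults=(1, 2, 4, 8))
    diffusion = GaussianDiffusion(model, image_size=32, timesteps=1000).to(device)

    # Assumption: Adam with a small learning rate; tune for your own data.
    optimizer = torch.optim.Adam(diffusion.parameters(), lr=8e-5)
    batch = torch.from_numpy(np.stack(images_chw)).to(device)

    # Training: calling the diffusion object returns a loss. Each step needs
    # backward() AND an optimizer step, otherwise the weights never change.
    for _ in range(train_steps):
        loss = diffusion(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Sampling: no training happens here. The model starts from pure noise and
    # generates new images, so they are not reconstructions of the inputs.
    return diffusion.sample(batch_size=num_samples)
```

Note that the snippet in the question calls loss.backward() once and never steps an optimizer, so the model is still essentially untrained when diffusion.sample() is called. Even with proper training, the samples are new images drawn from the learned distribution, so they should resemble the dataset in general rather than match any particular training image.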