Closed: jameshball closed this 2 years ago
I suspect the library expects 3 colour channels in the training images but isn't getting them, because MNIST is greyscale and each image has only one channel.
I fixed this error by creating my own version of imagen_pytorch.data.Dataset that converts the image to grayscale (which it already is) with 3 channels, so that the shape is [3, 28, 28] instead.
The new Dataset class:
from pathlib import Path
from functools import partial
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms as T, utils
from PIL import Image
# helper functions

def exists(val):
    return val is not None

def cycle(dl):
    # loop over the dataloader forever
    while True:
        for data in dl:
            yield data

def convert_image_to(img_type, image):
    # convert a PIL image to the given mode only if it differs
    if image.mode != img_type:
        return image.convert(img_type)
    return image

# dataset and dataloader

class Dataset(Dataset):
    def __init__(
        self,
        folder,
        image_size,
        exts=['jpg', 'jpeg', 'png', 'tiff'],
        convert_image_to_type=None
    ):
        super().__init__()
        self.folder = folder
        self.image_size = image_size
        self.paths = [p for ext in exts for p in Path(f'{folder}').glob(f'**/*.{ext}')]

        convert_fn = partial(convert_image_to, convert_image_to_type) if exists(
            convert_image_to_type) else nn.Identity()

        self.transform = T.Compose([
            T.Lambda(convert_fn),
            T.Resize(image_size),
            T.RandomHorizontalFlip(),
            T.CenterCrop(image_size),
            # Added this so the shape is correct!
            T.Grayscale(3),
            T.ToTensor()
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        img = Image.open(path)
        return self.transform(img)

def get_images_dataloader(
    folder,
    *,
    batch_size,
    image_size,
    shuffle=True,
    cycle_dl=False,
    pin_memory=True
):
    ds = Dataset(folder, image_size)
    dl = DataLoader(ds, batch_size=batch_size, shuffle=shuffle, pin_memory=pin_memory)

    if cycle_dl:
        dl = cycle(dl)
    return dl
I also got the following error after making this change:
einops.EinopsError: Shape mismatch, can't divide axis of length 7 in chunks of 2
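This is just because I was trying to train with 28x28 images rather than a power of 2. Judging by the error message, the image size is presumably halved repeatedly inside the model (28, then 14, then 7), and 7 can't be split into chunks of 2. Changing the image size to 32x32 seems to have resolved this!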
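To illustrate why 28 fails and 32 works, here is a rough sketch of the arithmetic. It assumes the model halves the spatial size a fixed number of times; the count of 3 below is my guess for illustration, not something read from the library, and both function names are made up:

```python
def first_failing_size(image_size, num_downsamples):
    """Return the odd size at which halving breaks, or None if every halving works."""
    size = image_size
    for _ in range(num_downsamples):
        if size % 2 != 0:
            return size
        size //= 2
    return None


def round_up_to_valid(image_size, num_downsamples):
    """Round image_size up to the nearest multiple of 2 ** num_downsamples."""
    factor = 2 ** num_downsamples
    return -(-image_size // factor) * factor  # ceiling division, then scale back up


NUM_DOWNSAMPLES = 3  # assumption for illustration only

print(first_failing_size(28, NUM_DOWNSAMPLES))  # 7
print(first_failing_size(32, NUM_DOWNSAMPLES))  # None
print(round_up_to_valid(28, NUM_DOWNSAMPLES))   # 32
```

Under this assumption the size does not strictly need to be a power of 2, only divisible by 2 once per downsampling step, so something like 24 would also pass the check with three steps.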
Hi,
I've copied the Dataloader example to test this library on my machine and make sure everything works before using it properly, but after making some small modifications I'm getting the following error when using the MNIST dataset:
This is the code I'm using, which is only slightly modified to change the image size:
I'm running this on CUDA with an NVIDIA GeForce RTX 3060 Laptop GPU on Windows 11. Please let me know if there's any extra info I can provide!
Thanks.