Open — BIG-PIE-MILK-COW opened this issue 2 years ago
I'd like to ask the same question.
I was going to use LAION-400M for training experiments, but found it was too large; I only have an NVIDIA GeForce RTX 3060. Then I found Flickr30k. The images in Flickr30k have arbitrary sizes, but imagen-pytorch seems to require all images to have the same height and width, and the same size across different images.
@lucidrains looking forward to your reply
@lucidrains Can I simply use this to resize the images? Wouldn't that be too crude?

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((64, 64))
])
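Not the author, but for what it's worth: Resize((64, 64)) will squash non-square images. A minimal sketch of a slightly less crude alternative, assuming standard torchvision transforms (the 64-pixel target size is just this thread's example), resizes the shorter side and then center-crops, so the aspect ratio is preserved:

import torchvision.transforms as T

transform = T.Compose([
    T.Resize(64),       # resize the shorter side to 64, keeping aspect ratio
    T.CenterCrop(64),   # crop the central 64x64 patch
    T.ToTensor(),       # convert the PIL image to a CxHxW float tensor in [0, 1]
])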
@BIG-PIE-MILK-COW Maybe you can take a look at Flickr30k. So little data probably won't give good results, but starting with it may not be a bad idea. It looks like we've all just started training this model, and I'd love to exchange training experience with you.
I have tried training with LAION-Art, which is a subset of LAION-5B, but didn't get a good result.
How many steps did you train for? What do your Unets look like?
I trained for 200,000 steps and used only one unet. Here is my unet:

unet = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = 3,
    layer_attns = (False, True, True, True),
    layer_cross_attns = (False, True, True, True)
)
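For context, wrapping a single unet like the one above into Imagen would look roughly like this. This is a sketch based on the imagen-pytorch README; argument names such as image_sizes, timesteps, and cond_drop_prob follow the README and may differ across versions:

from imagen_pytorch import Unet, Imagen

unet = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = 3,
    layer_attns = (False, True, True, True),
    layer_cross_attns = (False, True, True, True)
)

# a single-unet Imagen generating 64x64 images only (no super-resolution stage)
imagen = Imagen(
    unets = (unet,),
    image_sizes = (64,),
    timesteps = 1000,
    cond_drop_prob = 0.1
)

# one training step, as in the README: images is a (batch, 3, 64, 64) tensor,
# texts is a list of caption strings
# loss = imagen(images, texts = texts, unet_number = 1)
# loss.backward()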
You can't expect miracles if you're being frugal with data.
How should the dataset be loaded?
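I'm not sure there is a canonical way, but a minimal sketch of a text-image dataset could look like the following. CaptionedImageDataset is a hypothetical helper, not part of imagen-pytorch; it assumes a folder of images plus a dict mapping filenames to captions (e.g. parsed from Flickr30k's annotation file):

import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class CaptionedImageDataset(Dataset):
    # hypothetical helper: pairs each image file with its caption string
    def __init__(self, image_dir, captions, image_size = 64):
        # captions: dict mapping image filename -> caption string
        self.image_dir = image_dir
        self.items = sorted(captions.items())
        self.transform = T.Compose([
            T.Resize(image_size),
            T.CenterCrop(image_size),
            T.ToTensor(),
        ])

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        filename, caption = self.items[idx]
        image = Image.open(os.path.join(self.image_dir, filename)).convert('RGB')
        return self.transform(image), caption

Batches from a DataLoader over this could then be passed to the model's forward pass as images and texts, as in the README example above.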
Is there any text-image dataset that is not so large (containing fewer than one hundred thousand pairs)?