lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
MIT License

non-square images? #153

Open xnghu opened 3 years ago

xnghu commented 3 years ago

I've been trying to modify the code to allow generating images that are not square, both as a project to help me learn more about these models/methods and because I just want to be able to do that. I'm having trouble, though, and I'm curious if anyone could point me in the right direction about what would need to be changed or implemented to achieve this (if it is possible)?

I don't completely understand everything that's going on underneath some of the abstracted layers, so I'm attempting to learn by hacking, then following the error messages to see what changes they call for. So far I have added an image_height parameter and use it in the SIREN model and in the other places where image_width is used. Currently I'm stuck at this section of the forward function, where it calls the interpolate helper in the if statement:

    # create normalized random cutouts
    print(self.input_resolution)
    if self.do_cutout:
        image_pieces = [rand_cutout(out, size, center_bias=self.center_bias, center_focus=self.center_focus) for size in sizes]
        image_pieces = [interpolate(piece, self.input_resolution) for piece in image_pieces]
    else:
        image_pieces = [interpolate(out.clone(), self.input_resolution) for _ in sizes]

I get errors of this type:

    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
    RuntimeError: Input and output sizes should be greater than 0, but got input (H: 0, W: 50) output (H: 224, W: 224)

I'm not sure if I'm even on the right track. Any help would be so appreciated! In the meantime I'm going to work on trying to learn more about cutouts and how this is working exactly.
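One possible reading of the `input (H: 0, W: 50)` part of the error: a cutout of size 50 was sliced out of the image, but the row offset landed beyond the image height, so the slice came back with zero rows. That would happen if the cutout helper draws both offsets from a single range based on one image dimension, which is an assumption here and not something confirmed against the repo. The sketch below uses only the standard library and a hypothetical helper name, `rand_cutout_box`, to show per-axis offsets that always stay inside a rectangular image.

```python
import random


def rand_cutout_box(height, width, size, rng=random):
    """Pick a square cutout that fits inside a height x width image.

    Hypothetical helper (not part of deep-daze). Returns (top, left, size)
    such that image[..., top:top + size, left:left + size] is never empty.
    """
    # Clamp the cutout to the shorter side so it always fits.
    size = max(1, min(size, height, width))
    # Give each axis its own offset range. A single shared range can push
    # an offset past the end of the shorter axis.
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return top, left, size


# Example: a 128 x 512 (H x W) image with a requested cutout of 50.
rng = random.Random(0)
top, left, size = rand_cutout_box(128, 512, 50, rng)

# What goes wrong with one shared range derived from the width:
bad_offset = 512 - 50  # a valid offset along the width...
rows = list(range(128))[bad_offset:bad_offset + 50]
print(len(rows))  # prints 0: the offset selects no rows along the height
```

With tensors laid out as (batch, channels, H, W), the returned values would be applied as `image[:, :, top:top + size, left:left + size]`. Each cutout can still be resized to the square 224 x 224 resolution shown in the error message, since the cutouts themselves remain square.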