junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Training on rectangle images #325

Closed mhusseinsh closed 6 years ago

mhusseinsh commented 6 years ago

Hello, I am starting to use the code to train on my own dataset. I have images of size 800x600. Any tips on what I need to edit inside the code/options to fit my dataset?

mhusseinsh commented 6 years ago

So could you please tell me your opinion on training with rectangular images? Should I feed the network the images without resizing or cropping, or train on cropped square patches and then test on the whole image? What is your take on such an approach?

mhusseinsh commented 6 years ago

@junyanz @taesung89

junyanz commented 6 years ago

Yes. You can crop patches (e.g., 360x360) without resizing the original image. You can use --resize_or_crop crop. During test time, you can apply the model to the full image. Please check out the preprocessing instructions in the training/test details.
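As a rough sketch of such a training run (the dataset path and experiment name below are placeholders, and the exact flag spelling depends on the repo version; newer versions use --preprocess crop with --crop_size instead of --resize_or_crop and --fineSize):

```bash
# Sketch: train CycleGAN on random 360x360 crops taken from the full-resolution images.
# ./datasets/my_dataset and my_experiment are placeholder names.
python train.py --dataroot ./datasets/my_dataset --name my_experiment \
  --model cycle_gan --resize_or_crop crop --fineSize 360
```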

mhusseinsh commented 6 years ago

So this is your recommendation for training on rectangular images?

junyanz commented 6 years ago

Yes.

didirus commented 6 years ago

@junyanz, what do you mean by

During test time, you can apply the model to the full image.

How would you do this if you want the original resolution of the image?

mhusseinsh commented 6 years ago

@didirus The original resolution will be preserved. During training, you can do data augmentation by resizing and random cropping. Since the network is fully convolutional, you can test on an image of any size and the result won't be affected.

To elaborate: assume you have an 800x600 image. You can train on square patches of 256x256, but set --resize_or_crop none during testing, so the full image (original resolution) will be translated normally using your saved weights/model.
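The test-time invocation could then look roughly like this (same placeholder path/name as in the sketch above; on newer repo versions the flag is --preprocess none):

```bash
# Sketch: apply the trained generators to the full 800x600 images, with no resizing or cropping.
python test.py --dataroot ./datasets/my_dataset --name my_experiment \
  --model cycle_gan --resize_or_crop none
```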

paviddavid commented 5 years ago

@mhusseinsh @didirus @junyanz I am not sure if I got this right. I want to use CycleGAN to translate, e.g., 256x512 images at test time, and my input data is also 256x512. So I should train the model with, for example, load_size=270 and fine_size=256, and additionally pass --resize_or_crop crop? Is this correct? No other changes? If so, could you maybe add a short explanation of how the crop parameter works in this case?

I appreciate any help or hints on this topic since I am also struggling to achieve good results. At the moment, I only get square images.

Thanks a lot in advance.

junyanz commented 5 years ago

I recommend that you use --resize_or_crop crop and set both load_size and fine_size to 256. During test time, you can use --resize_or_crop none. @taesungp what is your recommendation?
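For the 512x256 case above, this could translate into something like the following (placeholders as before; older repo versions spell the size flags --loadSize/--fineSize, newer ones --load_size/--crop_size together with --preprocess):

```bash
# Sketch: train on 256x256 crops of the 512x256 images; at test time use
# --resize_or_crop none (as above) so the outputs keep the 512x256 resolution.
python train.py --dataroot ./datasets/my_dataset --name my_experiment \
  --model cycle_gan --resize_or_crop crop --loadSize 256 --fineSize 256
```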

paviddavid commented 5 years ago

@junyanz @taesungp I tried it as described above. However, the output images are still square. Any ideas how to solve this? The train and test images are 512x256, and the translated images should have the same size rather than being square.

junyanz commented 5 years ago

Did you use --preprocess none during test time? (Note: --resize_or_crop has been changed to --preprocess)

paviddavid commented 5 years ago

@junyanz Ah, I got confused because in the repo (here on GitHub) the flag is described as preprocess, while in the Docker image I use there is still resize_or_crop (maybe you should update it). Now the results have the desired resolution; I think I had added the --fineSize and --loadSize parameters during testing by mistake. However, I am not sure how to interpret the results.

I get the following HTML results page (screenshot attached):

The three images on the left represent domain A; the images on the far right represent domain B.

I am only interested in the translation from B to A. Do you provide a way to save the translated images into separate folders, or (at least) to preserve the image names? As you can see in the attached screenshot, only the image names from domain B are set, so it is hard to separate the images (in my case, I translate 10k images).

A few more points specific to this use case:

1) Do you recommend a particular hyperparameter configuration for translating road-scene images (Cityscapes vs. Berkeley Deep Drive)? Learning rate? Number of epochs when training from scratch? Any experience with this?
2) Rows 2 and 3 of the screenshot: why is the night scene preserved in the translation from domain B (Berkeley contains night images) to domain A (Cityscapes does not contain night images), yet the recovered image does not show a night scene? Any hints?
3) How can I check when the model has converged (i.e., when training is finished)?

A lot of open questions but maybe you have some experience and can give some remarks/hints. Thanks a lot in advance.

junyanz commented 5 years ago

0) Maybe you can modify this function in this line; you can rewrite get_image_paths.
1) We have a recent paper related to your application. See Sec. 6.1.2 for the parameter configuration.
2) I am not sure why.
3) You can evaluate your model using some (conditional) GAN metrics or downstream tasks. See the original CycleGAN paper and this paper for more details.

paviddavid commented 5 years ago

@junyanz Thanks for your fast response and your ideas! All in all, very helpful.

However, I am still unsure how to interpret these results. I assume real_A and real_B are the original images, fake_B and fake_A the translated images in the other domain, and rec_A and rec_B the recovered images (translated back from the translation), right?

The translations all look very good, without errors (even for the night images). That's also a point I do not get: how can the translator generate night images if the target domain does not contain night images? Furthermore, the original image seems to be very similar to the translated image, but the back-translated image looks completely different.

Earlier, I tried a TensorFlow implementation and got really poor results (the original night images were completely messed up with random artefacts), whereas, as already mentioned, your results look very similar to the input (as if nothing had changed). How do you explain this?

junyanz commented 5 years ago

I am not sure. It might be possible that you accidentally flipped the direction of the models. Could you check the --direction flag and verify that it is the same during training and testing?

paviddavid commented 5 years ago

In both cases, I used the default setting AtoB. Should I stop training and continue from the last checkpoint with the direction BtoA, or do I have to restart the training?

I am confused by the image results; they just do not seem correct, and I do not understand why. As you can see in the attached screenshot (night scenarios), the recovered image looks different from the translated image, which I cannot make sense of...

@junyanz

junyanz commented 5 years ago

You should use the same setting. I am not sure what happened in your case.

Marion2 commented 5 years ago

With the CycleGAN model, I tried to add --resize_or_crop none to test.py to get a rectangular output, but I get this error: test.py: error: unrecognized arguments: --resize_or_crop none. Should I do something different for CycleGAN?

junyanz commented 5 years ago

The resize_or_crop flag has been renamed to preprocess.
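So with a current version of the repo, the rectangular-output test command would look roughly like this (placeholder path/name as in the sketches above):

```bash
# Sketch: same test-time setup as before, but with the renamed flag.
python test.py --dataroot ./datasets/my_dataset --name my_experiment \
  --model cycle_gan --preprocess none
```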

taggyhan commented 1 year ago

Hi, I am using rectangular images. I have heeded your advice and trained my model using --preprocess crop, and in test.py I have used --preprocess none, but I encounter this error: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 5 but got size 4 for tensor number 1 in the list.

I have a feeling it's because my input dimensions aren't powers of 2 (355x750). Is this why?