matlab-deep-learning / pix2pix

Image to Image Translation Using Generative Adversarial Networks

Training with input files of type single with values between [0,1] #4

Closed abaydou1 closed 4 years ago

abaydou1 commented 4 years ago

It seems that training with input files of type single with values between [0,1] is not straightforward. The ARange needs to be adjusted in multiple locations other than the options.

Where does the InstanceNormalizationLayer come into play? The discriminator and generator networks do not have a sigmoid or tanh.

Can you clarify this issue?

justinpinkney commented 4 years ago

I've previously trained on edge map images which have been 0-1, and this just requires changing the ARange to 1 and the InputChannels to 1 (if it's a single-channel input image). Could you provide a few more details on what else you found that needed to be adjusted to make your example work?

Currently the InstanceNormalizationLayer isn't actually used; batch norm is used by default, and there isn't an obvious option to switch to instance norm (if you think this would be a useful option to add please let me know).

You're right about the lack of sigmoid or tanh layers in the discriminator and generator. This is because in R2019b these aren't supported layers for use in dlnetwork. They are actually applied in the train.m function, see L117 and L121 + L124.
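Roughly, the idea looks like this (a sketch with made-up variable names, not the exact code in train.m):

    % Generator output is squashed to [-1, 1] with tanh; discriminator
    % scores are squashed to [0, 1] with sigmoid, since neither network
    % has these as layers in an R2019b dlnetwork.
    fakeB     = tanh(forward(generator, realA));
    scoreReal = sigmoid(forward(discriminator, cat(3, realA, realB)));
    scoreFake = sigmoid(forward(discriminator, cat(3, realA, fakeB)));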

Hope that helps clear some things up.

abaydou1 commented 4 years ago

Hey Justin,

Thanks for your reply. In the .mat files I had 144x144x1 grey-value single images. As I said earlier, I used the same strategy as the U-Net segmentation example, passing a matRead function as the ReadFcn of the image datastore.
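For reference, the reading setup I mean is roughly this (a sketch; the folder name is made up, and matRead is the small helper from that example that just loads the first variable of the .mat file):

    % Image datastore over .mat files containing single images in [0, 1]
    imds = imageDatastore("trainingData", ...
        "FileExtensions", ".mat", ...
        "ReadFcn", @matRead);

    function data = matRead(filename)
        % Load the first (only) variable stored in the .mat file
        s = load(filename);
        f = fieldnames(s);
        data = s.(f{1});   % e.g. 144x144x1 single, values in [0, 1]
    end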

I am trying to clarify further your code, and I had the following comments:

1 - By only asking to set ARange and BRange, you are assuming that the lower end of the image range is always 0. Since this is the case, why did you choose to use tanh when making a fake image in line 117, and why not sigmoid?

2 - The normalization steps used in PairedImageDatastore are not straightforward (lines 128-131). These operations might alter the values in case the images are already normalized. I ended up setting aOut = aIn and bOut = bIn. Also, in my experience, I had to manually adjust the sizes in this function (lines 34:37).

3 - I am puzzled about line 129 for labels in the train function. It seems to me that you add a series of ones to be used as labels in the dLoss. What is the idea of using labels here?

4 - If this pix2pix is theoretically a variant of conditional GAN, we should be looking at inserting labels or classes for each image. Do the labels mentioned in question 3 correspond to the conditional GAN labels, and have you assumed they are always 1? If yes, why? To my knowledge, other code repositories have given the generator two outputs: one output with sigmoid as you did, and another output with softmax to predict class labels when they are not fed into the network.

I would be grateful if you could clarify points 3 and 4 further.

Thanks, Atallah

abaydou1 commented 4 years ago

Issue #3:

Ok, I see now that you used the labels in order to use cross entropy instead of log and 1 - log. I am still trying to figure out issue #4.
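In other words, with labels of ones for real pairs and zeros for fake pairs, a single binary cross-entropy expression covers both the log(D) and log(1 - D) terms; something like this (a sketch, not the repository's exact code):

    % scoreReal/scoreFake are discriminator outputs after sigmoid
    labelsReal = ones(size(scoreReal), "like", scoreReal);    % real pairs -> 1
    labelsFake = zeros(size(scoreFake), "like", scoreFake);   % fake pairs -> 0
    dLoss = -mean(labelsReal .* log(scoreReal) + (1 - labelsReal) .* log(1 - scoreReal), "all") ...
            - mean(labelsFake .* log(scoreFake) + (1 - labelsFake) .* log(1 - scoreFake), "all");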

justinpinkney commented 4 years ago

Thanks for the detailed comments!

1 + 2 - Yes, my assumption is that images go from 0 to some value (usually 1 or 255), so the ARange and BRange parameters allow you to tell the PairedImageDatastore what the maximum expected value is. This is because normalisation is done according to the original paper, i.e. scaling the images to lie in the range -1 to 1. It does not use any empirically calculated mean or min/max from the dataset, just the expected range of the images. As all the images are scaled to -1 to 1, tanh is the logical choice to apply to the output of the network.

3 - Your comment above is correct

4 - In pix2pix the conditioning is the input image, i.e. the labels are the pixel values of the input image. For more details I'd recommend the original paper: https://arxiv.org/abs/1611.07004

PS I'm interested in what you meant by this:

Also, in my experience, I had to manually adjust the sizes in this function (lines 34:37).

One problem that I could see is that the resizing/transformations use some interpolation and anti-aliasing, so if your input images are actually labels then this will introduce some level of artifacts.
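If that is the case, resizing label images with nearest-neighbour interpolation avoids blending label values; roughly (a sketch, separate from the datastore code):

    % Nearest-neighbour resize keeps discrete label values intact; the
    % default bicubic interpolation with anti-aliasing would blend them.
    labelMap = uint8(randi([0 1], 144, 144));            % hypothetical binary label image
    resized  = imresize(labelMap, [256 256], "nearest");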

abaydou1 commented 4 years ago

Hey Justin,

I will reply later with some details, but for now can you clarify items 1 + 2 above? I am aware that you are scaling the output images to [-1, 1]. Q1: Are you also rescaling the input images to [-1, 1]? If yes, where? Is it in lines 34:37 of PairedImageDatastore?

Thanks, Atallah

justinpinkney commented 4 years ago

The read method of PairedImageDatastore calls the normaliseImages method on line 103. This method is what does the scaling to [-1, 1].
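The scaling itself amounts to something like this (a sketch of the idea rather than the exact method body):

    % Map images from [0, ARange] (or [0, BRange]) to [-1, 1] using only
    % the expected range, not statistics computed from the dataset.
    aOut = 2 * (single(aIn) ./ aRange) - 1;   % aRange = 1 for [0,1] singles, 255 for uint8
    bOut = 2 * (single(bIn) ./ bRange) - 1;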

abaydou1 commented 4 years ago

All the issues on this thread have been resolved, except one thing. When I spoke about the manual adjustment, I was referring to the fact that even if you specify your ARange as 1 (or any number different from 255) in the options at the beginning, the user needs to be cautious and re-enter the range when calling the translate function:

    function options = parseInputs(varargin)
        % Parse name value pair arguments
        parser = inputParser();
        parser.addParameter("ExecutionEnvironment", "auto", ...
            @(x) ismember(x, ["auto", "cpu", "gpu"]));
        parser.addParameter("ARange", 1, ...
            @(x) validateattributes(x, "numeric", "positive"));

        parser.parse(varargin{:});
        options = parser.Results;
    end

justinpinkney commented 4 years ago

Yes, good point, you need to be careful to specify the ARange in two places. I should at least make this more clear in the documentation, or perhaps make the saved model remember the original ARange setting. I'll create a separate issue for this.
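To make that concrete (the positional arguments to translate are an assumption here; only the name-value pair comes from the parseInputs above):

    % Pass the same range that was used for training, rather than
    % relying on the default in parseInputs.
    out = translate(model, img, "ARange", aRange);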