microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Combining GAN and CNNs #3116

Open csolorio opened 6 years ago

csolorio commented 6 years ago

Hi. I have previously asked for your help on a couple of points about using tutorial 302B (https://cntk.ai/pythondocs/CNTK_302B_Image_Super-resolution_Using_CNNs_and_GANs.html) with a different model (https://github.com/Microsoft/CNTK/issues/3078). The issues in that topic remain. I share with you a minimal working example so you can kindly help me once again:

The model I trained used the same variables (with the same name field).

#Imports needed by the snippet (assuming CNTK >= 2.4 for ops.pad)
import os
import numpy as np
from cntk import input_variable, load_model, combine, Trainer
from cntk.ops import pad, square, reduce_mean
from cntk.learners import adam, learning_parameter_schedule, momentum_schedule

def pad_block(block, out_size = 224):
    change = False
    add_rows = 0
    add_cols = 0

    if(block.shape[1] < out_size):
        change = True
        add_rows = (out_size - block.shape[1]) // 2   #integer division: padding amounts must be whole numbers

    if(block.shape[2] < out_size):
        change = True
        add_cols = (out_size - block.shape[2]) // 2

    if(change):
        #Padding with cntk ops, using default constant padding with 0s
        return pad(block, pattern = [(0,0),(add_rows, add_rows),(add_cols, add_cols)], name = 'pad_block')
    else:
        return block
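#Sanity check of the padding arithmetic (an added sketch, not part of the original
#training script; the 3 x 96 x 96 shape is taken from the error messages below):
#(224 - 96) // 2 = 64 rows/cols are added on each side.
assert pad_block(input_variable((3, 96, 96), np.float32)).shape == (3, 224, 224)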

#Data reader (create_datareader and discriminator, like the scalar settings, are defined elsewhere)
train_reader = create_datareader(train_datasource_csv, scale, patch_size)

#Variables
input_var = input_variable((num_channels, patch_size, patch_size), np.float32, name="input")
target_var = input_variable((num_channels, patch_size * scale, patch_size * scale), np.float32, name="target")

input_scaled = input_var/255
target_scaled = target_var/255

#0..255 values in image
model = load_model(preTrained)(input_var)

VGG19 = load_model(os.path.join(output_model_folder, "VGG19_ImageNet_Caffe.model"))
#Model cropping until relu5_4 layer
layer5_4 = VGG19.find_by_name('relu5_4')
VGG  = combine([layer5_4.owner])

#discriminator on real images: (0..1) output/input with 1x1 shape
D_real = discriminator(target_scaled)

#discriminator on generated images: (0..1) output/input with 1x1 shape
D_fake = D_real.clone(method = 'share', substitutions = {target_scaled.output: (model.output / 255)})

#VGG on real images: (0..255) output with 224 x 224 input shape
VGG_real = VGG.clone(method = 'share', substitutions = {VGG.arguments[0]: pad_block(target_var)})

#VGG on generated images: (0..255) output with 224 x 224 input shape
VGG_fake = VGG.clone(method = 'share', substitutions = {VGG.arguments[0]: pad_block(model.output)})

#generator loss: GAN loss + MSE loss + perceptual (VGG) loss
G_loss = -square(D_fake) * 0.001 + reduce_mean(square(target_scaled - (model / 255))) * 0.08 + reduce_mean(square(VGG_real - VGG_fake)) * 0.08

#discriminator loss: loss on real + loss on fake images
D_loss = square(1.0 - D_real) + square(D_fake)

#Optimizers for trainers
G_optim = adam(G_loss.parameters,
                lr = learning_parameter_schedule([(20, 0.0001), (20, 0.00001)], minibatch_size = 5000),
                momentum = momentum_schedule(0.9), gradient_clipping_threshold_per_sample = 0.1)

D_optim = adam(D_loss.parameters,
                lr = learning_parameter_schedule([(20, 0.0001), (20, 0.00001)], minibatch_size = 5000),
                momentum = momentum_schedule(0.9), gradient_clipping_threshold_per_sample = 0.1)

#Trainer creation
G_trainer = Trainer(model, (G_loss, None), G_optim)
D_trainer = Trainer(D_real, (D_loss, None), D_optim)

#training configuration
MINIBATCH_SIZE = 4
NUM_MINIBATCHES = 100000

for train_step in range(NUM_MINIBATCHES):
        low_res, high_res, current_batch_size = train_reader.next_minibatch(MINIBATCH_SIZE)
        generator_batch = {input_var: low_res, target_var : high_res}
        discriminator_batch = {input_var : low_res, target_var : high_res}

        D_trainer.train_minibatch(discriminator_batch)
        G_trainer.train_minibatch(generator_batch)

This code produces the following error:

  File "test.py", line 133, in <module>
    D_trainer.train_minibatch(discriminator_batch)
  File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\train\trainer.py", line 184, in train_minibatch
    device)
  File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\cntk_py.py", line 2856, in train_minibatch
    return _cntk_py.Trainer_train_minibatch(self, *args)
RuntimeError: AddNodeToNet: Duplicated name for Constant2614 LearnableParameter operation.

[CALL STACK]

Microsoft::MSR::CNTK::DataTransferer:: operator=

  • Microsoft::MSR::CNTK::DataTransferer:: operator= (x2)
  • CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD (x13)

If I use the same dictionary to provide the input batches to the trainers, I get the same error. If I use the following:

            generator_batch = {G_trainer.model.arguments[0]: low_res, target_var : high_res}
            discriminator_batch = {D_trainer.model.arguments[0] : low_res, target_var : high_res}

I get:

  File "test.py", line 133, in <module>
    D_trainer.train_minibatch(discriminator_batch)
  File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\train\trainer.py", line 184, in train_minibatch
    device)
  File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\cntk_py.py", line 2856, in train_minibatch
    return _cntk_py.Trainer_train_minibatch(self, *args)
ValueError: Values for 1 required arguments 'Input('input', [#], [3 x 48 x 48])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('Plus6910_Output_0', [#], [1])' depend on, have not been provided.

[CALL STACK]

CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD

  • CNTK::Function:: Forward
  • CNTK:: CreateTrainer
  • CNTK::Trainer:: TotalNumberOfUnitsSeen
  • CNTK::Trainer:: TrainMinibatch (x2)
  • CreateDeserializer (x2)
  • PyCFunction_Call
  • PyEval_GetGlobals
  • PyEval_EvalFrameEx
  • PyEval_EvalCodeEx
  • PyEval_GetFuncDesc
  • PyEval_GetGlobals
  • PyEval_EvalFrameEx
  • PyEval_EvalCodeEx

Please Help! :(

ke1337 commented 6 years ago

Please check the entire list of *_loss.arguments and make sure they are all supplied as inputs. Your error means that some arguments were not provided with values. Note that clone creates new input nodes, which may not be what you intended. You may use logging.graph.plot to visualize the model and understand more about the data-feeding path.
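For example, a minimal sketch of that check (G_loss and D_loss are the objects from your snippet; rendering the graph requires graphviz and pydot to be installed):

import cntk as C

#List every input the losses actually depend on; each of these must be fed
for arg in G_loss.arguments:
    print('G_loss needs:', arg.name, arg.shape)
for arg in D_loss.arguments:
    print('D_loss needs:', arg.name, arg.shape)

#Render the graph to spot unexpected duplicate inputs created by clone
C.logging.graph.plot(D_loss, 'd_loss.png')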

csolorio commented 6 years ago

I have seen something weird that I suspect is the cause. When I check the argument list of the discriminator on 'fake' images (after checking the loss arguments), the only one I get is Input('Input3', [#], [3 x 48 x 48]). Nevertheless, for the discriminator on real images it is Input('Input4', [#], [3 x 96 x 96]).

I create the discriminator on fake images using D_fake = D_real.clone(method = 'share', substitutions = {target_scaled.output: model.output}), and the shape of model.output is (3, 96, 96). Could that be the source of the error? Why does the discriminator on real images have that input shape if neither the original discriminator nor the model.output substitution has a different shape? If I invert the key/value order in the substitution dictionary, would that work, or would it simply not substitute anything at all?
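One experiment I could run to answer the key/value question myself (a sketch reusing the objects above): clone with the dictionary inverted on purpose and compare the argument lists; if the keys did not match anything, the clone should keep its original inputs.

D_test = D_real.clone(method = 'share', substitutions = {model.output: target_scaled.output})   #inverted on purpose
print([(a.name, a.shape) for a in D_test.arguments])   #unchanged arguments would mean nothing was substituted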

Thank you in advance!

jaliyae commented 6 years ago

Could you use the following function and plot the graph to see how it shares the parameters?

def plot_graph(root, file_name):
    C.logging.graph.plot(root, file_name + ".pdf")
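For instance (assuming import cntk as C, with graphviz and pydot installed for the PDF rendering):

plot_graph(D_loss, "d_loss")
plot_graph(G_loss, "g_loss")

Plotting the losses rather than the individual models shows the whole shared graph in one picture.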

tangyuq commented 6 years ago

Please note that the model.arguments ordering is not guaranteed. I would suggest you play with the following trick:

        input_var = input_variable((num_channels, patch_size, patch_size), np.float32, name="**input**")
        target_var = input_variable((num_channels, patch_size * scale, patch_size * scale), np.float32, name="**target**")

        generator_batch = {'input': low_res, 'target' : high_res}
        discriminator_batch = {arg: generator_batch[arg.name] for arg in D_trainer.model.arguments}

        discriminator_batch = {'input': low_res, 'target' : high_res}
        discriminator_batch = {arg: discriminator_batch[arg.name] for arg in D_trainer.model.arguments}
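The essence of the trick is a single name-to-value map from which every feed dictionary is derived, so the unspecified ordering of .arguments can never bite. A condensed sketch (using the loss arguments, since those are what train_minibatch ultimately needs, and the plain names from your original definitions; the asterisks above are only emphasis):

values_by_name = {'input': low_res, 'target': high_res}
generator_batch = {arg: values_by_name[arg.name] for arg in G_loss.arguments}
discriminator_batch = {arg: values_by_name[arg.name] for arg in D_loss.arguments}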

csolorio commented 6 years ago

I'll try to test the last trick. A question, though: is the fourth line correct? Assigning that new dictionary, built from the generator batch, to the discriminator batch? And are three assignments to the discriminator batch correct?

Anyway, I'll try. I'll also upload the plot if the trick doesn't work :)

csolorio commented 6 years ago

Update: The trick with the names didn't work. I still get the following error:

ValueError: Values for 1 required arguments 'Input('input', [#], [3 x 48 x 48])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('Plus5397_Output_0', [#], [1])' depend on, have not been provided.

I tested both with the pairs of generator_batch and discriminator_batch lines and with the original code. I tested the names with and without the asterisks (I assumed they were just there to signal the change, so I initially tested without them).

I'm currently testing ONLY the discriminator part to make the code even simpler but the error keeps appearing.

I've also plotted the loss of the discriminator function, since plotting one model or the other doesn't show the whole thing. I attach the discriminator model, the generator model, and the discriminator-loss graph to compare them all (the PDF files were generated with the upper part cropped, so I saved them as PNG): d_loss, d_loss_short, discriminator.

tangyuq commented 6 years ago

First of all, you will also need to use the argument-name trick for clone (again, because the ordering of C.Function.arguments is not guaranteed). If len(VGG.arguments) > 1, you will need to replace the following with the named trick mentioned above:

#VGG on real images: (0..255) output with 224 x 224 input shape
VGG_real = VGG.clone(method = 'share', substitutions = {VGG.arguments[0]: pad_block(target_var)})

#VGG on generated images: (0..255) output with 224 x 224 input shape
VGG_fake = VGG.clone(method = 'share', substitutions = {VGG.arguments[0]: pad_block(model.output)})
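For example, something like this (a sketch: 'data' is only a guess at the VGG19 input's name; print VGG.arguments to see the real one):

#Select the VGG input by name instead of by position
vgg_input = next(a for a in VGG.arguments if a.name == 'data')
VGG_real = VGG.clone(method = 'share', substitutions = {vgg_input: pad_block(target_var)})
VGG_fake = VGG.clone(method = 'share', substitutions = {vgg_input: pad_block(model.output)})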

One suggestion for locating the input-map problem is to evaluate each function individually with fake input data. For example, if you believe that your function F has only one input variable

x = C.input_variable((3, 48, 48))

you can do

batch_size = 1   #or 2
F.eval({x: np.ones((batch_size, 3, 48, 48), np.float32)})

If your function has a missing input, this will raise the same error, so you can incrementally locate which function in your big graph is missing an input.
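Applied to your snippet, the incremental walk could look like this (a sketch; the shapes come from your error messages):

fake_lr = np.ones((1, 3, 48, 48), dtype = np.float32)   #low-res batch of 1
fake_hr = np.ones((1, 3, 96, 96), dtype = np.float32)   #high-res batch of 1

model.eval({input_var: fake_lr})                        #generator alone
D_real.eval({target_var: fake_hr})                      #discriminator on real images
D_fake.eval({input_var: fake_lr})                       #cloned discriminator on fakes
G_loss.eval({input_var: fake_lr, target_var: fake_hr})  #the full combined loss

Whichever call fails first points at the subgraph with the unexpected input.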