Error when training GAN-based model

csolorio commented 6 years ago

Hi,

I've been trying to refine the parameters of a model I previously trained, based on the CNTK GAN tutorial. I use this model as the generator, and its training loss combines the loss of the discriminator model with the loss of a separate model (similar to the other CNTK GAN tutorial). But when I start training (i.e., when I call the train_minibatch method), the following error pops up:

```
ValueError: Values for 1 required arguments 'Input('Input3', [#], [3 x 48 x 48])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('Plus6903_Output_0', [#], [1])' depend on, have not been provided.
```

The generator's input is an image block of shape 3x48x48. The discriminator's inputs are both the generator's output and the labels.

```python
low_res, high_res, current_batch_size = train_datareader.next_minibatch(MINIBATCH_SIZE)
batch_inps_X_Y = {input: feature_input, target: label_target}

D_trainer.train_minibatch(batch_inps_X_Y)
```

Any suggestion?

ke1337 commented 6 years ago

Please make sure your feed dict uses the same input_variables as the model. The error shows there is a mismatch.
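One way to act on this advice is to compare two things before calling `train_minibatch`: the arguments the trainer's loss actually depends on, and the keys of the feed dict. The sketch below assumes only the CNTK attributes `Function.arguments`, `Variable.uid`, `Variable.name` and `Variable.shape`. The helper names `missing_arguments` and `describe` are hypothetical. The helpers are written duck-typed so the logic can be tried without CNTK installed.

```python
def missing_arguments(function, feed):
    """Return the arguments of `function` that have no entry in `feed`.

    Hypothetical helper. `function` is anything with an `.arguments` list
    (for example a CNTK Function such as `trainer.loss_function`), and
    `feed` is the dict passed to `train_minibatch`. Variables are matched
    by `.uid`, not by `.name`: two separately created inputs can share a
    name and still be different variables.
    """
    provided = {var.uid for var in feed}
    return [arg for arg in function.arguments if arg.uid not in provided]


def describe(variables):
    """Format variables as 'name (uid, shape)' strings for an error message."""
    return ["{} ({}, {})".format(v.name, v.uid, v.shape) for v in variables]
```

With CNTK, something like `print(describe(missing_arguments(D_trainer.loss_function, batch_inps_X_Y)))` should list the variable the error complains about (here the 3 x 48 x 48 input) together with its uid. You can then trace that uid back to where the variable was created, for example inside a loaded model.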

csolorio commented 6 years ago

Sorry for the late response. Easter got in the way... :P I use the same input variables throughout. I did the following:

1. Created the input variables 'input' and 'target' only once in the whole code, and their name fields use the same names. (Maybe the names are the issue? It's the only thing I think differs from the tutorial I based this on.)
2. Loaded the generator model from a previous training session and called `generator(input)`.
3. Created the scaled versions of the variables 'input' and 'target' as the tutorial indicates.
4. Created the discriminator and called `discriminator(target_scaled)`.
5. Followed the same steps as CNTK Tutorial 302B for VGG creation.

I created my own data reader and use it like this:

```python
# One minibatch: low-resolution inputs and high-resolution targets
low_res, high_res, current_batch_size = train_datareader.next_minibatch(MINIBATCH_SIZE)
batch_inps_X_Y = {input: low_res, target: high_res}

# Discriminator update
D_trainer.train_minibatch(batch_inps_X_Y)
pp_D.update_with_trainer(D_trainer)
D_trainer_loss = D_trainer.previous_minibatch_loss_average

# Generator update, fed with the same dict
G_trainer.train_minibatch(batch_inps_X_Y)
pp_G.update_with_trainer(G_trainer)
G_trainer_loss = G_trainer.previous_minibatch_loss_average
```

Any ideas?

csolorio commented 6 years ago

I'm still trying to fix the issue. After restructuring how I call the functions that create/load the models and trainers, I now get this error:

```
  File "D:\Carlos_SRes\Python_model\ganModel.py", line 204, in train
    D_trainer.train_minibatch(train_batch)
  File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\train\trainer.py", line 184, in train_minibatch
    device)
  File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\cntk_py.py", line 2856, in train_minibatch
    return _cntk_py.Trainer_train_minibatch(self, *args)
RuntimeError: AddNodeToNet: Duplicated name for Constant2614 LearnableParameter operation.

[CALL STACK]
    Microsoft::MSR::CNTK::DataTransferer:: operator=
    Microsoft::MSR::CNTK::DataTransferer:: operator= (x2)
    CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD (x13)
```

Please help!

ke1337 commented 6 years ago

Quote from @n17s for the duplicated name issue:

This is a problem I have run into in the past. What happens is that there are two nodes in the network with the same unique ID. There are different ways of solving this. In my case the name clash was on two constants, so I just cloned the network before the first evaluation so that every node gets a fresh ID. In your case, where input variables are causing the problem, what you probably want is to replace the two (or more) input variables with the same ID with one input variable that the different parts of the network share as input. Again you can use clone, but now you should provide a dictionary as a second argument whose keys are the input variables that are causing the error and whose values are all references to the new shared input variable.
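The shared-input fix described above can be sketched as follows. `build_substitutions` is a hypothetical helper that builds the dictionary the quote mentions. The CNTK calls in the usage note (`Function.arguments` and `Function.clone` with `cntk.CloneMethod.share` and a substitutions dict) are real, but treat the exact wiring as an assumption to adapt to your own graph. The helper is duck-typed so it can be tried without CNTK.

```python
def build_substitutions(arguments, shared, matches):
    """Map every argument selected by `matches` onto the single `shared` input.

    Hypothetical helper. `arguments` would typically be `loss.arguments`
    of a CNTK Function, `shared` the one input variable the whole network
    should use, and `matches` a predicate picking out the duplicates (for
    example, same name or same shape). The shared variable itself is
    skipped; it is compared by uid because CNTK may hand back a fresh
    Python wrapper object for the same variable.
    """
    return {arg: shared for arg in arguments
            if arg.uid != shared.uid and matches(arg)}
```

Assume `loss` is the aggregated loss that fails, `input` is the variable used as a feed-dict key, and `C` is `import cntk as C`. The call could then look like `fixed = loss.clone(C.CloneMethod.share, build_substitutions(loss.arguments, input, lambda a: a.shape == input.shape))`, after which the trainers would be created from `fixed`. `CloneMethod.share` keeps the learnable parameters shared with the original graph. If `target` happens to have the same shape as `input`, the predicate should match on something else, such as `a.name`.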

jaliyae commented 6 years ago

Let's close this one as we have another thread open for this issue.

microsoft/CNTK issue #3078: Error when training GAN-based model