microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Error when training GAN-based model #3078

Closed csolorio closed 6 years ago

csolorio commented 6 years ago

Hi,

I've been trying to refine the parameters of a model I previously trained, based on the CNTK GAN tutorial. The loss used to train this model (which I use as the generator) is based on the loss of the discriminator model and on a separate model that contributes to the loss (similar to the other CNTK GAN tutorial). But when I start training (by invoking the train_minibatch method), the following error pops up:

ValueError: Values for 1 required arguments 'Input('Input3', [#], [3 x 48 x 48])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('Plus6903_Output_0', [#], [1])' depend on, have not been provided.

The generator's input is effectively an image block of 3x48x48 shape. The discriminator inputs are both the generator's output and the labels.

`low_res, high_res, current_batch_size = train_datareader.next_minibatch(MINIBATCH_SIZE)
batch_inps_X_Y = {input: feature_input, target : label_target}
D_trainer.train_minibatch(batch_inps_X_Y)`

Any suggestion?

ke1337 commented 6 years ago

Please make sure your feed dict uses the same input variables as the model. The error shows there is a mismatch: a variable the loss depends on is not among the keys you passed to train_minibatch.
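One way to track down such a mismatch is to compare the variables the loss requires with the keys of the feed dict. The snippet below is only a sketch of that check. It uses a stand-in `Var` tuple so it runs without CNTK; with CNTK you would pass real variables instead, for example the list returned by a function's `arguments` property, and compare their `uid` values. The variable names and uids here are made up for illustration.

```python
from collections import namedtuple

# Stand-in for a CNTK input variable so the sketch runs without CNTK installed.
# With CNTK, pass real variables (e.g. the items of `some_function.arguments`).
Var = namedtuple("Var", ["name", "uid"])


def missing_arguments(required, feed):
    """Return the required variables whose uid is not among the feed dict's keys."""
    fed_uids = {v.uid for v in feed}
    return [v for v in required if v.uid not in fed_uids]


# Hypothetical variables: the names and uids are illustrative only.
gen_input = Var("low_res", "Input3")
disc_target = Var("high_res", "Input7")
stale_input = Var("low_res", "Input42")  # same name, but a different variable

required = [gen_input, disc_target]
feed = {stale_input: "low-res batch", disc_target: "high-res batch"}

for v in missing_arguments(required, feed):
    print("not fed:", v.name, v.uid)
```

A variable that has the same name as the expected one but was created by a second call to the input constructor is still a different variable, which is a common way to end up with this error.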

csolorio commented 6 years ago

Sorry for the late response. Easter got in the way... :P I use the same input variables all around. I did the following:

Any ideas?

csolorio commented 6 years ago

I'm still trying to fix the issue. After restructuring the way I call the functions to create/load models/trainers, now I encounter this error:

File "D:\Carlos_SRes\Python_model\ganModel.py", line 204, in train
    D_trainer.train_minibatch(train_batch)
File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\train\trainer.py", line 184, in train_minibatch
    device)
File "C:\ProgramData\Anaconda2\lib\site-packages\cntk\cntk_py.py", line 2856, in train_minibatch
    return _cntk_py.Trainer_train_minibatch(self, *args)
RuntimeError: AddNodeToNet: Duplicated name for Constant2614 LearnableParameter operation.

[CALL STACK]

Microsoft::MSR::CNTK::DataTransferer:: operator=

  • Microsoft::MSR::CNTK::DataTransferer:: operator= (x2)
  • CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD (x13)

Please help!

ke1337 commented 6 years ago

Quote from @n17s for the duplicated name issue:

This is a problem I have run into in the past. What happens is that there are two nodes in the network with the same unique ID. There are different ways of solving this. In my case the name clash was on two constants, so I just cloned the network before the first evaluation so that every node gets a fresh ID. In your case, where input variables are causing the problem, what you probably want is to replace the two (or more) input variables that have the same ID with one input variable that the different parts of the network share as input. Again you can use clone, but now you should provide a dictionary as the second argument, whose keys are the input variables that are causing the error and whose values are all references to the new shared input variable.
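To make the idea concrete, here is a toy model of the two fixes described above. It is plain Python, not CNTK and not how CNTK implements cloning: the `Node` class, the `clone` helper and all uids are invented for illustration. It only shows why cloning gives every node a fresh ID, and how a substitution dictionary maps several clashing inputs onto one shared input.

```python
import itertools

_ids = itertools.count()


class Node:
    """Toy graph node (illustrative only, not a CNTK class)."""

    def __init__(self, op, inputs=(), uid=None):
        self.op = op
        self.inputs = list(inputs)
        # A fresh uid is generated unless one is forced, to simulate a clash.
        self.uid = uid if uid is not None else "{}{}".format(op, next(_ids))


def clone(node, substitutions=None, _memo=None):
    """Copy the graph under `node`; every copied node gets a fresh uid.

    Nodes that appear as keys in `substitutions` are replaced by the
    mapped node instead of being copied.
    """
    substitutions = substitutions or {}
    memo = {} if _memo is None else _memo
    if node in substitutions:
        return substitutions[node]
    if node not in memo:
        copied_inputs = [clone(i, substitutions, memo) for i in node.inputs]
        memo[node] = Node(node.op, copied_inputs)
    return memo[node]


def collect(node, seen=None):
    """Return the set of all distinct nodes reachable from `node`."""
    seen = set() if seen is None else seen
    if node not in seen:
        seen.add(node)
        for i in node.inputs:
            collect(i, seen)
    return seen


def has_duplicate_uids(root):
    uids = [n.uid for n in collect(root)]
    return len(uids) != len(set(uids))


# Two distinct input nodes that ended up with the same uid.
a = Node("Input", uid="Input3")
b = Node("Input", uid="Input3")
generator = Node("Generator", [a])
discriminator = Node("Discriminator", [generator, b])

# Fix 1: a plain clone gives every node a fresh uid.
plain = clone(discriminator)

# Fix 2: map both clashing inputs onto one shared input.
shared = Node("Input")
fixed = clone(discriminator, {a: shared, b: shared})

print(has_duplicate_uids(discriminator), has_duplicate_uids(plain), has_duplicate_uids(fixed))
```

In the second fix both parts of the toy graph end up reading from the same `shared` node, which is the situation the substitution dictionary is meant to produce.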

jaliyae commented 6 years ago

Let's close this one as we have another thread open for this issue.