Aslyamovt opened this issue 5 years ago

UPD: I read some issues and examples and understood that the second way is the better one. By elimination I found out that the problem is in the Dropout layer. Could somebody explain how to resume training a network with a Dropout layer on the GPU? (I don't have this problem on the CPU.)
Hello, I have some problems with continuing training of my pretrained model. I am trying to restore the training progress from a checkpoint in two ways. If I use the first way:

I get the exception "Values for 1 required arguments 'Input('features', [500], [finputAxis, #])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('aggregateEvalMetric', [], []), Output('Block67_Output_0', [1], [#])' depend on, have not been provided." If I try the second way:
I have "CURAND failure 201: (see curand.h & look for curandStatus or CURAND_STATUS_xxx) ; GPU=0 ; hostname=DEXP ; expr=curandGenerateUniformHelper(gpuRNGHandle->Generator(), Data(), GetNumElements())". This way worked when tried to continue training my previos 20-30 networks. I think problem could be somewhere in BatchNormalization(), because it is the only one unit I added into the model.
My stack:

- CNTK 2.5.1
- CUDA 9.0
- cuDNN 7.4.1.5
- VS 2015
- Device: NVIDIA GeForce 940M
- OS: Windows 8.1

I will be grateful for any help.