yajiemiao / pdnn

PDNN: A Python Toolkit for Deep Learning. http://www.cs.cmu.edu/~ymiao/pdnntk.html
Apache License 2.0

Finetuning the model #16

Closed hemmingstein closed 9 years ago

hemmingstein commented 9 years ago

Hello again! After fixing the learning rate problem, I'm struggling with the next one: when I reach the "finetuning the model" step, I get this error:

```
Traceback (most recent call last):
  File "pdnn/cmds/run_CNN.py", line 93, in <module>
    train_error = train_sgd(train_fn, cfg)
  File "pdnn/learning/sgd.py", line 72, in train_sgd
    train_error.append(train_fn(index=batch_index, learning_rate = learning_rate, momentum = momentum))
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: total size of new array must be unchanged
Apply node that caused the error: Reshape{4}(Subtensor{int64:int64:}.0, TensorConstant{[256 1 28 28]})
Inputs types: [TensorType(float64, matrix), TensorType(int64, vector)]
Inputs shapes: [(256, 40), (4,)]
Inputs strides: [(320, 8), (8,)]
Inputs values: ['not shown', array([256, 1, 28, 28])]
```
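For what it's worth, the mismatch in the traceback can be reproduced outside Theano with a minimal numpy sketch (the shapes are taken from the error message above; the variable names are made up): a `(256, 40)` matrix holds 256 × 40 = 10240 values, but the target shape `(256, 1, 28, 28)` needs 256 × 784 = 200704, so the reshape must fail.

```python
import numpy as np

# Shapes from the traceback: the node receives a (256, 40) matrix
# but tries to reshape it to (256, 1, 28, 28).
batch = np.zeros((256, 40))

assert batch.size == 256 * 40            # 10240 elements available
assert 256 * 1 * 28 * 28 == 200704      # but the target needs 200704

try:
    batch.reshape(256, 1, 28, 28)
except ValueError:
    # numpy raises the same kind of "size must be unchanged" error
    print("reshape failed: element counts do not match")
```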

I'm a bit puzzled by this, can you please help me?

ghost commented 9 years ago

Could you paste the full command line you run?

hemmingstein commented 9 years ago

Yeah, here it comes (I added newlines for readability):

```
python pdnn/cmds/run_CNN.py --train-data "train.pfile" --valid-data "dev.pfile" \
  --conv-nnet-spec "1x28x28:20,5x5,p2x2:50,5x5,p2x2,f" --nnet-spec "512:10" \
  --wdir ./ --l2-reg 0.0001 --lrate "C:0.125:20" \
  --model-save-step 20 --param-output-file cnn.param --cfg-output-file cnn.cfg
```

ghost commented 9 years ago

I just tested the latest version on both GPUs and CPUs, and didn't see any such problems.

For CNN, PDNN has the requirement that you cannot change the batch size after the fine-tuning function is compiled. My interpretation of the error message is that the mini-batch size defaults to 256, but during execution the batch is interpreted as having a size other than 256. The cause of this difference is beyond me, though. My guess is that it's due to your compiler, the same issue as in your last post.

hemmingstein commented 9 years ago

Thanks anyway!

PatrickLaflamme commented 8 years ago

I had an error like yours. I discovered that an old nnet.tmp and a training_state.tmp with different network dimensions were sitting in the same directory; that was what caused the error. Simply deleting those files did the trick!
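Concretely, the cleanup amounts to removing the stale checkpoint files from the working directory before rerunning (the `./` path assumes `--wdir ./` as in the command above):

```shell
# Remove stale PDNN checkpoint files left over from a previous run
# whose network dimensions differ from the current --conv-nnet-spec.
rm -f ./nnet.tmp ./training_state.tmp
```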

hemmingstein commented 8 years ago

Thanks, I'll try it.