ftamburin opened this issue 9 years ago
This issue comes down to this line of code: https://github.com/yajiemiao/pdnn/blob/master/learning/sgd.py#L71.

batch_size = 256 is much larger than the training-set size of 3, so train_sets.cur_frame_num / batch_size evaluates to 0, the training loop never runs, train_error stays an empty list, and numpy.mean([]) emits the warning you see.

In one sentence: the boundary condition is not handled correctly.
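The failure mode can be reproduced in isolation. Below is a minimal sketch; the variable names mirror the ones in the issue, and the guard at the end is only illustrative, not the actual patch from the pull request:

```python
import warnings
import numpy

# The boundary condition from sgd.py, reproduced in isolation.
cur_frame_num = 3     # size of the toy training set
batch_size = 256      # default mini-batch size

n_batches = cur_frame_num // batch_size   # 3 // 256 == 0
train_error = [0.5] * n_batches           # the loop body never runs: []

# numpy.mean([]) warns "Mean of empty slice." and returns nan,
# which is exactly the nan printed in the training log.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    m = numpy.mean(train_error)
print(m)  # nan

# One possible guard: never let the batch size exceed the data size.
effective_batch = min(batch_size, cur_frame_num)
print(cur_frame_num // effective_batch)  # 1 batch, so the mean is defined
```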
I fixed this issue in my pull request; it only changes a few lines of code. Below is the output of your script after the fix (with one extra option, --lrate "C:0.1:10", added to stop it from running indefinitely).
[2015-12-12 10:42:00.854358] > ... building the model
[2015-12-12 10:42:00.864003] > ... getting the finetuning functions
[2015-12-12 10:42:02.142837] > ... finetuning the model
[2015-12-12 10:42:02.145008] > epoch 1, training error 66.666667 (%)
[2015-12-12 10:42:02.146348] > epoch 1, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.148447] > epoch 2, training error 33.333333 (%)
[2015-12-12 10:42:02.148744] > epoch 2, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.149959] > epoch 3, training error 33.333333 (%)
[2015-12-12 10:42:02.150215] > epoch 3, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.151403] > epoch 4, training error 33.333333 (%)
[2015-12-12 10:42:02.151596] > epoch 4, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.152745] > epoch 5, training error 33.333333 (%)
[2015-12-12 10:42:02.152934] > epoch 5, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.154048] > epoch 6, training error 33.333333 (%)
[2015-12-12 10:42:02.154237] > epoch 6, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.155377] > epoch 7, training error 33.333333 (%)
[2015-12-12 10:42:02.155566] > epoch 7, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.156708] > epoch 8, training error 33.333333 (%)
[2015-12-12 10:42:02.156894] > epoch 8, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.158023] > epoch 9, training error 0.000000 (%)
[2015-12-12 10:42:02.158214] > epoch 9, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.159442] > epoch 10, training error 0.000000 (%)
[2015-12-12 10:42:02.159636] > epoch 10, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.161165] > ... the final PDNN model parameter is dnn.mdl
[2015-12-12 10:42:02.161569] > ... the final PDNN model config is dnn.cfg
Hope it helps.
I have just cloned the pdnn package and verified that the mnist/mnist_rbm examples work. Now I am trying to build some new examples to verify the pickle-file creation before working on my real data. First, I reproduced the example at https://www.cs.cmu.edu/~ymiao/pdnntk/data.html by writing a Python script that creates a sample file:
```python
import cPickle, numpy, gzip

feature = numpy.array([[0.2, 0.3, 0.5, 1.4],
                       [1.3, 2.1, 0.3, 0.1],
                       [0.3, 0.5, 0.5, 1.4]], dtype='float32')
label = numpy.array([2, 0, 1])
with gzip.open('filename.pkl.gz', 'wb') as f:
    cPickle.dump((feature, label), f)
```
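For anyone reproducing this on Python 3, a round-trip check of the same file may help rule out the data format as the culprit (cPickle is simply pickle in Python 3; the file name and array shapes follow the snippet above):

```python
import gzip
import pickle  # Python 3 equivalent of the cPickle module used above
import numpy

# Recreate the sample file from the data-format page, then read it back
# to confirm the (feature, label) tuple survives the round trip.
feature = numpy.array([[0.2, 0.3, 0.5, 1.4],
                       [1.3, 2.1, 0.3, 0.1],
                       [0.3, 0.5, 0.5, 1.4]], dtype='float32')
label = numpy.array([2, 0, 1])

with gzip.open('filename.pkl.gz', 'wb') as f:
    pickle.dump((feature, label), f)

with gzip.open('filename.pkl.gz', 'rb') as f:
    feature2, label2 = pickle.load(f)

print(feature2.shape, feature2.dtype)  # (3, 4) float32
print(label2)                          # [2 0 1]
```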
The creation process was fine, but when I tried to run a simple DNN training using the script
```bash
#!/bin/bash

# two variables you need to set
pdnndir=/home/guest-fac/tamburin/pdnn  # pointer to PDNN
device=cpu  # the device to be used. set it to "cpu" if you don't have GPUs

# export environment variables
export PYTHONPATH=$PYTHONPATH:$pdnndir
export THEANO_FLAGS=mode=FAST_RUN,device=$device,floatX=float32

rm *.tmp

# TRAIN DNN
python $pdnndir/cmds/run_DNN.py --train-data "filename.pkl.gz" \
       --valid-data "filename.pkl.gz" --nnet-spec "4:5:3" --wdir ./ \
       --param-output-file dnn.mdl --cfg-output-file dnn.cfg
```
I get the following output:
[2015-11-10 13:20:47.589817] > ... building the model
[2015-11-10 13:20:47.603441] > ... getting the finetuning functions
[2015-11-10 13:20:48.612798] > ... finetuning the model
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
[2015-11-10 13:20:48.614276] > epoch 1, training error nan (%)
[2015-11-10 13:20:48.615054] > epoch 1, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.619409] > epoch 2, training error nan (%)
[2015-11-10 13:20:48.619491] > epoch 2, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.622980] > epoch 3, training error nan (%)
[2015-11-10 13:20:48.623059] > epoch 3, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.626443] > epoch 4, training error nan (%)
and nothing ever changes. I have actually seen this behavior with a lot of different datasets, but I reproduced it here with this simple example for clarity. Any idea what the problem is? I get it on Mac OS X 10.10 (Python 2.7.10) and on Linux (SMP Debian 3.16.7, Python 2.7.9), so it should not depend on the local Python installation. Any help is more than welcome. Thanks! Fabio