szcom / rnnlib

RNNLIB is a recurrent neural network library for sequence learning problems. Forked from Alex Graves' work: http://sourceforge.net/projects/rnnl/
GNU General Public License v3.0

Gradient Check Failed On Ubuntu 14.04 and OSX Mavericks #4

Open purpleladydragons opened 9 years ago

purpleladydragons commented 9 years ago

I'm not sure that this issue is directly related to the code, but I figured this was the best channel for reporting my problem. I have gcc and g++ at version 4.4, and building the executables works fine. However, when I run a gradient check, it fails on the first weight from hidden_1_0 to output. Any ideas what might be causing this?

szcom commented 9 years ago

Is it an option for you to try gcc 4.8?

purpleladydragons commented 9 years ago

Yes, I just switched to 4.8.4. I'm still having the same issue. I should also mention that I've brought this issue up because I believe it is related to the main problem I'm having: when training rnnlib, the error decreases until about epoch 5 and then blows up to large, 20-digit values. And despite these values clearly being larger than the previous errors, the old best_loss file is overwritten with the new, poorly performing network weights.

szcom commented 9 years ago

What do your command line and output look like? Try reducing the number of hidden layers in the config file, and their size.

You may also want to try running with MKL instead of OpenBLAS.


purpleladydragons commented 9 years ago

Here's the cmd line and output for gradient_check:

```
$ gradient_check check_synth1d.config
loading sequences from 0 to 1
task = prediction

network:
task = prediction
13MultilayerNet
5 layers:
10InputLayer "input" 1D (+) size 3
(R) 11Lstm1dLayerI4TanhS0_8LogisticE "hidden_0_0" 1D (+) inputSize 8 outputSize 2 source "input" 6 peeps
(R) 11Lstm1dLayerI4TanhS0_8LogisticE "hidden_1_0" 1D (+) inputSize 8 outputSize 2 source "hidden_0_0" 6 peeps
(R) 11Lstm1dLayerI4TanhS0_8LogisticE "hidden_2_0" 1D (+) inputSize 8 outputSize 2 source "hidden_1_0" 6 peeps
18MixtureOutputLayer "output" 1D (+) size 13 source "hidden_2_0"
20 connections:
"bias_to_hidden_0_0" (8 wts)
"hidden_0_0_to_hidden_0_0delay-1" (16 wts)
"input_to_hidden_0_0" (24 wts)
"charwindow_0_to_hidden_0_0delay-1" (464 wts)
"bias_to_hidden_1_0" (8 wts)
"hidden_1_0_to_hidden_1_0delay-1" (16 wts)
"hidden_0_0_to_hidden_1_0" (16 wts)
"input_to_hidden_1_0" (24 wts)
"charwindow_0_to_hidden_1_0" (464 wts)
"bias_to_output" (13 wts)
"hidden_2_0_to_output" (26 wts)
"hidden_0_0_to_output" (26 wts)
"hidden_1_0_to_output" (26 wts)
"bias_to_hidden_2_0" (8 wts)
"hidden_2_0_to_hidden_2_0delay-1" (16 wts)
"hidden_1_0_to_hidden_2_0" (16 wts)
"input_to_hidden_2_0" (24 wts)
"charwindow_0_to_hidden_2_0" (464 wts)
"bias_to_charwindow_0" (6 wts)
"hidden_0_0_to_charwindow_0" (12 wts)
bidirectional = false
symmetry = false
1695 weights
setting random seed to 10
1695 uninitialised weights randomised uniformly in [-0.1,0.1]
data header:
numDims = 1
inputSize = 3
numSequences = 10748
numTimesteps = 6765601
tag = lineStrokes/a01/a01-000/a01-000u-01.xml
input shape = (568 3)
timesteps = 568
seq chars: A MOVE to stop Mr . Gaitskell
target shape = (568 3)
perturbation = 1e-05
sigFigs = 6
verbose = true
breakOnError = true
calculating algorithmic pds
checking against numeric pds
checking layer output
checking connection bias_to_output
weight 0 numeric deriv 182.021 algorithmic deriv 182.021
weight 1 numeric deriv -182.021 algorithmic deriv -182.021
weight 2 numeric deriv -216034 algorithmic deriv -216034
weight 3 numeric deriv -1.56788e+06 algorithmic deriv -1.56788e+06
weight 4 numeric deriv -85377.9 algorithmic deriv -85377.9
weight 5 numeric deriv -1.25768e+06 algorithmic deriv -1.25768e+06
weight 6 numeric deriv -872.299 algorithmic deriv -872.299
weight 7 numeric deriv -4741.78 algorithmic deriv -4741.78
weight 8 numeric deriv 1077.72 algorithmic deriv 1077.72
weight 9 numeric deriv -1083.78 algorithmic deriv -1083.78
weight 10 numeric deriv 14575.8 algorithmic deriv 14575.8
weight 11 numeric deriv 555550 algorithmic deriv 555550
weight 12 numeric deriv -221.044 algorithmic deriv -221.044
checking connection hidden_2_0_to_output
weight 0 numeric deriv 14.7127 algorithmic deriv 14.7127
weight 1 numeric deriv -22.4983 algorithmic deriv -22.4982
weight 2 numeric deriv -14.7127 algorithmic deriv -14.7127
weight 3 numeric deriv 22.4983 algorithmic deriv 22.4982
weight 4 numeric deriv 50115.3 algorithmic deriv 50115.3
weight 5 numeric deriv 32015 algorithmic deriv 32015
weight 6 numeric deriv 359904 algorithmic deriv 359904
weight 7 numeric deriv -24359.2 algorithmic deriv -24359.2
weight 8 numeric deriv 4601.98 algorithmic deriv 4601.98
weight 9 numeric deriv 48352.6 algorithmic deriv 48352.6
weight 10 numeric deriv -10265 algorithmic deriv -10265
weight 11 numeric deriv 111390 algorithmic deriv 111390
weight 12 numeric deriv 152.42 algorithmic deriv 152.42
weight 13 numeric deriv 97.2105 algorithmic deriv 97.2104
weight 14 numeric deriv -635.587 algorithmic deriv -635.587
weight 15 numeric deriv 502.196 algorithmic deriv 502.196
weight 16 numeric deriv 30.185 algorithmic deriv 30.1851
weight 17 numeric deriv -1135.14 algorithmic deriv -1135.14
weight 18 numeric deriv -1359.7 algorithmic deriv -1359.7
weight 19 numeric deriv -1528.88 algorithmic deriv -1528.88
weight 20 numeric deriv -2952.01 algorithmic deriv -2952.01
weight 21 numeric deriv -5630.71 algorithmic deriv -5630.71
weight 22 numeric deriv -132439 algorithmic deriv -132439
weight 23 numeric deriv -23571.8 algorithmic deriv -23571.8
weight 24 numeric deriv -11.1746 algorithmic deriv -11.1747
weight 25 numeric deriv 46.9296 algorithmic deriv 46.9296
checking connection hidden_0_0_to_output
weight 0 numeric deriv -55.8009 algorithmic deriv -55.8009
weight 1 numeric deriv -52.3936 algorithmic deriv -52.3936
weight 2 numeric deriv 55.8009 algorithmic deriv 55.8009
weight 3 numeric deriv 52.3936 algorithmic deriv 52.3936
weight 4 numeric deriv -7178.37 algorithmic deriv -7178.37
weight 5 numeric deriv 73897.6 algorithmic deriv 73897.6
weight 6 numeric deriv 199076 algorithmic deriv 199076
weight 7 numeric deriv 410776 algorithmic deriv 410776
weight 8 numeric deriv 6842.67 algorithmic deriv 6842.67
weight 9 numeric deriv -233.717 algorithmic deriv -233.717
weight 10 numeric deriv 295483 algorithmic deriv 295483
weight 11 numeric deriv 272380 algorithmic deriv 272380
weight 12 numeric deriv 144.124 algorithmic deriv 144.125
weight 13 numeric deriv 335.492 algorithmic deriv 335.492
weight 14 numeric deriv 1204.41 algorithmic deriv 1204.41
weight 15 numeric deriv 392.369 algorithmic deriv 392.369
weight 16 numeric deriv 373.653 algorithmic deriv 373.653
weight 17 numeric deriv 337.81 algorithmic deriv 337.81
weight 18 numeric deriv 3304.99 algorithmic deriv 3304.99
weight 19 numeric deriv 1933.8 algorithmic deriv 1933.8
weight 20 numeric deriv 5199.51 algorithmic deriv 5199.51
weight 21 numeric deriv -1875.2 algorithmic deriv -1875.2
weight 22 numeric deriv -35710 algorithmic deriv -35710
weight 23 numeric deriv -106962 algorithmic deriv -106962
weight 24 numeric deriv 63.1939 algorithmic deriv 63.194
weight 25 numeric deriv 49.5532 algorithmic deriv 49.5532
checking connection hidden_1_0_to_output
weight 0 numeric deriv 5.88499 algorithmic deriv 5.88502

GRADIENT CHECK FAILED!
```

check_synth1d.config looks like this:

```
task prediction
hiddenType lstm1d,lstm1d,lstm1d
trainFile online.nc
dataFraction 1
maxTestsNoBest 20
hiddenSize 2,2,2
learnRate 1e-4
momentum 0.9
optimiser rmsprop
verbose true
predictionSteps 10
bidirectional false
autosave false
gradCheck true
mixtures 2
randSeed 10
charWindowSize 2
```
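For context on what is being compared above: a gradient check perturbs each weight by the perturbation value (1e-05 here), recomputes the loss to form a numeric finite-difference derivative, and compares it against the backpropagated (algorithmic) derivative to sigFigs significant figures. A minimal sketch of the idea in Python, independent of rnnlib's actual C++ implementation:

```python
import numpy as np

def gradient_check(loss_fn, grad_fn, weights, perturbation=1e-5, sig_figs=6):
    """Compare backpropagated gradients against central finite differences.

    loss_fn(weights) -> scalar loss; grad_fn(weights) -> gradient array.
    This mirrors the idea behind rnnlib's gradient_check, not its exact code.
    """
    algorithmic = grad_fn(weights)
    for i in range(weights.size):
        w = weights.copy()
        w[i] += perturbation
        loss_plus = loss_fn(w)
        w[i] -= 2 * perturbation
        loss_minus = loss_fn(w)
        numeric = (loss_plus - loss_minus) / (2 * perturbation)
        # require agreement to sig_figs significant figures (relative error)
        scale = max(abs(numeric), abs(algorithmic[i]), 1e-12)
        if abs(numeric - algorithmic[i]) / scale > 10.0 ** (-sig_figs):
            print(f"weight {i} numeric deriv {numeric:g} "
                  f"algorithmic deriv {algorithmic[i]:g}")
            return False
    return True

# toy example: quadratic loss whose gradient is known analytically
w0 = np.array([0.3, -0.7])
print(gradient_check(lambda w: float(np.sum(w ** 2)),
                     lambda w: 2 * w, w0))   # True
```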

purpleladydragons commented 9 years ago

Changing the config file for gradient_check so that there is only one hidden layer with a single LSTM cell does not help; the gradient check still fails.
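(For anyone reproducing this: the single-layer, single-cell variant presumably just shrinks the layer options in check_synth1d.config, with all other options unchanged, e.g.:

```
hiddenType lstm1d
hiddenSize 1
```

These values are illustrative, not copied from the original report.)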

szcom commented 9 years ago

The gradients you have are way too high; the cost function is going crazy at some point. The first checked weight gives a gradient of 182 in your case, while I get 2.47. I suggest you step through the sequence and see where it blows up, either with gdb or by printing the current loss value from MixtureOutputLayer::calculate_errors(). For comparison, here is my output for the same sequence:

```
tag = lineStrokes/a01/a01-000/a01-000u-01.xml
input shape = (568 3)
timesteps = 568
seq chars: A MOVE to stop Mr . Gaitskell
target shape = (568 3)
perturbation = 1e-05
sigFigs = 6
verbose = true
breakOnError = true
calculating algorithmic pds
checking against numeric pds
checking layer output
checking connection bias_to_output
weight 0 numeric deriv 2.47947 algorithmic deriv 2.47947
weight 1 numeric deriv -2.47947 algorithmic deriv -2.47947
weight 2 numeric deriv 153.611 algorithmic deriv 153.611
weight 3 numeric deriv -601.78 algorithmic deriv -601.78
weight 4 numeric deriv 50.539 algorithmic deriv 50.539
```
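For anyone following szcom's suggestion, a possible gdb session (assuming the binary was built with debug symbols, e.g. -g):

```
$ gdb --args gradient_check check_synth1d.config
(gdb) break MixtureOutputLayer::calculate_errors
(gdb) run
(gdb) continue    # keep continuing and watch for the step where the loss explodes
```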

purpleladydragons commented 9 years ago

Okay thank you, I will try debugging it. Why do you think it is so different on my computer? Is it likely related to the installed libraries?

purpleladydragons commented 9 years ago

I'm having the same issue on my MacBook as well, if that helps at all.

robsaundersx commented 8 years ago

I've run into the same issue as purpleladydragons above when trying to compile rnnlib in VirtualBox running Ubuntu 15.10 (64-bit) on a 15-inch MacBook Pro (2015) host running El Capitan. Running gradient_check check_synth1d.config fails in precisely the same way as described above, i.e., with the final lines of the output reading:

```
checking connection hidden_1_0_to_output
weight 0 numeric deriv 5.88499 algorithmic deriv 5.88502
```

I'm curious to understand the possible causes of this failure, and whether anyone can share an example of a setup where the build succeeds and the gradient check passes. I've tried a couple of alternatives, e.g., compiling with different versions of gcc (4.7 and 4.8) and on a remote host running Ubuntu 14.10, but with the same result. I haven't tried MKL yet, but that will likely be my next step unless there is an easier solution, e.g., switching distributions.

TonyWangX commented 8 years ago

Hi,

I was caught by exactly the same "gradient check failed" error on my first trial.

It turns out that the crazy gradients were due to un-normalized data! I had forgotten to normalize the data after generating the .nc files.

If you use build_netcdf.sh to pre-process the data, watch for any error messages it prints. normalise_inputs.sh and normalise_targets.sh can fail because the underlying normalise_netcdf.sh requires an additional tool called ncks. Please check that it is installed.
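ncks is part of the NCO (netCDF Operators) toolkit, so a quick way to verify it is present and install it if not (the package is named nco on both Ubuntu and Homebrew):

```
$ which ncks || echo "ncks not found"
$ sudo apt-get install nco    # Debian/Ubuntu
$ brew install nco            # OS X
```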

After normalising the data, I get a calm gradient:

```
checking layer output
checking connection bias_to_output
weight 0 numeric deriv 2.47947 algorithmic deriv 2.47947
weight 1 numeric deriv -2.47947 algorithmic deriv -2.47947
```

FYI, a convenient way to check the data is scipy.io.netcdf_file in Python.
For normalized data you should see inputsMeans in data.variables, and you can also quickly check the mean and variance yourself:

```
from scipy import io
data = io.netcdf_file('online.nc')

data.variables
Out[12]:
{'inputs': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97810>,
 'inputsMeans': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97a10>,
 'inputsStds': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97a90>,
 'predSeqLengths': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97850>,
 'seqDims': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d977d0>,
 'seqLengths': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97790>,
 'seqTags': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d976d0>,
 'targetPatterns': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d979d0>,
 'targetPatternsMeans': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97b90>,
 'targetPatternsStds': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97c50>,
 'targetSeqDims': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d978d0>,
 'targetStrings': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97710>,
 'wordTargetStrings': <scipy.io.netcdf.netcdf_variable at 0x7f86f8d97750>}

# check the first dimension of the input
data.variables['inputs'][:,0].mean()
Out[10]: 1.7300079e-05

# check the second dimension of the input
data.variables['inputs'][:,1].mean()
Out[11]: -5.7453597e-09
```
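As an extra sanity check beyond the means shown above, a normalised input dimension should also have a standard deviation close to 1:

```python
data.variables['inputs'][:,0].std()   # expect a value close to 1.0 after normalisation
```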

Hope it helps.