mikegashler / waffles

A toolkit of machine learning algorithms.
http://gashler.com/mike/waffles/

uninitialized values #33

Closed mikegashler closed 7 years ago

mikegashler commented 7 years ago

It looks like there are some uninitialized values somewhere causing semi-nondeterministic behavior with the new neural net optimizer code. The NeuralDecomposition test sometimes passes and sometimes fails with a floating point exception, depending on which tests are run before it. Here's a call stack:

1 GClasses::GAssertFailed() at /home/mike/waffles/src/GClasses/GError.cpp:149
2 GClasses::GLayerClassic::feedForward() at /home/mike/waffles/src/GClasses/GLayer.cpp:496
3 GClasses::GNeuralNetLayer::feedForward() at /home/mike/waffles/src/GClasses/GLayer.h:93
4 GClasses::GNeuralNet::forwardProp() at /home/mike/waffles/src/GClasses/GNeuralNet.cpp:401
5 GClasses::GNeuralNet::predict() at /home/mike/waffles/src/GClasses/GNeuralNet.cpp:423
6 GClasses::GNeuralNetFunction::evaluate() at /home/mike/waffles/src/GClasses/GOptimizer.cpp:85
7 GClasses::GSGDOptimizer::updateDeltas() at /home/mike/waffles/src/GClasses/GOptimizer.cpp:279
8 GClasses::GDifferentiableOptimizer::optimizeIncremental() at /home/mike/waffles/src/GClasses/GOptimizer.cpp:146
9 GClasses::GNeuralDecomposition::trainIncremental() at /home/mike/waffles/src/GClasses/GNeuralDecomposition.cpp:303
10 GClasses::GNeuralDecomposition::trainInner() at /home/mike/waffles/src/GClasses/GNeuralDecomposition.cpp:196
11 GClasses::GSupervisedLearner::train() at /home/mike/waffles/src/GClasses/GLearner.cpp:478
12 GClasses::GNeuralDecomposition::trainOnSeries() at /home/mike/waffles/src/GClasses/GNeuralDecomposition.cpp:68
13 GClasses::GNeuralDecomposition::test() at /home/mike/waffles/src/GClasses/GNeuralDecomposition.cpp:335
14 GTestHarness::runTest() at /home/mike/waffles/src/test/main.cpp:863
15 GTestHarness::runAllTests() at /home/mike/waffles/src/test/main.cpp:950
16 main() at /home/mike/waffles/src/test/main.cpp:1046

And when I run this test in Valgrind, it reports lots of issues like this:

==10133== Conditional jump or move depends on uninitialised value(s)
==10133==    at 0x4C4AB1: GClasses::GLayerClassic::feedForward(GClasses::GVec const&) (GLayer.cpp:482)
==10133==    by 0x4CA9AE: GClasses::GLayerMixed::feedForward(GClasses::GVec const&) (GLayer.cpp:1419)
==10133==    by 0x5465B1: GClasses::GNeuralNet::forwardProp(GClasses::GVec const&, unsigned long) (GNeuralNet.cpp:396)
==10133==    by 0x5467E5: GClasses::GNeuralNet::predict(GClasses::GVec const&, GClasses::GVec&) (GNeuralNet.cpp:423)
==10133==    by 0x551B21: GClasses::GNeuralNetFunction::evaluate(GClasses::GVec const&, GClasses::GVec&) (GOptimizer.cpp:85)
==10133==    by 0x55305F: GClasses::GSGDOptimizer::updateDeltas(GClasses::GVec const&, GClasses::GVec const&) (GOptimizer.cpp:279)
==10133==    by 0x5521B8: GClasses::GDifferentiableOptimizer::optimizeIncremental(GClasses::GVec const&, GClasses::GVec const&) (GOptimizer.cpp:146)
==10133==    by 0x543560: GClasses::GNeuralDecomposition::trainIncremental(GClasses::GVec const&, GClasses::GVec const&) (GNeuralDecomposition.cpp:303)
==10133==    by 0x5429E0: GClasses::GNeuralDecomposition::trainInner(GClasses::GMatrix const&, GClasses::GMatrix const&) (GNeuralDecomposition.cpp:196)
==10133==    by 0x4D76D9: GClasses::GSupervisedLearner::train(GClasses::GMatrix const&, GClasses::GMatrix const&) (GLearner.cpp:478)
==10133==    by 0x541C7C: GClasses::GNeuralDecomposition::trainOnSeries(GClasses::GMatrix const&) (GNeuralDecomposition.cpp:68)
==10133==    by 0x5438D9: GClasses::GNeuralDecomposition::test() (GNeuralDecomposition.cpp:335)

StephenAshmore commented 7 years ago

Yep, definitely reproducible. I'm working on this, but I'm not sure I'll be able to figure it out solo. So far, when the test runs after the other tests, the net value of the output layer keeps growing until it exceeds the bound our assert checks. I still can't pin down the uninitialized values, though.

StephenAshmore commented 7 years ago

Here is a bit more data. I tried running the neural decomposition test immediately after other individual tests (I stopped once I found the pattern below). The following tests, when run immediately before neural decomposition's test, cause it to fail: GNaiveInstance, GNaiveBayes, GLinearDistribution, GKNN, GGaussianProcess, GDecisionTree, and GCoordVectorIterator. I then made a small change to the GCoordVectorIterator test so that it passes immediately, and re-running made GNeuralDecomposition's test pass. When I reverted that change, however, GNeuralDecomposition still passed, while GDecisionTree still "causes" GNeuralDecomposition to fail. So there seems to be an extra level of non-determinism here: a test that once caused neural decomposition to fail won't always cause it to fail.

StephenAshmore commented 7 years ago

Also, it does not crash on the first pass/epoch. The net value of the output layer slowly increases and fails around epoch 220 or 330. What is extra strange is that the test passes when run by itself, yet Valgrind still reports many, many uses of uninitialized values.

thelukester92 commented 7 years ago

My guess is that GLayerMixed is the likely culprit. I'll let you know if I think of anything.

mikegashler commented 7 years ago

It looks like the uninitialized variable is a matrix, because the Valgrind errors go away when I insert "setAll(0.0)" in GMatrix::resize. I still don't know which matrix, though.

StephenAshmore commented 7 years ago

It may be a good idea for the matrix class to initialize its values to a default. It's been discussed in the past, and I'm not sure how much performance we would lose if we did that.

mikegashler commented 7 years ago

Bjarne Stroustrup (the father of C++) says you should never have to pay for what you don't use (among other mantras that have now become cliché). His opinion about what constitutes good code doesn't have to matter to us, but we should find some way to be consistent, and he seems to be the standard for C++ libraries. I have nothing against adding a "resize_and_fill" method, but it would bother me to know my new matrix had just been filled with zeros, or the identity, only for me to fill it again with small random values, or whatever initialization I actually needed at the time.

thelukester92 commented 7 years ago

I'm leaning toward leaving the matrix values uninitialized. However, it's worth noting that the standard containers' resize (e.g. std::vector::resize) and the related constructors take a second parameter specifying the value for new elements:

http://www.cplusplus.com/reference/vector/vector/resize/
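
For comparison, here is a minimal standard-library example (when no value is supplied, new elements are value-initialized):

#include <vector>

int main()
{
    std::vector<double> v;
    v.resize(10);       // new elements are value-initialized to 0.0
    v.resize(20, 1.5);  // new elements take the supplied value 1.5
    return 0;
}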

mikegashler commented 7 years ago

I narrowed it down. The problem goes away if you add m_out.fill(0.0); in GLayerClassic::resize. Does that mean neural decomposition begins training without initializing one of its layers? Could we add a check somewhere, at least in debug mode, to catch this issue?
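
One possible shape for such a check, sketched with hypothetical names (this is not existing waffles code): in debug builds, resize could poison freshly allocated buffers with NaN, and feedForward could assert that its buffers contain no NaN before reading them.

#include <cmath>
#include <cstddef>

// Hypothetical debug helper: report whether a buffer still contains poison values.
// A layer's resize could fill new storage with NaN in debug builds, and
// feedForward could assert that this returns false before using that storage.
static bool containsNaN(const double* pBuf, size_t n)
{
    for(size_t i = 0; i < n; i++)
    {
        if(std::isnan(pBuf[i]))
            return true;
    }
    return false;
}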

mikegashler commented 7 years ago

What if we have two resize methods, like

void resize(size_t rows, size_t cols, double initial_value);
void resize_without_initializing(size_t rows, size_t cols);

The simple name of the first one encourages users to call the version that initializes the values, but the slightly faster version is still an option, so no one is forced to do superfluous initialization.
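
A rough sketch of how the first method could be layered on the second (hypothetical code, just the proposal written out; setAll is the existing fill method mentioned above):

void GMatrix::resize(size_t rows, size_t cols, double initial_value)
{
    resize_without_initializing(rows, cols); // allocate storage only
    setAll(initial_value);                   // then give every element a defined value
}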

thelukester92 commented 7 years ago

I dropped this code into GNeuralDecomposition right before the weights are initialized:

pSine->weights().fill(999.0);
pLinear->weights().fill(999.0);
pSoftplus->weights().fill(999.0);
pSigmoid->weights().fill(999.0);
pOutput->weights().fill(999.0);

When I inspect the weights after they are initialized but before training, they look fine -- none of the weights are 999.

thelukester92 commented 7 years ago

I dropped this code into GNeuralDecomposition right after the weights are initialized:

pSine->weights().fill(0.0);
pLinear->weights().fill(0.0);
pSoftplus->weights().fill(0.0);
pSigmoid->weights().fill(0.0);
pOutput->weights().fill(0.0);

And the weights still blow up.

StephenAshmore commented 7 years ago

I have fixed what we suspected was the issue: GLayerMixed was never passing the appropriate error to its components. I'm now passing that error to the appropriate components, and I've pushed that change to waffles. However, I'm now getting a floating point exception. I have to step away from my computer for an hour or so, so I won't be able to fix it right away; I thought one of you might like to take a look. Here's the stack trace:

GNeuralDecomposition
Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7593f14 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:227
227     ../sysdeps/ieee754/dbl-64/e_exp.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7593f14 in __ieee754_exp_avx (x=<optimized out>) at ../sysdeps/ieee754/dbl-64/e_exp.c:227
#1  0x00007ffff75505cf in __GI___exp (x=384068.46669027326) at ../sysdeps/ieee754/dbl-64/w_exp.c:26
#2  0x0000000000417155 in GClasses::GActivationSoftPlus::derivative (this=0x8d8bf0, x=-384068.46669027326, index=0) at GActivation.h:724
#3  0x0000000000415794 in GClasses::GActivationFunction::derivativeOfNet (this=0x8d8bf0, net=-384068.46669027326, activation=0, index=0) at GActivation.h:64
#4  0x00000000004c5170 in GClasses::GLayerClassic::deactivateError (this=0x8d8fb0) at GLayer.cpp:542
#5  0x00000000004c52f6 in GClasses::GLayerClassic::updateDeltas (this=0x8d8fb0, upStreamActivation=..., deltas=...) at GLayer.cpp:560
#6  0x00000000004caf96 in GClasses::GLayerMixed::updateDeltas (this=0x8d8140, upStreamActivation=..., deltas=...) at GLayer.cpp:1490
#7  0x0000000000551e9b in GClasses::GNeuralNetFunction::updateDeltas (this=0x8d96a0, x=..., blame=..., deltas=...) at GOptimizer.cpp:98
#8  0x0000000000553323 in GClasses::GSGDOptimizer::updateDeltas (this=0x7fffffffd8e8, feat=..., lab=...) at GOptimizer.cpp:282
#9  0x0000000000552419 in GClasses::GDifferentiableOptimizer::optimizeIncremental (this=0x7fffffffd8e8, feat=..., lab=...) at GOptimizer.cpp:147
#10 0x000000000054377b in GClasses::GNeuralDecomposition::trainIncremental (this=0x7fffffffd8b0, pIn=..., pOut=...) at GNeuralDecomposition.cpp:304
#11 0x0000000000542bfb in GClasses::GNeuralDecomposition::trainInner (this=0x7fffffffd8b0, features=..., labels=...) at GNeuralDecomposition.cpp:197
#12 0x00000000004d78f4 in GClasses::GSupervisedLearner::train (this=0x7fffffffd8b0, features=..., labels=...) at GLearner.cpp:478
#13 0x0000000000541e97 in GClasses::GNeuralDecomposition::trainOnSeries (this=0x7fffffffd8b0, series=...) at GNeuralDecomposition.cpp:68

StephenAshmore commented 7 years ago

Here's a bit of an update. I removed the SoftPlus component from neural decomposition to see whether it was the problem. It was not. With softplus removed, the error in the output layer (which is a GLayerClassic) gets very large: on the order of 1e+08 after two incremental training steps. That error does not seem to be getting propagated back into the mixed layer. If you print the error of the mixed layer and of the classic layers, the mixed layer (and its components) has an error of zero. I double-checked our backPropError method in GLayerClassic: it does get called, and it does compute error terms for the upstream layer (the mixed layer).

However, our delta terms in the mixed layer are zero.
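
For reference, the missing error forwarding amounts to something like the following sketch (the member names here are hypothetical, not the actual GLayerMixed code): copy each component's slice of the mixed layer's error buffer into that component before its deltas are updated.

// Sketch: hand each component layer its portion of the mixed layer's error.
size_t start = 0;
for(size_t i = 0; i < m_components.size(); i++)
{
    GNeuralNetLayer* pComp = m_components[i];
    size_t n = pComp->outputs();                // hypothetical accessor for the component's output count
    for(size_t j = 0; j < n; j++)
        pComp->error()[j] = m_error[start + j]; // hypothetical error-buffer accessors
    start += n;
}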

mikegashler commented 7 years ago

I did enough debugging to convince myself that this issue is caused by reads beyond buffer bounds due to GLayerMixed being broken. The solution is to redesign GLayerMixed. So, as a first step, I ripped out GLayerMixed and disabled the neural decomposition test. (Neural decomposition is the only place GLayerMixed was really used.) Now all tests pass, and when we come up with a replacement for GLayerMixed we can resurrect neural decomposition.