Unknown Adadelta trainer error/issues

merceyz commented 8 years ago

Hello,

I was looking at the different trainers and reading some documents on them when i noticed a value called "epsilon". This value is nowhere to be seen in the API documentation and thus i assume it's missing. (Unless it's the "anneal" option which would be awkward for me)

merceyz commented 8 years ago

Normally though, I'd expect the test accuracy to be at lesat as high as train accuracy, on the whole, if the data is the same, since the weights used for test will be the most recent weights, should be the best.

Data is identical

Build 8.5.1 - AdaDelta, doesn't get much better than this: a8c402ee86aeaf549b687349c3accade

Build 11.3.0alpha1 - AdaDelta, doesn't get much better than this: 2fb7c44a6afebc3b4e7940bb19210d0d

I don't have the same training set as i had before so i can't really tell if the original issue is solved or not.

hughperkins commented 8 years ago

Seems like 11.3.0alpha1 is performing better?

merceyz commented 8 years ago

It does, however it doesn't improve much, if any after hitting that spot.

AdaGrad however beat them both at epoch 4 and continued to improve 759ca8e45161d0e8ae6ecbc5c7cff6d8

hughperkins commented 8 years ago

Ok. I'm not up to speed on what the research says about adagrad vs adadelta. Is it possible that adagrad works better for your geometry? Or you suspect there might be an additional bug lurking somewhere?

merceyz commented 8 years ago

On the mnist dataset adadelta beats adagrad. Source: https://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html

Is it possible that adagrad works better for your geometry?

That might be the case.

Or you suspect there might be an additional bug lurking somewhere?

For now i have no idea, if i find something i'll let you know.

hughperkins commented 8 years ago

k, sounds good :-) I'll push out current version as a release, starting the build in 3 minutes.

merceyz commented 8 years ago

Perhaps fix the space in path issue first? ^^

hughperkins commented 8 years ago

Yeah, I should, but I'm fixing something else thats broken, in a different project...

merceyz commented 8 years ago

Alright.

Fyi: now that you changed the jpeg library i was able to build it with jpeg support. So whenever we do any testing in the future you don't have to build it for me

hughperkins commented 8 years ago

Fyi: now that you changed the jpeg library i was able to build it with jpeg support. So whenever we do any testing in the future you don't have to build it for me

Awesome!

merceyz commented 8 years ago

I fixed the loader but got a error I've never seen before

29bae446af6a1bb456546e89057cc9af

It happened because i changed GenericLoaderv2 trainLoader(config.dataDir + "/" + config.trainFile); to GenericLoaderv2 trainLoader(config.dataDir + "\\" + config.trainFile);

hughperkins commented 8 years ago

I'm not sure why you get that error. But generally, the way that paths work, on the whole is:

throughout the program they're apprxoimately always /, or at least / is acceptable everywhere
at the point that they're passed through to a file open command, they're converted to \, if on windows, using localizepath, eg https://github.com/hughperkins/DeepCL/blob/master/src/util/JpegHelper.cpp#L49

hughperkins commented 8 years ago

I guess the issue might not be caused by your change, but maybe your change fixed one bug, and now a new bug appears, that was present before, but the program already crashed earlier before perhaps?

hughperkins commented 8 years ago

I've no idea what the error means by the way. It looks like it's something to do with memory allocaiton. Maybe osmehow the allocation is too big, or zero, or negative. Actually, it seems to be an alignment-issue. If it was me, I'd probably try to get more information on where the bug is happening. Eg, by clicking 'debug', or sprinkling cout statements throughout the program.

merceyz commented 8 years ago

I made a few changes but only that one crashed it so I just reverted that one change and it loads the images fine and runs without issues.

hughperkins commented 8 years ago

Ok. Thats odd... good that it works now though :-)

hughperkins commented 8 years ago

(Oh... it could be my code kind of assumes / throughout, is possible).

merceyz commented 8 years ago

To fix it not loading files where the path had a space in it I made it separate path from label using the vertical bar, as it will never be in a valid file path.

E:\Data folder\0-Unknown\05d2dd67-3369-4c9c-abd3-29a0c0f83f15.jpeg|0

Line 92 in ManifestLoaderv1.cpp

vector splitLine = split(line, "|");

I felt creating a pull request just for that was too much and you might not even want to change the manifest format. That's up to you.

hughperkins commented 8 years ago

Oh, right, that would fix the space-issue. I suppose on the whole I'd prefer to maintain backwards-compatibility, where possible. The clean way would probably be to implement ", but that sounds like a bunch of work. You can hackily strip off the final space and number from the line by doing something like:

                    vector<string> splitLine = split(manifestLine, " ");
         string classString = splitLine[splitLine.size() - 1];
        string filePath = manifestLine.substr(0, manifestLine.size() - classString.size() - 1);

It's not ideal (and note the last line needs some debugging), but it is at least backwards-compatible. It's possible you're the only person using the manifest loader, but I have no way of knowing who's using what really. I generally find out people are using stuff when it breaks :-D

merceyz commented 8 years ago

I made a pull request that fixes the issue and is backwards compatible

merceyz commented 7 years ago

A long overdue update on this.

Turns out the issue wasn't with Adadelta (though it was missing the epsilon) but rather the SoftMaxLayer when calculating the loss. There is no validation/error handling during the log calculation so the loss becomes NaN/+inf/-inf It happened more frequently when using the ReLu activation layer, most likely because the output more often is 0 compared to a tanh layer

I'm not sure how to handle this so i'm hoping you can take a look at it

See error handling at http://en.cppreference.com/w/cpp/numeric/math/log

Locations in the SoftMaxLayer where there needs to be some sort of error handling; https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L76 https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L87 https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L103 https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L117

hughperkins commented 7 years ago

Ah, well spotted :-)

Probably it should be handled by first finding the max, then subtracting that. I"m not sure if it's exactly that, but conceptually I expect it to be something simlar.

See how this works for 'forward':

https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L282-L294

            float maxValue = input[imageOffset + 0]; // since we assume imagesize 1, this is correct
            for(int plane = 1; plane < numPlanes; plane++) {
                maxValue = std::max(maxValue, input[imageOffset + plane]);
            }
            // calculate sum, under this max
            float denominator = 0;
            for(int plane = 0; plane < numPlanes; plane++) {
                denominator += exp(input[imageOffset + plane] - maxValue);
            }
            // now calc the softmaxes:
            for(int plane = 0; plane < numPlanes; plane++) {
                output[imageOffset + plane] = exp(input[imageOffset + plane] - maxValue) / denominator;
            }

hughperkins commented 7 years ago

(more information here: https://stackoverflow.com/questions/34968722/softmax-function-python But basically:

subtracting a number from the top and bottom exp, in a fraction of exp, doesnt change the fraction:
```
exp(a) / exp(b) == exp(a - something)  / exp(b - something)
```
This is just how it is, mathematically.

Now, in our case, some of the exps are going to infinity, which causes problems. However, if we take the largest of these numbers, and subtract it from all of them, the result will be the same, but without nans.

)

hughperkins commented 7 years ago

Hmmm... actually.. not sure if that is the fix here. Preference to look at a single method for now. Can you choose one of the 4 methods you pointed me at, and provide an example of the value of output that is causing the nan?

hughperkins commented 7 years ago

(if the examples you find are eg -1e-19, some very tiny, but negative, numbers, then the solution might be to add some small epsilon, eg 1e-6, like:

loss += - log(output[ imageOffset + label ] + 1e-6);

hughperkins commented 7 years ago

(since log(0) == infinity, and log(-something) == nan:

$ wcalc "log(0)"
 = Infinity
$ wcalc "log(-0.0000001)"
 = Not a Number

merceyz commented 7 years ago

log(output[ imageOffset + label ] + 1e-6);

I tried log(max(output[ imageOffset + label ], 1e-6)) yesterday which instead of NaN just made the loss climb more and more

Data you requested:

Netdef using Adadelta: 16c3z-elu-mp2-32c3z-elu-mp2-64c3z-elu-mp2-128c3z-elu-mp2-100n-elu-100n-elu-2n

output[imageOffset + label] is -nan at https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L87

hughperkins commented 7 years ago

output is nan? Interesting. Can you find the values of the input to the softmax layer which are giving nan output?

On 24 June 2017 16:20:13 BST, Chris notifications@github.com wrote:

log(output[ imageOffset + label ] + 1e-6);

I tried log(max(output[ imageOffset + label ], 1e-6)) which instead of NaN just made the loss climb more and more

Data you requested:

Netdef using Adadelta: 16c3z-elu-mp2-32c3z-elu-mp2-64c3z-elu-mp2-128c3z-elu-mp2-100n-elu-100n-elu-2n

output[imageOffset + label] is -nan at https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L87

-- You are receiving this because you were assigned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/DeepCL/issues/87#issuecomment-310844686

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

merceyz commented 7 years ago

input[imageOffset + plane] is -nan at https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L293

hughperkins commented 7 years ago

ok, basically, what we need to know is, at which layer is there some data going in that is not nan, and that the data leaving that layer is nan. (Once its nan, it'll stay nan forever. nans contaminate anything they touch, sort of like Ice Nine )

merceyz commented 7 years ago

Is there something in deepcl to dump all data on every layer to a file?

hughperkins commented 7 years ago

No. You'll need to hack the code a bit to do this. It's a bit time-consuming. You'll need to copy the data from the gpu onto the cpu, each time, in order to print it. You'll probably want to have some code that checks the output from each layer for nan, and dumps the input and output to a file, and stops the program, if/when it sees any.

merceyz commented 7 years ago

For both forward and backward or only forward?

hughperkins commented 7 years ago

Choose one direction, hope you get lucky. If you don't get lucky, you'll need to add the other direction.

I would start with the forward direction.

You can probably start with just the forward direction, on just the softmax layer. Then add the backward direction, also on softmax layer only, if that doesnt localize the problem. If still not localized, you'll need to add the debugging code to other layers too :-(

merceyz commented 7 years ago

During forward the first instance of -nan is on layer 2 untitled

Some of the inputs: 0.289124 0.328768 0.360483 0.304982 0.273267 0.37634 0.400126 0.265338 0.304982 0.304982 0.352554 Almost all outputs are -nan

I added the following code after this line https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.cpp#L191

float * result = layers[layerId]->getOutput();
int items = layers[layerId]->getOutputNumElements();
for (int i = 0; i < items; i++)
{
    if (std::isfinite(result[i]) == false)
    {
        cout << "Found error at layer " << layerId << " " << layers[layerId]->asString();

        ofstream dumpData;
        dumpData.open("Datadump.txt");

        float* input = layers[layerId - 1]->getOutput();
        int inputItems = layers[layerId - 1]->getOutputNumElements();

        dumpData << "Input data of layer " << layers[layerId]->asString() << endl;
        for (int inputIndex = 0; inputIndex < inputItems; inputIndex++)
        {
            dumpData << input[inputIndex] << "  ";
        }

        dumpData << endl << "Output data of layer " << layers[layerId]->asString() << endl;
        for (int outputIndex = 0; outputIndex < items; outputIndex++)
        {
            dumpData << result[outputIndex] << "  ";
        }
        dumpData.close();

        throw runtime_error("Non-finite number found");
    }
}

hughperkins commented 7 years ago

You need to find the first layer for which:

all of the inputs are not nan
at least one output is nan

(One nan will then spread to the other outputs, in the next layer, like a disease...)

On 24 June 2017 19:00:31 BST, Chris notifications@github.com wrote:

During forward the first instance of -nan is on layer 2

Some of the inputs: 0.289124 0.328768 0.360483 0.304982 0.273267 0.37634 0.400126 0.265338 0.304982 0.304982 0.352554 Almost all outputs are -nan

I added the following code after this line https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.cpp#L191
float * result = layers[layerId]->getOutput();
int items = layers[layerId]->getOutputNumElements();
for (int i = 0; i < items; i++)
{
   if (std::isfinite(result[i]) == false)
   {
cout << "Found error at layer " << layerId << " " <<
layers[layerId]->asString();

       ofstream dumpData;
       dumpData.open("Datadump.txt");

       float* input = layers[layerId - 1]->getOutput();
       int inputItems = layers[layerId - 1]->getOutputNumElements();

dumpData << "Input data of layer " << layers[layerId]->asString() <<
endl;
       for (int inputIndex = 0; inputIndex < inputItems; inputIndex++)
       {
           dumpData << input[inputIndex] << "  ";
       }

dumpData << endl << "Output data of layer " <<
layers[layerId]->asString() << endl;
       for (int outputIndex = 0; outputIndex < items; outputIndex++)
       {
           dumpData << result[outputIndex] << "  ";
       }
       dumpData.close();

       throw runtime_error("Non-finite number found");
   }
}
-- You are receiving this because you were assigned. Reply to this email directly or view it on GitHub: https://github.com/hughperkins/DeepCL/issues/87#issuecomment-310854112

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

merceyz commented 7 years ago

Since the first instance of nan during forward is at the first conv layer I assume it happens during backwards.

What is the input and output for any given layer for backwards?

On forwards I got it like this: Input: layers[layerId - 1]->getOutput() Output: layers[layerId]->getOutput()

Is it reversed for backwards?

hughperkins commented 7 years ago

in the backwards direction:

the data arriving, will arrive via gradOutput
the result of the layer, will go out via gradInput

Forwards:

input => output

Backwards:

gradInput <= gradOutput

But actually, there are two backwards calculations:

Backward: calculate gradInput = backward(input, gradOutput, weights), https://github.com/hughperkins/DeepCL/blob/master/src/conv/Backward.h#L28
backpropWeights: gradWeights = backpropWeights(input, gradOutput), https://github.com/hughperkins/DeepCL/blob/master/src/conv/BackpropWeights.h#L27

... so you might need to also check if the nans are appearing for the first time in the weights, during backpropWeights.

merceyz commented 7 years ago

I'm uncertain about what the input/output and size is during backwards. Could you take a look?

GradInput:

Result: layer->getGradInput();
Size: layer->previousLayer->getOutputNumElements();

Input: layer->nextLayer->getGradInput();
Size: layer->getOutputNumElements();

GradWeights:

Result: layer->getGradWeights();
Size: layer->getWeightsSize();

Input: layer->nextLayer->getGradInput()
Size: layer->getOutputNumElements();

hughperkins commented 7 years ago

Ok, so:

gradInput size == input size, and
gradOutput size == output size
gradWeights szie == weights size

hughperkins commented 7 years ago

well... layer->nexgtLayer->getGradInput() actually returns layer->gradOutput, so it will have the same size as layer->output.

For two layers, layer1, and layer2, we have, going forwards:

layer1->input
layer1->output == layer2->input
layer2->output

Then, backwards we have:

layer2->gradOutput (same size as layer2->output)
layer2->gradInput == layer1->gradOutput . (same size as layer2->input, and layer1->output)
layer1->gradInput . (same size as layer1->input)

merceyz commented 7 years ago

Does this look correct? https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.cpp#L226

#pragma region Check grad input for loss layer
{
    float* result = lossLayer->getGradInput();
    int resultCount = lossLayer->previousLayer->getOutputNumElements();

    for (int resultIndex = 0; resultIndex < resultCount; resultIndex++)
    {
        if (std::isfinite(result[resultIndex]) == false)
        {
            cout << "Found error at LossLayer" << lossLayer->asString() << endl;

            ofstream dumpData;
            dumpData.open("CalcGradInputDumpLossLayer.txt");

            float* input = lossLayer->getOutput();
            int inputItems = lossLayer->getOutputNumElements();

            dumpData << "Input data of layer " << lossLayer->asString() << endl;
            for (int inputIndex = 0; inputIndex < inputItems; inputIndex++)
            {
                dumpData << input[inputIndex] << "  ";
            }

            dumpData << endl << "Output data of layer " << lossLayer->asString() << endl;
            for (int outputIndex = 0; outputIndex < resultCount; outputIndex++)
            {
                dumpData << result[outputIndex] << "  ";
            }
            dumpData.close();

            throw runtime_error("Non-finite number found");
        }
    }
}
#pragma endregion

#pragma region Check grad input
{
    float* result = layer->getGradInput();
    int resultCount = layer->previousLayer->getOutputNumElements();

    for (int resultIndex = 0; resultIndex < resultCount; resultIndex++)
    {
        if (std::isfinite(result[resultCount]) == false)
        {
            cout << "GradInput: Found error at layer " << layerIdx << " " << layer->asString() << endl;

            ofstream dumpData;
            dumpData.open("CalcGradInputDump.txt");

            float* input = layer->nextLayer->getGradInput();
            int inputItems = layer->getOutputNumElements();

            dumpData << "Input data of layer " << layer->asString() << endl;
            for (int inputIndex = 0; inputIndex < inputItems; inputIndex++)
            {
                dumpData << input[inputIndex] << "  ";
            }

            dumpData << endl << "Output data of layer " << layer->asString() << endl;
            for (int outputIndex = 0; outputIndex < resultCount; outputIndex++)
            {
                dumpData << result[outputIndex] << "  ";
            }
            dumpData.close();

            throw runtime_error("Non-finite number found");
        }
    }
}
#pragma endregion

#pragma region Check weights
if (layer->getPersistSize() > 0)
{
    float* result = layer->getGradWeights();
    int resultCount = layer->getWeightsSize();

    for (int resultIndex = 0; resultIndex < resultCount; resultIndex++)
    {
        if (std::isfinite(result[resultCount]) == false)
        {
            cout << "Weights: Found error at layer " << layerIdx << " " << layer->asString() << endl;

            ofstream dumpData;
            dumpData.open("WeightsDump.txt");

            float* input = layer->nextLayer->getGradInput();
            int inputItems = layer->getOutputNumElements();

            dumpData << "Input data of layer " << layer->asString() << endl;
            for (int inputIndex = 0; inputIndex < inputItems; inputIndex++)
            {
                dumpData << input[inputIndex] << "  ";
            }

            dumpData << endl << "Output data of layer " << layer->asString() << endl;
            for (int outputIndex = 0; outputIndex < resultCount; outputIndex++)
            {
                dumpData << result[outputIndex] << "  ";
            }
            dumpData.close();

            throw runtime_error("Non-finite number found");
        }
    }
}
#pragma endregion

merceyz commented 7 years ago

I've tested a few times now and the first NaN is at a ELU ActivationLayer during forward Since I have it dump the input and output I looped over the input and ran the ELU function over every item. I did not get NaN

hughperkins commented 7 years ago

Looks pretty good :-) . You'll need to copy the data from the gpu buffer to the cpu-side buffer. You'll need to check the exact details, but I think something like:

https://github.com/hughperkins/DeepCL/blob/master/src/layer/Layer.h#L64

layer->getOutputWrapper()->copyToHost();

(followed possibly by a clFinish(), though I think the copyToHost() might do this implicitly... [checks] right, it waits implicitly https://github.com/hughperkins/EasyCL/blob/master/CLWrapper.cpp#L77

void CLWrapper::copyToHost() {
    if(!onDevice) {
        throw std::runtime_error("copyToHost(): not on device");
    }
    cl_event event = NULL;
    error = clEnqueueReadBuffer(*(cl->queue), devicearray, CL_TRUE, 0, getElementSize() * N, getHostArray(), 0, NULL, &event);    
    cl->checkError(error);
    cl_int err = clWaitForEvents(1, &event);
    clReleaseEvent(event);
    if (err != CL_SUCCESS) {
        throw std::runtime_error("wait for event on copytohost failed with " + easycl::toString(err) );
    }
    deviceDirty = false;
}

hughperkins commented 7 years ago

Hmmm, ELU forward does this: https://github.com/hughperkins/DeepCL/blob/master/src/activate/ActivationForwardGpuNaive.cpp#L85

    "#elif defined ELU\n"
    "    #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)\n"

I suppose if output is more than 10 or so, exp on output would give infinity (but not nan). I think it's physically impossible for exp to give nan, given non-nan input. Some examples of calling exp on different nubmers:

~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(10)"
 = 22026.5
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(100)"
 = 2.68812e+43
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(200)"
 = 7.22597e+86
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(300)"
 = 1.94243e+130
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(400)"
 = 5.22147e+173
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(500)"
 = 1.40359e+217
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(1000)"
 = Infinity
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(100000)"
 = Infinity
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(-10000)"
~= 0
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(-100000000000)"
~= 0
~/git-local/DeepCL/src/conv (master|✔) $ wcalc "exp(0)"
 = 1

... so thats odd. Do you happen to know exactly which input value is generating a nan output value, in the elu forward?

merceyz commented 7 years ago

Code that checks and dumps during forward:

if (layers[layerId]->hasOutputWrapper() && layers[layerId]->getOutputWrapper()->isOnDevice() == true)
{
    #pragma region Find NaN
    layers[layerId]->getOutputWrapper()->copyToHost();
    float * result = (float*)layers[layerId]->getOutputWrapper()->getHostArray();
    int items = layers[layerId]->getOutputNumElements();
    for (int i = 0; i < items; i++)
    {
        if (std::isfinite(result[i]) == false)
        {
            cout << "Found error at layer " << layerId << " " << layers[layerId]->asString() << endl;

            ofstream dumpData;
            dumpData.open("ForwardDump.txt");

            layers[layerId - 1]->getOutputWrapper()->copyToHost();
            float* input = (float*)layers[layerId - 1]->getOutputWrapper()->getHostArray();
            int inputItems = layers[layerId - 1]->getOutputNumElements();

            dumpData << "Input data of layer " << layers[layerId]->asString() << endl;
            for (int inputIndex = 0; inputIndex < inputItems; inputIndex++)
            {
                dumpData << input[inputIndex] << "  ";
            }

            dumpData << endl << "Output data of layer " << layers[layerId]->asString() << endl;
            for (int outputIndex = 0; outputIndex < items; outputIndex++)
            {
                dumpData << result[outputIndex] << "  ";
            }
            dumpData.close();

            throw runtime_error("Non-finite number found");
        }
    }
    #pragma endregion
}

Assuming input[0] -> output[0]

Gives me these input values producing NaN which makes no sense

hughperkins commented 7 years ago

Yes, that doesnt make any sense. Are you sure you've transferred the data from gpu to cpu, before printing it, and seeing hte nans? Make sure you put a clFinish() before and after the transfer, just to be sure.

merceyz commented 7 years ago

Alright, after a lot of digging I now know why it happens. As soon as it predicts a 100% chance of a label it starts to go "insane"

I'll attach the log so that you can see for yourself. Entries and where they come from:

"Forward" https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.cpp#L187
"Backwards" https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.cpp#L224

"Input to loss" https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L87
Value is: `output[imageOffset + label]`
"Output from loss" https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L87
Value is: `-log(output[imageOffset + label])`
"LOSS IS CURRENTLY" https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L90
Value is: `loss`

"Grad out" https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L151
Value is: `output[imageOffset + plane]`
"GradOut final" https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L158

TrainingLog.txt

hughperkins commented 7 years ago

Alright, after a lot of digging I now know why it happens. As soon as it predicts a 100% chance of a label it starts to go "insane"

Wow, thats impressive work Chris. Very nice. I'm very impressed :-)

Ok, so whats happening, I htink, is:

if the input to the softmaxloss layer is 0, then log(0) evaluates to Inf, and then it's over really

But eg log(0 + 1e-6) is no longer infinity, but -13. Can you try adding + 1e-6 into each of the log terms, in the forward direction, and see what happens? For example, this line:

https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L76

                loss += - log(output[ imageOffset + label ]);

would become:

                loss += - log(output[ imageOffset + label ] + 1e-6);

(you can also try +1e-8. I think both will fix the issue, probably. They might give ever so slightly different results, I'm not sure which set of results will be 'better')

merceyz commented 7 years ago

It didn't help, I think the problem is with the calcGradInputFromLabels

calcLossFromLabels -> When the input is 1 it's output is 0 calcGradInputFromLabels -> it's input is now 0 so it's output is -1, in the event its input is 1 it's output is now 0, assuming the output from this (0) is multiplied with anything in the network we have everything eventually turning in to 0 and eventually NaN

Also are these two lines correct? https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L151 https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L158

hughperkins / DeepCL

Unknown Adadelta trainer error/issues #87