hughperkins / DeepCL

OpenCL library to train deep convolutional neural networks
Mozilla Public License 2.0

Multiple outputs and grid inputs/outputs #93

Closed merceyz closed 7 years ago

merceyz commented 7 years ago

Hello,

Is there a possibility to have multiple output labels on one input image in a network?

For example, with this image the output would be "16-4=?" (image attached)

Currently I'm running image processing on the image to get the individual characters, then sending one at a time to the network. This works, but if the processing fails a little bit, the outputs of the network make no sense.

And for the second part of this "issue": I have this 4x4 grid of images, where I would like to know which box in the grid contains X.

In this example I want to get all street signs, which are in boxes 9 and 10. (image attached)

Currently I'm just giving it each box and hoping for the best. Sadly this isn't giving the best accuracy, but it works as a last resort. Is there a better way to do this? I was thinking I could give it the entire image and then have it return a rectangle of where it found X; I could then calculate myself which box it was. This would be kind of like detecting where in an image the faces are.

Looking forward to a clever idea on how to tackle this.

hughperkins commented 7 years ago

It looks like https://arxiv.org/abs/1312.6082 provides a solution for the first problem. I need to read through this to find out to what extent it's doable in DeepCL.

merceyz commented 7 years ago

For the second problem I believe something along the lines of this would work: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Li_A_Convolutional_Neural_2015_CVPR_paper.pdf

https://github.com/heuritech/convnets-keras https://github.com/tpfister/caffe-heatmap

EDIT: Actually, it could probably work for both problems.

hughperkins commented 7 years ago

Ok. I haven't read the Li paper yet. As for the paper I cited above: basically what it does is have a normal convolutional network, with a softmax layer at the end. Except, instead of having just one softmax, it has 5, one per digit; each softmax has 10 possible values, ie digits 0-9. They also have a softmax that outputs the length of the sequence, ie 1-5, or 'more than 5', so 6 possible values.

DeepCL doesn't directly support having a bunch of softmaxes at the end, at least not cleanly. There seem to be a couple of options:

- calculate the softmaxes in one's own code, from the output of the network's last layer
- hack DeepCL's SoftMaxLayer to handle multiple softmax groups per example

The second option might be easier, if one can put up with the hackiness. Note that it might not be quite implemented yet, but it should not be too hard to implement. Actually it might be implemented, I'm not sure; you'd need to try, and see what happens. The relevant code is at https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L132-L141 It runs on CPU, standard C/C++, no GPU stuff, and it's just a few lines, so it shouldn't be horrible to hack around too much? Note that you also need to calculate the gradient https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L170 , and there are a couple of other functions that might need to be tweaked slightly https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L209 https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L258

The first option means basically writing the function from https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L132-L141 in one's own code, though it would mean it could be written for example in C#, instead of in C++.
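
For illustration, a rough sketch of what that could look like, in C++ for concreteness. The 5-digits-plus-length layout is the one from the paper; the assumption that the last layer's raw outputs for all the groups are laid out consecutively, and all the names, are made up:

    #include <cmath>

    // Softmax over one group: values[start .. start+count), writes probabilities into probs.
    void softmaxGroup(const float *values, int start, int count, float *probs) {
        float maxVal = values[start];
        for(int i = 1; i < count; i++) {
            if(values[start + i] > maxVal) maxVal = values[start + i];
        }
        float sum = 0;
        for(int i = 0; i < count; i++) {
            probs[i] = std::exp(values[start + i] - maxVal); // subtract max, for numerical stability
            sum += probs[i];
        }
        for(int i = 0; i < count; i++) {
            probs[i] /= sum;
        }
    }

    // 5 digit groups of 10 values each, then one 6-way length group
    void decodeSequence(const float *lastLayerOutput) {
        float digitProbs[10];
        for(int digit = 0; digit < 5; digit++) {
            softmaxGroup(lastLayerOutput, digit * 10, 10, digitProbs);
            // ... take the argmax of digitProbs as the prediction for this digit ...
        }
        float lengthProbs[6];
        softmaxGroup(lastLayerOutput, 50, 6, lengthProbs); // length 1-5, or 'more than 5'
    }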

The cleanest option would be for DeepCL to support DAGs (graphs), rather than a single simple pipeline from one layer to the next. This would be a bunch of work though.

hughperkins commented 7 years ago

As far as heatmaps go, "Deep neural networks for object detection", https://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection seems to be a reasonable reference. What it looks like they do is have a standard convolutional network, but the output is image-shaped (though downsampled, eg each pixel represents, say, a 16x16 square in the original image), and each output pixel is '1' if eg it's a car, and '0' otherwise. It looks like they have training data for this though, that comprises bounding box information, which sounds hard to obtain. I think the keras example you found maybe avoids needing such data. Reading...

hughperkins commented 7 years ago

The keras heatmaps approach looks a lot easier to use, since it just needs normal training data, eg a pretrained VGG network, alexnet, or similar. It looks like they take the pretrained network, which has convolutional layers followed by one or more fully connected layers, and do some magic to transform the fully connected layers into convolutional layers, using the same weights, transformed slightly, and... that's it!:

https://github.com/heuritech/convnets-keras/blob/master/convnetskeras/convnets.py#L72-L84

        for layer in convnet_heatmap.layers:
            if layer.name.startswith("conv"):
                orig_layer = convnet.get_layer(layer.name)
                layer.set_weights(orig_layer.get_weights())
            elif layer.name.startswith("dense"):
                orig_layer = convnet.get_layer(layer.name)
                W,b = orig_layer.get_weights()
                n_filter,previous_filter,ax1,ax2 = layer.get_weights()[0].shape
                new_W = W.reshape((previous_filter,ax1,ax2,n_filter))
                new_W = new_W.transpose((3,0,1,2))
                new_W = new_W[:,:,::-1,::-1]
                layer.set_weights([new_W,b])
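
For reference, here is a rough C++ rendering of the same weight shuffle. The layout assumptions are taken from the keras code above (dense weights laid out W[inputPlane][row][col][outputFilter], conv weights [outputFilter][inputPlane][row][col]); whether DeepCL's own layout needs the spatial flip is a separate question:

    // Sketch: turn dense-layer weights into conv weights of the same size,
    // transposing and flipping the kernel spatially, as the python above does.
    void denseToConvWeights(const float *W, float *newW,
            int inPlanes, int k, int numFilters) {
        for(int f = 0; f < numFilters; f++) {
            for(int p = 0; p < inPlanes; p++) {
                for(int r = 0; r < k; r++) {
                    for(int c = 0; c < k; c++) {
                        // source index: W[p][r][c][f]
                        int src = ((p * k + r) * k + c) * numFilters + f;
                        // destination: newW[f][p][k-1-r][k-1-c], ie new_W[:,:,::-1,::-1]
                        int dst = ((f * inPlanes + p) * k + (k - 1 - r)) * k
                                + (k - 1 - c);
                        newW[dst] = W[src];
                    }
                }
            }
        }
    }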

So, you'd need to do something similar:

- load vgg weights (somehow)
- transform the fully connected weights into convolutional weights
- write the result out in a weights file-format that DeepCL can understand

As far as the "load vgg weights (somehow)" bit goes, presumably you'd want to load a pre-trained VGG/alexnet network? So, you'd use some way, in some language, to load such a pretrained network, then do the transforms, then write to a weights file-format that DeepCL can understand. All of this can be done in any language, eg C#. For the DeepCL weights file-format, I can help describe it, or you can initially look through the function that saves the weights https://github.com/hughperkins/DeepCL/blob/master/src/weights/WeightsPersister.cpp#L83-L104 Note that it's not quite as short as it seems, since it uses this function https://github.com/hughperkins/DeepCL/blob/master/src/weights/WeightsPersister.cpp#L42-L52 , which calls persistToArray on each layer, ie, for conv layers, https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.cpp#L317-L324

hughperkins commented 7 years ago

Thinking this through (because it's easier than reading the code), what we have in the original network is something like:

- conv (and pooling) layers, then one or more fully connected layers, giving one output value per class

What we want is:

- the same conv (and pooling) layers, then convolutional layers in place of the fully connected ones, giving an image-shaped heatmap per class

.... and so .... thinking...

hughperkins commented 7 years ago

... so, presumably, it is the case that the fully connected layer takes as input all the pixels in the previous layer (all of them, from every image in the previous layer's output image stack), and connects them all together to one neuron.

But we want each neuron in our convolutional layer to make a decision based on just one, or a few, of the pixels in the previous layer. So, we should perhaps take only the connections in the fully connected layer for one column through the previous layer's image stack, corresponding to one single pixel, and connect those to one pixel. Hmmm. Except that... in a convolutional network, the weights are position independent... so... hmmm...

Comparing number of weights: the fully connected layer has numOutputNeurons * inputPlanes * inputSize * inputSize weights, and a convolutional layer has numFilters * inputPlanes * kernelSize * kernelSize, so if numFilters == numOutputNeurons and kernelSize == inputSize, the counts match exactly.

Are they simply making the kernelSize == the image size? That would make the weights exactly the same size, and seems to match what the python code above is doing? That would kind of make some amount of sense (though not sure how well that will work near the edges?). It also sounds pretty easy to do :-)

merceyz commented 7 years ago

presumably you'd want to load a pre-trained VGG/alexnet network?

I was planning to train them myself

It also sounds pretty easy to do :-)

No matter how many times I read your comments I get more and more confused. Not because you write nonsense, but I just can't comprehend how to do this.

hughperkins commented 7 years ago

I was planning to train them myself

Ok, sounds good.

No matter how many times i read your comments i get more and more confused. Not because you write nonsense but i just can't comprehend how to do this.

Fair enough. Just to be clear, when I say 'easy', I mean, 'not weeks of work', but it's probably a few hours, maybe a day or two.

Thinking it through... there are two ways of doing this:

- write one's own program, in whatever language, that loads the network and weights, converts the fully connected layers, and writes the result back out
- hack predict.cpp to do the conversion after it loads the network

The second method sounds easier to me, since predict.cpp already loads the network, and the weights. So we "just" have to convert the fully connected layers to convolutional layers.

Thinking this through, using the second method, we need the following steps:

- iterate over the network's layers, to find the fully connected layer(s)
- replace each fully connected layer with a convolutional layer, in the network stack
- rearrange the weights (probably)

Let's say we create a new convolutional layer. So... (goes off to search....) there's an example of creating a convolutional layer, and hooking it up to the previous layer, in the fully connected layer itself https://github.com/hughperkins/DeepCL/blob/master/src/fc/FullyConnectedLayer.cpp#L24-L29

    ConvolutionalMaker *convolutionalMaker = new ConvolutionalMaker();
    convolutionalMaker->numFilters(numPlanes * imageSize * imageSize)
                      ->filterSize(previousLayer->getOutputSize())
                      ->biased(maker->_biased)
                      ->weightsInitializer(maker->_weightsInitializer);
    convolutionalLayer = new ConvolutionalLayer(cl, previousLayer, convolutionalMaker);

And a bit more at https://github.com/hughperkins/DeepCL/blob/master/src/fc/FullyConnectedLayer.cpp#L40-L41

    convolutionalLayer->previousLayer = this->previousLayer;
    convolutionalLayer->nextLayer = this->nextLayer;

Actually, it seems the convolutional layer inside the fully connected layer is already hooked up to the previous and next layers, so maybe we can just reuse it. We can get hold of it because it's stored in the convolutionalLayer property of the fc layer, and we already got hold of that earlier, whilst iterating over the network.

The convolutionalLayer inside the fully connected layer is already connected to the previous and next layers, it already exists, and it already has a weights array that is exactly the right size (as far as I can tell, assuming my theory about how the keras version works is right, of which I'm maybe 70% sure :-P ). So, the only things we really need to do are:

- slot it into the network stack, in place of the fully connected layer
- rearrange the weights (probably)
As far as 'slotting it into the network stack', the network stack is the thing we've been iterating over. So we are iterating something like (goes off to find earlier issue with the code in...):

for (int layerId = 0; layerId < net->getNumLayers(); layerId++) {
    Layer *layer = net->getLayer(layerId);
    string name = layer->getClassName();

    if (name == "ConvolutionalLayer")
    {
        ConvolutionalLayer *conv = dynamic_cast<ConvolutionalLayer *>(layer);
        ForwardAuto *forwardAuto = dynamic_cast<ForwardAuto *>(conv->forwardImpl);
        myfile << toString(layerId) + "|" + toString(forwardAuto->chosenIndex) << endl;                             
    }
    else if (name == "FullyConnectedLayer")
    {
        FullyConnectedLayer *fc = dynamic_cast< FullyConnectedLayer *>(layer);
        ForwardAuto *forwardAuto = dynamic_cast<ForwardAuto *>(fc->convolutionalLayer->forwardImpl);
        myfile << toString(layerId) + "|" + toString(forwardAuto->chosenIndex) << endl;
    }
}

We can ignore the conv layers, and leave them as is, so our iteration code becomes:

for (int layerId = 0; layerId < net->getNumLayers(); layerId++) {
    Layer *layer = net->getLayer(layerId);
    string name = layer->getClassName();

    if (name == "FullyConnectedLayer")
    {
        FullyConnectedLayer *fc = dynamic_cast< FullyConnectedLayer *>(layer);
        ForwardAuto *forwardAuto = dynamic_cast<ForwardAuto *>(fc->convolutionalLayer->forwardImpl);
        myfile << toString(layerId) + "|" + toString(forwardAuto->chosenIndex) << endl;
    }
}

... and we also don't need the ForwardAuto object, so:

for (int layerId = 0; layerId < net->getNumLayers(); layerId++) {
    Layer *layer = net->getLayer(layerId);
    string name = layer->getClassName();

    if (name == "FullyConnectedLayer")
    {
        FullyConnectedLayer *fc = dynamic_cast< FullyConnectedLayer *>(layer);
    }
}

Alright, the network stack is in the net object, (goes off to find the code) here: https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.h#L40

class DeepCL_EXPORT NeuralNet : public Trainable {
protected:
    std::vector< Layer *> layers;

Unfortunately that's protected. Hmmm... I think I'm not keen on removing the protected bit, so I think we need a method like replaceLayer(int index, Layer *newLayer), in the NeuralNet class https://github.com/hughperkins/DeepCL/blob/master/src/net/NeuralNet.h#L35 , that will look something like:

void NeuralNet::replaceLayer(int index, Layer *newLayer) {
    layers[index] = newLayer;
}

So, currently our network stack, ie layers, looks something like:

0: inputlayer
1: conv layer
2: maxpooling
3: conv layer
4: maxpooling
5: fullyconnected layer

And let's say we don't have a softmax on the end (if there is, we need to remove that... but we can think about that later...)

We are iterating over the network, and we get to layer 5, ie the 'fullyconnected layer'. We have a variable conv containing the convolutional layer that is inside that fully connected layer. We call:

net->replaceLayer(5, conv);

... and now the network stack looks like:

0: inputlayer
1: conv layer
2: maxpooling
3: conv layer
4: maxpooling
5: conv layer (the one that was inside the fully connected layer)

As we can see in the fully connected layer code from earlier, that conv layer is already connected to the appropriate previous and next layers, since the fully connected layer already hooked it up. So I think we've now replaced the fully connected layer with a convolutional layer. And note that the weights inside that convolutional layer are the weights from the fully connected layer, since the fully connected layer doesn't actually contain any weights itself: it just uses the convolutional layer to do all the work.
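
Putting the pieces together, the replacement loop might look something like this (replaceLayer being the method proposed above, which doesn't exist in DeepCL yet):

    // Sketch: swap each fully connected layer for the convolutional layer
    // it already contains.
    for (int layerId = 0; layerId < net->getNumLayers(); layerId++) {
        Layer *layer = net->getLayer(layerId);
        if (layer->getClassName() == string("FullyConnectedLayer")) {
            FullyConnectedLayer *fc = dynamic_cast<FullyConnectedLayer *>(layer);
            // the inner conv layer is already hooked to the previous/next
            // layers, and already holds the weights
            net->replaceLayer(layerId, fc->convolutionalLayer);
        }
    }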

Alright, so now we (probably) need to rearrange the weights. So, let's think about how the weights are arranged currently, and how we want them to be arranged.

How are the weights in a convolutional layer arranged? (goes off to find some code...) Well... this bit calls a method called getWeight https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.cpp#L233 , but that's not in the conv layer, so it's probably in Layer. Let's look in Layer... https://github.com/hughperkins/DeepCL/blob/master/src/layer/Layer.cpp Hmmm, no. So... Ah, it's in the convolutional layer header https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.h#L81

    inline float getWeight(int filterId, int inputPlane, int filterRow, int filterCol) const {
        return weights[ getWeightIndex(filterId, inputPlane, filterRow, filterCol) ];
    }

Unfortunately, it just calls getWeightIndex to figure out the location in the weights array, so we still don't know the order. Let's check getWeightIndex. It's in the same file https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.h#L75-L80 :

    inline int getWeightIndex(int filterId, int inputPlane, int filterRow, int filterCol) const {
        return (( filterId 
            * dim.inputPlanes + inputPlane)
            * dim.filterSize + filterRow)
            * dim.filterSize + filterCol;
    }

Ok, so the weights are arranged like: `weights[outputFilter, inputFilter, kernelRow, kernelColumn]`

Cool. Ok, going back to the assertion I tentatively made earlier about what we want to do. We have the weights in the fully connected layer, which has fully connected dimensions like:

numInputNeurons
numOutputNeurons

... well, and that's it. But this is implemented using a convolutional layer that is configured like... (goes off to search...) https://github.com/hughperkins/DeepCL/blob/master/src/fc/FullyConnectedLayer.cpp#L24-L29

    convolutionalMaker->numFilters(numPlanes * imageSize * imageSize)
                      ->filterSize(previousLayer->getOutputSize())
                      ->biased(maker->_biased)
                      ->weightsInitializer(maker->_weightsInitializer);

Ok, this is a bit complicated, so let's draw it out a bit. Let me save this comment first, before I lose it...

hughperkins commented 7 years ago

Ok, so here's how our network layers 4 and 5 look like, before we change anything:

layer 4: convolutional layer.  Let's say it has the following geometry:
      numOutputPlanes: filters_in
      outputImageSize: size_in
      kernel size: (we don't care)
layer 5: fully connected layer:
     numInputPlanes: filters_in
     inputImageSize: size_in
     numPlanes: numNeurons
     outputSize: 1
     kernelSize: size_in
     padding: false

So, that's how the fully connected layer works: if you take a convolutional layer whose kernel size is exactly the same as the image size, and with no padding, then the output image size will be exactly one. The convolutional filter cannot move/slide over the image, since it's the same size as the image, and there's no padding. Then, if we have say 10 output planes, the output of this convolutional layer is a stack of 10 images, each with a single pixel. It's the same as a fully connected layer with 10 output neurons. Mathematically it's the same. And to save duplicating code, I just use a convolutional layer inside the fully connected layer :-P

And the geometry is as above, in the second to last paragraph. So the geometry of the weights is, from earlier:

weights[outputFilters, inputFilters, kernelHeight, kernelWidth]

... which, slotting in the numbers from the fully connected layer (that's using a convolutional layer inside it), is:

weights[numNeurons, filters_in, size_in, size_in]

hughperkins commented 7 years ago

Now, going back to the earlier assertion about how the keras heatmap works, we want the weights of the new convolutional layer to be arranged like:

inputPlanes=filters_in
outputPlanes = numClasses
kernelSize = inputSize

(numClasses and outputPlanes are the same thing here. sorry for mixing notation... and inputSize and size_in are the same thing ...)

So, referring back to the layout of weights in a convolutional layer, which is:

weights[outputFilters, inputFilters, kernelHeight, kernelWidth]

...and slotting in the numbers for the new convolutional layer, we want:

weights[numClasses, filters_in, inputSize, inputSize]

but we currently have:

weights[numNeurons, filters_in, size_in, size_in]

... hmmm... and since numClasses and numNeurons are the same thing, and inputSize and size_in are the same thing, it seems... we have nothing to do??? Seems like the weights are already in the right order.

So, if that's the case, you can skip most of the last 100 lines or so, and "all" you have to do is:

void NeuralNet::removeLayer(int index) {
    this->layers.erase(this->layers.begin() + index);
}
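
Presumably, to drop a trailing softmax layer, as discussed earlier, usage would then be something like:

    // if the softmax is the last layer in the stack, remove it
    net->removeLayer(net->getNumLayers() - 1);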

What this will give you is a network that outputs a 'heatmap' of which pixels are showing what type of thing, like a street sign, or sky, or whatever (according to what classes you trained it on).

Not sure:

- to what extent the heatmap is what you are looking for?

hughperkins commented 7 years ago

Oh wait, we do need to do one other thing, but hopefully it's easy: we need to switch padding on, on the new convolutional layers. Otherwise they'll continue to output one pixel per plane, which isn't what we want. I think that should be straightforward. (checks code...) Oh https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.h#L41 ... it's buried in the dim object. The dim object looks like:

https://github.com/hughperkins/DeepCL/blob/master/src/conv/LayerDimensions.h#L15 :

class DeepCL_EXPORT LayerDimensions {
public:
    int inputPlanes, inputSize, numFilters, filterSize, outputSize;
    bool padZeros, isEven;
    bool biased;
    int skip;

    int inputCubeSize;
    int filtersSize;
    int outputCubeSize;
    int numInputPlanes;

    int outputSizeSquared;
    int filterSizeSquared;
    int inputSizeSquared;

    int halfFilterSize;

padZeros is what we want to change, and it's public, so we can just modify it. And dim in the convolutional layer is... (checks...) https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.h#L31-L41 ... public, so we can do something like conv->dim.padZeros = true; ... but (looking at the dim object code...) there's a method to update it: https://github.com/hughperkins/DeepCL/blob/master/src/conv/LayerDimensions.h#L83

    LayerDimensions &setPadZeros(bool padZeros) {
        this->padZeros = padZeros;
        deriveOthers();
        return *this;
    }

... so better to call that to set it.
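
So the padding change would look something like this, with conv being the convolutional layer we pulled out of the fully connected layer (assuming dim is accessible as a plain public member, as the header suggests):

    // setPadZeros calls deriveOthers(), so the derived sizes stay consistent
    conv->dim.setPadZeros(true);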

After doing that, do we need to redimension the buffers? Thinking about it, this change means the output buffer sizes will be all wrong... so, (checking the ConvolutionalLayer code...) we might want to call setBatchSize:

https://github.com/hughperkins/DeepCL/blob/master/src/conv/ConvolutionalLayer.cpp#L277-L299

VIRTUAL void ConvolutionalLayer::setBatchSize(int batchSize) {
    if(batchSize <= allocatedSpaceNumExamples) {
        this->batchSize = batchSize;
        return;
    }

    this->batchSize = batchSize;
    this->allocatedSpaceNumExamples = batchSize;

    delete outputWrapper;
    delete[] output;

    delete gradInputWrapper;
    delete[] gradInput;

    output = new float[getOutputNumElements()];
    outputWrapper = cl->wrap(getOutputNumElements(), output);

    if(layerIndex > 1) {
        gradInput = new float[ previousLayer->getOutputNumElements() ];
        gradInputWrapper = cl->wrap(previousLayer->getOutputNumElements(), gradInput);
    }
}

... but I reckon if we make sure we shuffle the network around in predict.cpp before we call that, then we're ok, and don't need to do this. Checking predict.cpp... setBatchSize is called at https://github.com/hughperkins/DeepCL/blob/master/src/main/predict.cpp#L170 , which is just after loading the weights:

    // weights file contains normalization layer parameters as 'weights' now.  We should probably rename weights to parameters
    // sooner or later, but anyway, technically, works for now
    if(!WeightsPersister::loadWeights(config.weightsFile, string("netDef=")+netDef, net, &ignI, &ignI, &ignF, &ignI, &ignF) ){
        cout << "Cannot load network weights from weightsFile." << endl;
        return;
    }

    if(verbose) {
        net->print();
    }
    net->setBatchSize(config.batchSize);
if(verbose) cout << "batchSize: " << config.batchSize << endl;

... so that's perfect: we hack the network around after loading the weights, but before setting the batch size, and that should all work ok. (crosses fingers... :-P )

merceyz commented 7 years ago

to what extent the heatmap is what you are looking for?

If you look at the image of the street sign in my first comment, you can see that I want boxes 9 and 10. So the network will be trained for street signs, where 1 is a street sign and 0 is anything else. So for the heatmap I'd need it to light up boxes 9 and 10 (then somehow calculate that it is 9 and 10).

The network is currently built up like this: netdef=16c3z-tanh-mp2-32c3z-tanh-mp2-64c3z-tanh-mp2-128c3z-tanh-mp2-100n-tanh-2n (tanh gave better results than ReLU.) So it takes one box (96x96x3) and says whether it contains a street sign or not.

Would it not be possible to just set the FC layers to Conv layers in the netdef?

I'll make a fork and try to hack around to do "all" the changes you specified

hughperkins commented 7 years ago

If you look at the image of the street sign in my first comment you can see that i want box 9 and 10 So the network will be trained for street signs where 1 is street signs and 0 is anything else. So for the heatmap i'd need it to light up box 9 and 10.

Ah, yes, if you have training data that consists of examples, where each example consists of... hmmm... suddenly I think I'm missing something. Can't you just feed in each of the boxes as a single image, and each image is classified as '1' street sign or '0' not a street sign? I guess this is what you're doing currently, actually? But... you want to batch them up? or...?

merceyz commented 7 years ago

That's exactly what I'm currently doing, but when you hit images where the street sign is (for example) filling up one box plus 10 pixels of another box, the accuracy gets brutally murdered.

merceyz commented 7 years ago

For example, on this one the correct output is: 1111 1111 1111 0001

The network however outputs: 1011 1011 1011 0000

(image attached)

merceyz commented 7 years ago

Or this one, where the correct output is: 0000 0110 0000 0000

It will output: 0000 0100 0000 0000 (image attached)

hughperkins commented 7 years ago

Well... I think your network outputs look pretty reasonable on the whole? and if that was a captcha, and, for the 90-90-127 images, if you had to pick the pictures containing a street sign, I think the answers from your network are reasonable?

For the second example, I suppose a captcha might expect you to pick the box containing that tiny sliver of sign, but a heatmap at the granularity you are using (4x4) is, I think, very reasonably going to say that that square doesn't contain any sign.

I think that if you want to know that that square contains a bit of sign, you'd probably need to (as you are probably proposing):

But yeah, seems like a heatmap might be what you need. And then I guess you just pick the sub-images that have at least a few pixels on the heatmap 'lighting up'?

hughperkins commented 7 years ago

Would it not be possible to just set the FC layers to Conv layers in the netdef?

It seems like... yes... if/since the weight layout seems not to change. The only thing is that the code that loads the weights checks that you are using the same netdef you trained on, https://github.com/hughperkins/DeepCL/blob/master/src/weights/WeightsPersister.cpp#L129-L132

        if(trainingConfigString != std::string(data + 7 * 4)) {
            std::cout << "training options dont match weights file" << std::endl;
            std::cout << "in file: [" + std::string(data + 7 * 4) + "]" << std::endl;
            std::cout << "current options: [" + trainingConfigString + "]" << std::endl;

... but that should be relatively straightforward to hack out (eg just comment out this bit of code, in WeightsPersister)

merceyz commented 7 years ago

The outputs from the network are good, but I always try to make them better ;)

You seem to have understood my plan, so that's good. I'm just not entirely sure how to implement the heatmap part, but I'll try my best. I guess a heatmap the same size as the input image would be too much?

I guess I could sum up the activations in each "box" of the heatmap, and if the sum is > X then it contains a street sign.
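
Something like this, maybe (just a sketch of that idea; the function name, the 4x4 grid, and the threshold are all made up):

    // sum heatmap activations per grid cell, and threshold
    void boxesFromHeatmap(const float *heatmap, int heatSize, float threshold,
            bool boxContainsSign[4][4]) {
        int cell = heatSize / 4; // pixels per grid cell, assuming heatSize divides by 4
        for (int by = 0; by < 4; by++) {
            for (int bx = 0; bx < 4; bx++) {
                float sum = 0;
                for (int y = by * cell; y < (by + 1) * cell; y++) {
                    for (int x = bx * cell; x < (bx + 1) * cell; x++) {
                        sum += heatmap[y * heatSize + x];
                    }
                }
                boxContainsSign[by][bx] = sum > threshold; // sum > X means 'contains a sign'
            }
        }
    }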

Then comes the question of how to train it. I could without problems specify a rectangle of where in the image a street sign is, so that's not a problem. The problem is more how to translate that into activations.

hughperkins commented 7 years ago

(or, you could just hexedit the netdef, at the start of the weights file, to change it from an fc to a conv, and to turn on zero padding)

merceyz commented 7 years ago

It seems like .. yes... if/since the weight layout seems not to change. The only thing is that, in the code that loads the weights, it checks that you are using the same netdef you trained on, https://github.com/hughperkins/DeepCL/blob/master/src/weights/WeightsPersister.cpp#L129-L132

Retraining isn't an issue (I'd most likely have to anyway), so how would the netdef string look if I replaced the FC layers with Conv layers?

hughperkins commented 7 years ago

I guess a heatmap the same size as the input image would be too much?

I don't know? Maybe it's ok? I don't think the last few layers affect the speed much: most of the time goes into the first few layers of the network, I think? Hmmm... oh... because we've already gone through a bunch of maxpoolings... so... I don't know :-) I guess you'll need to experiment a bit and/or read around a bit.

Then comes the question of how to train it

I think the way the heatmap works is: you simply train it as a normal network, with images that do/don't contain signs, and then magically change it from a classifier (with the fullyconnected layer) to the heatmap (using the convolutional layer), and presto! it seems it should magically work :-P

I imagine it doesn't work quite so magically in practice, and probably needs a bunch of tweaks/tricks/experimentation. But it seems like, as far as conceptually how to train it goes, it's just standard training examples, nothing specialized for heatmaps?

merceyz commented 7 years ago

Ok, back to the netdef string: what should I put instead of "100n-tanh"?

Also, for the first problem I thought of something along the lines of this: http://yann.lecun.com/exdb/lenet/

merceyz commented 7 years ago

Also, how would training on 96x96x3 vs predicting on 384x384x3 work? Considering the first conv layer isn't wired up to that size.

hughperkins commented 7 years ago

Retraining isn't an issue (most likely have to anyways) so how would the netdef string look like if i replaced the FC layers with Conv layers? Ok, back to the netdef string, what should i put instead of "100n-tanh"

I think you'd need to replace the "100n" bit with a zero-padded convolutional layer whose filter covers the whole image at that point, something like:

100c7z-tanh

I'm not entirely sure, not having ever tried it. It might be that my analysis above ('above' meaning, 'in this thread') is not quite correct. Let's try it and see what happens.

For your particular example, if the incoming image is 3x96x96, and your netdef is:

netdef=16c3z-tanh-mp2-32c3z-tanh-mp2-64c3z-tanh-mp2-128c3z-tanh-mp2-100n-tanh-2n

Then: the conv layers are zero-padded, so they don't change the image size (as long as the kernels have an odd size, which they do, ie 3). And tanh doesn't change the image size either. So we just need to look at the max-poolings:

input: 96
mp2: 96 => 48
mp2: 48 => 24
mp2: 24 => 12
mp2: 12 => 6

So it seems that the image size is 6 at the point it enters the 100n layer, so I think it'd become:

100c6z-tanh

hughperkins commented 7 years ago

Hmmm, this will probably generate a heatmap of size 6x6, I guess? I'm not sure what the solution to this is. Maybe remove some of the maxpoolings from the network (during training too)? I'm not quite sure...

merceyz commented 7 years ago

I can no longer build.

Build log.txt

hughperkins commented 7 years ago

Hmmm, nothing's changed in the code... and the error is hard to parse... what happens if you purge/rename the old C:/Users/Unknown/Desktop/Build and E:/Programming/DeepCL directories out of the way, re-clone, and rebuild?

(random ideas: ...)

hughperkins commented 7 years ago

(oh, one other random idea: is there a process still running, ie a deepcl process, that is locking up the various dlls it's trying to install?)

merceyz commented 7 years ago

(oh, one other random idea: is there a process still running, ie a deepcl process, that is locking up the various dlls it's trying to install?)

Right, right. There were 6 of them running. Never mind me ^^

hughperkins commented 7 years ago

:)


merceyz commented 7 years ago

I've seen that 4096 number in a lot of networks related to this model; do you know where they get that number from?

https://github.com/heuritech/convnets-keras/blob/master/convnetskeras/convnets.py#L133

merceyz commented 7 years ago

Also, this seems to explain how the heatmap works in detail: https://github.com/heuritech/convnets-keras/issues/1#issuecomment-215014259

If I understand it correctly, the easiest thing would be to replace the normal softmax layer with their Softmax4D layer.

merceyz commented 7 years ago

I made it print out the layer class names, and then I took all the outputs from the 16th layer. I figured I don't actually have to remove the softmax layer, as I can just jump to the layer I want.

Now the question becomes: how do I convert this to a heatmap?

InputLayer NormalizationLayer ConvolutionalLayer ActivationLayer PoolingLayer ConvolutionalLayer ActivationLayer PoolingLayer ConvolutionalLayer ActivationLayer PoolingLayer ConvolutionalLayer ActivationLayer PoolingLayer FullyConnectedLayer ActivationLayer FullyConnectedLayer SoftMaxLayer

Using layerCount - 3 0.999664 -0.999735 -0.999467 0.942681 0.94419 -0.999234 0.99908 -0.994407 0.628713 0.99968 0.993852 0.997357 -0.997864 0.983499 -0.879967 0.640155 0.997375 0.998771 0.311274 0.998372 0.994437 -0.994366 -0.998619 -0.997232 -0.99787 -0.993109 0.994408 0.990023 -0.999341 -0.98643 0.900372 0.997575 0.892749 0.99948 -0.982446 0.867922 -0.999686 0.998534 0.982899 0.999503 0.999296 0.997436 0.805877 0.033426 -0.989653 -0.992828 0.8855 -0.997751 -0.976028 0.99752 0.994597 0.985066 0.999084 -0.999453 -0.987145 -0.999575 0.999013 -0.996665 -0.998437 -0.998563 -0.999607 0.999408 -0.97378 0.908925 0.991686 0.963303 -0.955857 -0.962318 -0.998844 0.996011 -0.998822 0.999392 -0.674896 -0.997652 0.831925 0.997553 0.997558 0.996225 0.995423 0.995407 0.99897 0.542301 -0.998071 -0.953574 0.874348 0.982294 -0.342405 -0.950313 -0.993449 0.836201 0.999562 0.999385 -0.995058 -0.985481 0.998173 0.981515 0.997218 -0.999885 -0.317049 0.877932

hughperkins commented 7 years ago

I assume this is the output of something that is num-classes * image-width * image-height? Can you print out the network, from predict, so I can see the geometry of each layer?

merceyz commented 7 years ago

Note: the network isn't the same as in the outputs you see above.

Printed out using this:

for (int i = 0; i < net->getNumLayers(); i++)
{
    Layer *temp = net->getLayer(i);
    myfile << temp->asString() << endl << endl;
}

netdef=16c3z-tanh-mp2-32c3z-tanh-mp2-64c3z-tanh-mp2-128c3z-tanh-mp2-128c6-tanh-128c1-tanh-2c1-tanh-2n

InputLayer{ outputPlanes=3 outputSize=96 }

NormalizationLayer{ outputPlanes=3 outputSize=96 translate=-15.5345 scale=0.0079287 }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=96 numFilters=16 filterSize=3 outputSize=96 padZeros=1 biased=1 skip=0} }

ActivationLayer{ TANH }

PoolingLayer{ inputPlanes=16 inputSize=96 poolingSize=2 }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=16 inputSize=48 numFilters=32 filterSize=3 outputSize=48 padZeros=1 biased=1 skip=0} }

ActivationLayer{ TANH }

PoolingLayer{ inputPlanes=32 inputSize=48 poolingSize=2 }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=24 numFilters=64 filterSize=3 outputSize=24 padZeros=1 biased=1 skip=0} }

ActivationLayer{ TANH }

PoolingLayer{ inputPlanes=64 inputSize=24 poolingSize=2 }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=64 inputSize=12 numFilters=128 filterSize=3 outputSize=12 padZeros=1 biased=1 skip=0} }

ActivationLayer{ TANH }

PoolingLayer{ inputPlanes=128 inputSize=12 poolingSize=2 }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=128 inputSize=6 numFilters=128 filterSize=6 outputSize=1 padZeros=0 biased=1 skip=0} }

ActivationLayer{ TANH }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=128 inputSize=1 numFilters=128 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }

ActivationLayer{ TANH }

ConvolutionalLayer{ LayerDimensions{ inputPlanes=128 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }

ActivationLayer{ TANH }

FullyConnectedLayer{ numPlanes=2 imageSize=1 }

SoftMaxLayer{ perPlane=0 numPlanes=2 imageSize=1 }

hughperkins commented 7 years ago

The output above is from that convolutional layer, 4 lines up from the bottom? ConvolutionalLayer{ LayerDimensions{ inputPlanes=128 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} } ?

hughperkins commented 7 years ago

That layer has outputSize=1. I'd expect a layer being used as a heatmap to have a size a bit more than 1?

merceyz commented 7 years ago

I tried to implement what's shown in the comments here https://github.com/heuritech/convnets-keras/issues/1#issuecomment-215014259 , but I now think that their numbers don't fit my model.

Also how would their Softmax4D be implemented in DeepCL?

hughperkins commented 7 years ago

Also how would their Softmax4D be implemented in DeepCL?

Am I right in guessing that their Softmax4D calculates the softmax for a 'column' of pixels, all in the same position, in each image, in the stack of images?

If so, you should be able to set perPlane to 0, and DeepCL should handle that already. I think. Oh wait... it's if'd out, and marked as not implemented https://github.com/hughperkins/DeepCL/blob/master/src/loss/SoftMaxLayer.cpp#L112

It wouldn't take tons of work to implement it though. I could probably implement that, if it's going to be useful?
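
For reference, the computation I have in mind is something like this (just a sketch, not DeepCL's actual implementation; input laid out as [plane][row][col] for a single example):

    #include <algorithm>
    #include <cmath>

    // softmax across planes, separately for each pixel position
    void softmaxPerColumn(const float *input, float *output,
            int numPlanes, int imageSize) {
        int planeSize = imageSize * imageSize;
        for (int pos = 0; pos < planeSize; pos++) {
            float maxVal = input[pos];
            for (int p = 1; p < numPlanes; p++) {
                maxVal = std::max(maxVal, input[p * planeSize + pos]);
            }
            float sum = 0;
            for (int p = 0; p < numPlanes; p++) {
                float e = std::exp(input[p * planeSize + pos] - maxVal); // subtract max for stability
                output[p * planeSize + pos] = e;
                sum += e;
            }
            for (int p = 0; p < numPlanes; p++) {
                output[p * planeSize + pos] /= sum;
            }
        }
    }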

merceyz commented 7 years ago

Am I right in guessing that their SoftMax4d calculates the soft max for a 'column' of pixels, all in the same position, in each image, in the stack of images?

I have to be honest: I have no clue. I tried reading what he linked to in his comment (https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb) to get a better understanding, but I just can't manage to understand how it does the heatmap part.

I could probably implement that if that's going to be useful?

If you mean heatmaps in general, then yes please :)

hughperkins commented 7 years ago

I meant the softmax bit :-P

merceyz commented 7 years ago

I guess it won't hurt.

It's so damn annoying that I can't manage to understand how something as simple as a heatmap works.

hughperkins commented 7 years ago

Maybe we can Skype sometime? My Skype id is the same as my github id. I'm around from ~1pm to 3pm UK time each day, or till around 11pm UK time at weekends. I can't claim to entirely understand heatmaps, but I have some ideas, and talking often solves lots of issues.

merceyz commented 7 years ago

Added

hughperkins commented 7 years ago

ok :-)

merceyz commented 7 years ago

No longer needed

hughperkins commented 7 years ago

ok