keras-team / keras

Automatic Tensor Size Calculation #625

Closed bottler closed 7 years ago

bottler commented 9 years ago

I saw the mention in issue #60 of automatic tensor size calculation, where it's suggested that it would necessitate a change in the API. I've been thinking about this recently.

I've been having a go at a solution which doesn't change the API. Basically, the user supplies the sequential container itself in place of the input-dimension parameter in the layer's constructor. So you can write this:

model = Sequential()
model.add(Dense(20, 64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(model, 64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(model, 2, init='uniform'))
model.add(Activation('softmax'))

My implementation is in https://github.com/bottler/keras/ .

In addition, the user can add a layer to specify the size, so the above example could instead begin:

model = Sequential()
model.add(SpecifyShape([20]))
model.add(Dense(model, 64, init='uniform'))

The overhead when creating a layer is that the layer should define either get_output_dims(self) or calc_output_dims(self, lastdims). I have added these to some of the built-in layers.
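To make that concrete, here is a rough standalone sketch of the two kinds of hook (simplified hypothetical classes, not the actual code in my branch):

class DenseLike:
    # A layer whose output shape is fixed by its constructor arguments,
    # so it only needs get_output_dims.
    def __init__(self, output_dim):
        self.output_dim = output_dim

    def get_output_dims(self):
        return [self.output_dim]

class FlattenLike:
    # A layer whose output shape is computed from the previous layer's
    # shape, so it defines calc_output_dims instead.
    def calc_output_dims(self, lastdims):
        n = 1
        for d in lastdims:
            n *= d
        return [n]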

It currently assumes that "prev=" has not been used, i.e. that all layers are sequential.

I would like some feedback on whether this is a good idea and how it might be improved. I have not yet added any tests, so it's probably buggy.

fchollet commented 9 years ago

I've been having a go at a solution which doesn't change the API.

It absolutely does change the API... wouldn't it be simpler to go with:

model = Sequential()
model.add(Dense(20, 64, init='uniform'))
model.add(Dense(output_dim=64, init='uniform'))

This would be fairly easy to implement: the layer weights would not be initialized in the constructor, and .add would look at the output of the previous layer and then call a .initialize(input_dim=...) method on the layer.
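A rough sketch of that flow (hypothetical simplified classes, not actual Keras code):

class SequentialSketch:
    def __init__(self):
        self.layers = []

    def add(self, layer):
        # If the new layer was constructed without an input size, infer it
        # from the previous layer's output size and finish building it.
        if self.layers and layer.input_dim is None:
            layer.initialize(input_dim=self.layers[-1].output_dim)
        self.layers.append(layer)

class DenseSketch:
    def __init__(self, output_dim, input_dim=None):
        self.output_dim = output_dim
        self.input_dim = input_dim
        if input_dim is not None:
            self.initialize(input_dim)

    def initialize(self, input_dim):
        self.input_dim = input_dim
        # The weights would be allocated here, once both dims are known.
        self.weight_shape = (input_dim, self.output_dim)

model = SequentialSketch()
model.add(DenseSketch(output_dim=64, input_dim=20))
model.add(DenseSketch(output_dim=64))   # input_dim inferred as 64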

This would however break backwards compatibility. Also I don't really like it because it breaks the general assumption that layers are independent entities. So if you are in the mindset that Keras is a set of independent building blocks to quickly assemble new architectures, automatic tensor size calculation is a no go (since fundamentally it makes the previous layers a dependency of the current layer).

bottler commented 9 years ago

Yes, that's sensible.

How about keeping the container's ability to know its top dimension (get_top_dims) but forgetting about using it internally. So you could write:

model = Sequential()
model.add(Dense(20, 64, init='uniform'))
model.add(Dense(model.get_top_dims()[0], 64, init='uniform'))

There would be no change to Layer constructors. Would that be acceptable?

An example pain point at the moment is where I have convolutions (with border modes, strides, etc.) and pooling feeding into a Dense layer. Changing a stride or a window length requires the user to recalculate the output dimension, either mentally or in separate code.
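For illustration, this is the kind of recalculation involved (the numbers here are made up, not from my network):

def conv_pool_size(side, kernel, pool):
    # border_mode='valid' convolution followed by non-overlapping pooling
    after_conv = side - kernel + 1
    return after_conv // pool

nb_filter = 32
for kernel in (3, 5):
    side = conv_pool_size(28, kernel, 2)
    print('kernel %d: Dense input size %d' % (kernel, nb_filter * side * side))
# kernel 3: 32 * 13 * 13 = 5408
# kernel 5: 32 * 12 * 12 = 4608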

fchollet commented 9 years ago

There would be no change to Layer constructors. Would that be acceptable?

I think it's a good solution. The only potential issue is that exact output shapes are not fixed in the general case (though for many layers, they are fixed); they depend on the input shape. This is the case in particular for convolutional layers.

So what you would really need is something like model.output_shape(input_shape=...). Not particularly elegant.
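For example (a standalone sketch, not an existing method):

def valid_conv_shape(rows, cols, nb_row, nb_col):
    # The output shape of a border_mode='valid' convolution is a function
    # of the input shape, not a constant of the layer.
    return (rows - nb_row + 1, cols - nb_col + 1)

print(valid_conv_shape(32, 32, 3, 3))   # (30, 30)
print(valid_conv_shape(64, 48, 3, 3))   # (62, 46)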

ledbetdr commented 9 years ago

I tried another way of approaching this problem: a helper function that does the heavy lifting for you. The function takes three parameters (the current network's .get_config(), the input X dimension, and the input Y dimension). It could easily be placed into keras.utils.layer_utils:

def calcDim(config, initialX, initialY):
    #DRL - iterate through the layers of a network to determine what the
    #       total number of output dimensions will be for generation of
    #       the initial dense layer
    currX = initialX
    currY = initialY

    #DRL - variable to keep track of the final filter count
    finalFilter = 1    

    for layer in config['layers']:

        #DRL - check the columns and change x based on border mode
        if layer.get('nb_col'):
            if layer.get('border_mode') == 'valid':
                currX = currX - layer.get('nb_col') + 1
            else:
                currX = currX + layer.get('nb_col') - 1

        #DRL - check the rows and change y based on the border mode
        if layer.get('nb_row'):
            if layer.get('border_mode') == 'valid':
                currY = currY - layer.get('nb_row') + 1
            else:
                currY = currY + layer.get('nb_row') - 1

        #DRL - apply pooling correction
        if layer.get('poolsize'):
            poolY, poolX = layer.get('poolsize')
        currX = currX // poolX
        currY = currY // poolY

        #DRL - apply subsample correction
        if layer.get('subsample'):
            subY, subX = layer.get('subsample')
        currX = currX // subX
        currY = currY // subY

        #DRL - keep track of the final filter count encountered
        if layer.get('nb_filter'):
            finalFilter = layer.get('nb_filter')

    print('finalFilter: %d, finalX: %d, finalY: %d' % (finalFilter, currX, currY))
    return finalFilter * currX * currY

The result makes the initial dense layer construction fairly straightforward:

    #initial model construction....
    model.add(Dense(calcDim(model.get_config(), shapeX, shapeY), NUM_HIDDEN_UNITS))

This implementation doesn't require layers to have any knowledge of each other or require a theano function to perform the task.
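As a quick sanity check (the config dict below is written by hand for illustration, not taken from a real model.get_config()):

config = {'layers': [
    {'nb_filter': 32, 'nb_row': 3, 'nb_col': 3, 'border_mode': 'valid'},
    {'poolsize': (2, 2)},
]}
print(calcDim(config, 28, 28))
# 28 -> 26 after the 3x3 'valid' convolution, then 26 -> 13 after 2x2
# pooling, so the result is 32 * 13 * 13 = 5408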

I've only tested it with a VGG style network, so there may be layers with parameters I have not accounted for.

bottler commented 9 years ago

I'm sorry I've gone quiet here. I've been busy with other things. My current solution - https://github.com/bottler/keras - works for layers with arbitrary shapes, and the code for dealing with each layer lives in the layer, which I think makes it more convenient.

fchollet, re convolutions: that's why I recommend using SpecifyShape as the first layer of a sequence. If the network is such that flexibility in the input size is desirable - for example, increasing the width by 1 will sometimes leave the output of MaxPooling unchanged - then the user can just lie in SpecifyShape.

I have changed the syntax of SpecifyShape to match Reshape, so that the example I gave above would be model.add(SpecifyShape(20)) not model.add(SpecifyShape([20])).

I have now added documentation including an example. I think I am ready to make a PR.