keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

API thoughts on supporting non-sequential models #104

Closed · fchollet closed this 8 years ago

fchollet commented 9 years ago

Let me know what you think. What is this missing? Any way to make it simpler and more elegant?

''' Simple model with 2 branches merging into one

    left   right
      |      |
       \    /
        model
          |
'''

left = Sequential()
left.add(Dense(10, 64))
left.add(Dense(64, 64))

right = Sequential()
right.add(Dense(20, 64))

model = Merge([left, right], merge_mode="concat", concat_dim=-1)
model.add(Dense(128, 64))
model.compile(optimizer, objective)

model.fit([Xleft, Xright], y)

''' Recursivity of Merge structures

    left   right
      |      |
       \    /    
    intermediate
         |       far_right
          \       /
            model
              |
'''

left = Sequential()
left.add(Dense(10, 64))
left.add(Dense(64, 64))

right = Sequential()
right.add(Dense(20, 64))

intermediate = Merge([left, right], merge_mode="concat", concat_dim=-1)
intermediate.add(Dense(128, 128))

far_right = Sequential()
far_right.add(Embedding(10000, 128))

model = Merge([intermediate, far_right], merge_mode="sum")
model.add(Dense(128, 10))
model.compile(optimizer, objective)

model.fit([[Xleft, Xright], Xembed], y)

''' Simple model with one sequence branching into 2

            model
            /   \
       two_headed_model
            |   |
'''

model = Sequential()
model.add(Dense(10, 128))

two_headed_model = Fork(model, n=2)
two_headed_model.add(Dense(128, 64), position=0)
two_headed_model.add(Dense(128, 1), position=1)

two_headed_model.compile(optimizer, objective)

two_headed_model.fit(X, [y1, y2])

''' "Adding" models
'''

model = Sequential()
model.add(Dense(10, 128))

upper_section = Sequential()
upper_section.add(Dense(128, 256))

model.add(upper_section)
model.compile(optimizer, objective)
model.fit(X, y)
fchollet commented 9 years ago

For reference, here's the Torch way to do these things. I find it neither very intuitive nor very elegant.

jisraeli commented 9 years ago

What about models that split into more than 2 branches, with multiple branch points? For example, the GoogLeNet model with multiple intermediate classifiers (http://arxiv.org/pdf/1409.4842v1.pdf) and, more generally, models with multiple tasks.

fchollet commented 9 years ago

In this proposal, the Merge container takes as input a list of models, which can be arbitrarily long. The Fork container takes as input an n parameter, which determines how many branches to create. And the entire thing is recursive, so arbitrarily complex graphs can be created.
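For instance, a three-branch topology with a nested merge point could be sketched like this under the proposal (a sketch only; the layer sizes and the Xa/Xb/Xc inputs are placeholders):

branch_a = Sequential()
branch_a.add(Dense(10, 32))

branch_b = Sequential()
branch_b.add(Dense(10, 32))

branch_c = Sequential()
branch_c.add(Dense(10, 32))

# Merge accepts an arbitrarily long list of models
inner = Merge([branch_a, branch_b], merge_mode="concat", concat_dim=-1)
inner.add(Dense(64, 32))

# a Merge is itself a model, so it can appear in another Merge's list
model = Merge([inner, branch_c], merge_mode="sum")
model.add(Dense(32, 1))
model.compile(optimizer, objective)

model.fit([[Xa, Xb], Xc], y)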

That said, I'm still not sure whether this proposal is the right thing to implement, mostly in terms of usability.

jasonwbw commented 9 years ago

Actually, I have run into many multi-branch models when working on certain tasks, especially relevance tasks, or tasks where we have to model two sentences or documents separately. So I think this would be very useful for experiments, and I had just been trying to find such an interface in Keras. I'm an NLPer. Looking forward to the implementation.

fchollet commented 9 years ago

@jasonwbw I'm curious, what were you using previously? Do you think the above API would work for your workflow? What would you do differently?

jasonwbw commented 9 years ago

Yes. For example, I'm trying to implement Memory Networks now, which is a multi-branch architecture. And many relevance tasks use a multi-branch architecture like in that paper. I saw in the roadmap that you will implement Merge in the v1.0 release?

fchollet commented 9 years ago

After much prototyping, it turned out that the overlap between the Sequential and Merge models was way too large to justify two separate models. It seems more sensible to confine the role of Merge to that of a layer that takes a list of inputs and returns a single output. Likewise for forks. And, as a result, to extend Sequential to be a bit more general.

To make this merge-as-layer confinement possible, Sequential had to be modified to support list inputs and list targets. get_weights and set_weights might have to be modified as well (unclear yet).

Here's how the new API works. It is quasi-identical to the initial proposal, except Merge is now a layer. Another difference is that inputs and targets are always flat lists, not nested lists.

left = Sequential()
left.add(Dense(784, 50))
left.add(Activation('relu'))

right = Sequential()
right.add(Dense(784, 50))
right.add(Activation('relu'))

model = Sequential()
model.add(Merge([left, right], mode='sum'))

model.add(Dense(50, 10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

model.fit([X_train, X_train], Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=0, validation_data=([X_test, X_test], Y_test))

The above is already working in a test branch. It does support recursive model composition.

For now I have only implemented support for list inputs; not list targets yet. We'll add the Fork layer a bit later (in case any API issues crop up with Merge, we might still change the architecture a bit, so let's opt for a progressive release strategy).
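Concretely, recursive composition means a Sequential containing a Merge can itself be a branch inside another Merge. A sketch (layer sizes are placeholders):

left = Sequential()
left.add(Dense(784, 50))

right = Sequential()
right.add(Dense(784, 50))

inner = Sequential()
inner.add(Merge([left, right], mode='sum'))
inner.add(Dense(50, 50))

far_right = Sequential()
far_right.add(Dense(784, 50))

model = Sequential()
model.add(Merge([inner, far_right], mode='sum'))
model.add(Dense(50, 10))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# inputs are a flat list, one array per leaf branch
model.fit([X_train, X_train, X_train], Y_train)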

pranv commented 9 years ago

Looks great. I think this actually works in practice. The non-sequential models I've built and seen so far can be easily replicated using this.

Just a thought experiment - would implementing Fork and Merge at a layer level rather than at a model level be better?

fchollet commented 9 years ago

From an architecture standpoint, it would certainly be worse. From a UX standpoint, maybe it would be incrementally better (one line removed in the above example). But it would still be essentially the same UX.

In general, I consider that clean architecture is part of a good UX, since Keras users are very much supposed to look under the hood and extend the library to do whatever they need to. So having merge/fork at the model level does seem like the better choice.

It's on master now. Docs and tests included.

jasonwbw commented 9 years ago

This is great work! Starting to give it a try.

jasonwbw commented 9 years ago

Hi @fchollet, I'm trying to implement a dot mode for Merge. I think it would be very useful for implementing tensor neural networks and similar models, but it's not a clean fit for multi-branch (more than 2) architectures.
Do you have a suggestion, or do you think something like this should be implemented by users?
Thank you!
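For two branches, what I have in mind is something like this batchwise inner product (a rough sketch against the Theano backend; dot_merge is not an existing Keras function):

import theano.tensor as T

def dot_merge(left_out, right_out):
    # left_out, right_out: (batch, n) matrices coming out of the two branches
    # returns the batchwise inner product, shape (batch, 1)
    return T.sum(left_out * right_out, axis=-1, keepdims=True)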

lemuriandezapada commented 9 years ago

Would also be nice if you could just train individual branches of Fork to manage biasing towards different parts of the datasets.

pranv commented 9 years ago

That can be done regardless of the implementation, as it is a feature provided by Keras layers. Just train, then use get_weights and set_weights.
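A rough sketch of the idea, reusing the left and right branch models from the example above (X_slice/Y_slice are a hypothetical biased slice of the data):

# snapshot the left branch's weights before fitting on the biased slice
left_snapshot = [layer.get_weights() for layer in left.layers]

model.fit([X_slice, X_slice], Y_slice, batch_size=32, nb_epoch=1)

# restore the left branch, so effectively only the right branch was trained
for layer, weights in zip(left.layers, left_snapshot):
    layer.set_weights(weights)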


pranv commented 9 years ago

I think having Caffe-like names for layers would be better from a UX perspective. Having a dictionary that maps names to layer nodes would be sufficient, I guess. We would just have to change the 'connect' method in layers to look up the node by name in the dictionary, rather than use the node directly.

vzhong commented 9 years ago

@pranv do you mean

model.add('dense', 10, 5)

as opposed to

model.add(Dense(10, 5))

I'm not sure I agree with this. In fact, I'd much prefer that the APIs take instances as opposed to strings. E.g.

model.compile(binary_crossentropy)

as opposed to

model.compile('binary_crossentropy')

The former avoids introducing additional constructs (like strings), and benefits from IDE features like autocompletion, go-to-definition, and such.

Moreover, if you want to define a custom layer or loss function, you can do so without modifying the Keras source code. You need only define your own layer/objective that adheres to the Keras interface and pass the instance/function into the API.
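For instance, a custom objective that adheres to the interface of the built-in objectives (a sketch; scaled_mse is hypothetical and assumes the Theano backend):

import theano.tensor as T

def scaled_mse(y_true, y_pred):
    # same (y_true, y_pred) signature as the built-in Keras objectives
    return 0.5 * T.mean(T.sqr(y_pred - y_true), axis=-1)

model.compile(loss=scaled_mse, optimizer='rmsprop')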

mthrok commented 9 years ago

@vzhong From what @pranv says, I imagined something like a user-specified name.

model.add(Convolution2D(32, 3, 5, 5, name='conv1'))
model.add(Convolution2D(32, 32, 5, 5, name='conv2'))
model.add(Flatten())
model.add(Dense(10, 5, name='dense1'))

And I think this is a good idea if it is also reflected in the save/load functions. Say I have trained a convolutional NN with classification layers on top of it, saved the weights into a file, and later want to create a model with just the convolution layers.

another_model.add(Convolution2D(32, 3, 5, 5, name='new_conv1'))
another_model.add(Convolution2D(32, 32, 5, 5, name='new_conv2'))
another_model.load_weights('filepath', new_conv1='conv1', new_conv2='conv2')
# use the 'conv1' and 'conv2' weights from the original model for the
# 'new_conv1' and 'new_conv2' layers respectively
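Internally, such name-based loading could be a small mapping step. A rough sketch (load_weights_by_name and the per-layer name attribute are hypothetical here):

def load_weights_by_name(model, saved_weights, name_map):
    # saved_weights: dict mapping original layer names to weight lists
    # name_map: dict mapping this model's layer names to original names
    for layer in model.layers:
        name = getattr(layer, 'name', None)
        if name in name_map:
            layer.set_weights(saved_weights[name_map[name]])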
pranv commented 9 years ago

https://github.com/fchollet/keras/pull/281

wzliwen0701 commented 9 years ago

It is very convenient. I want to concatenate the feature maps of 2 convolutional branches; the code looks like this:

model1 = Sequential()
model1.add(Convolution2D(2, 1, 5, 5, border_mode='valid'))
model1.add(Activation('tanh'))
model1.add(Convolution2D(4, 2, 3, 3, border_mode='valid'))
model1.add(Activation('tanh'))
model1.add(MaxPooling2D(poolsize=(2, 2)))  # get feature maps (num = 4, size = 11*11)

model2 = Sequential()
model2.add(Convolution2D(4, 1, 7, 7, border_mode='valid'))
model2.add(Activation('tanh'))
model2.add(MaxPooling2D(poolsize=(2, 2)))  # get feature maps (num = 4, size = 11*11)

model = Sequential()
model.add(Merge([model1, model2], mode='concat'))  # concatenate feature maps (num = 8, size = 11*11)
model.add(Convolution2D(16, 8, 3, 3, border_mode='valid'))
model.add(Activation('tanh'))
model.add(MaxPooling2D(poolsize=(2, 2)))

model.add(Flatten())
model.add(Dense(16*4*4, 128, init='normal'))
model.add(Activation('tanh'))
model.add(Dense(128, 10, init='normal'))
model.add(Activation('softmax'))

sgd = SGD(l2=0.0, lr=0.05, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode='categorical')  # pass the SGD instance, not the string 'sgd'
model.fit([data, data], label, batch_size=100, nb_epoch=10, shuffle=True, verbose=1,
          show_accuracy=True, validation_split=0.2)

This doesn't work. If I use 'sum' mode instead, there is no problem, like this:

....
model = Sequential()
model.add(Merge([model1, model2], mode='sum'))
model.add(Convolution2D(16, 4, 3, 3, border_mode='valid'))
model.add(Activation('tanh'))
model.add(MaxPooling2D(poolsize=(2, 2)))
....