For reference, here's the Torch way to do these things. I find it not very intuitive and not very elegant.
What about models that branch into more than two branches, with multiple branch points? For example, the GoogLeNet model with multiple intermediate classifiers (http://arxiv.org/pdf/1409.4842v1.pdf), and more generally, models with multiple tasks.
In this proposal, the Merge container takes as input a list of models, which can be arbitrarily long. The Fork container takes an n parameter, which determines how many branches to create. And the entire thing is recursive, so arbitrarily complex graphs can be created, as the sketch below illustrates.
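To make the recursion concrete, here is an abstract sketch of the proposed containers. This is hypothetical pseudocode for the proposal, not a real Keras API; the class names and signatures are illustrative only.

# Hypothetical sketch of the proposed containers (not actual Keras code).
class Merge(object):
    # combines the outputs of an arbitrarily long list of branch models
    def __init__(self, models, mode='concat'):
        self.models = models  # each branch may itself contain Merge/Fork
        self.mode = mode      # e.g. 'sum' or 'concat'

class Fork(object):
    # duplicates its input across n branches
    def __init__(self, n):
        self.n = n

# Branches can themselves be containers, so arbitrarily complex
# graphs can be expressed recursively:
graph = Merge([Fork(2), Merge([Fork(3), Fork(2)], mode='sum')])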
That said, I'm still not sure whether this proposal is the right thing to implement, mostly in terms of usability.
Actually, I have faced many multi-branch models for certain tasks, especially relevance tasks, or tasks where we need to model two sentences or documents separately. So I think this is actually very useful for experiments, and I had just been trying to find such an interface in Keras. I'm an NLPer. Looking forward to the implementation.
@jasonwbw I'm curious, what were you using previously? Do you think the above API would work for your workflow? What would you do differently?
Yes, for example, I'm trying to implement Memory Networks now, which is a multi-branch architecture. And many relevance tasks use multi-branch architectures like the one in the paper. I saw in the roadmap that you will implement Merge in the v1.0 release?
After much prototyping, it turned out that the overlap between the Sequential and Merge models was way too large to justify two separate models. It seems more sensible to confine the role of Merge to that of a layer that takes a list of inputs and returns a single output. Likewise for forks. And to extend Sequential to be a bit more general, as a result.
To make this merge-as-layer confinement possible, Sequential had to be modified to support list inputs and list targets. get_weights and set_weights might have to be modified as well (unclear yet).
Here's what the new API looks like. It's quasi-identical to the initial proposal, except that Merge is now a layer. Another difference is that inputs and targets are always flat lists, never nested lists.
left = Sequential()
left.add(Dense(784, 50))
left.add(Activation('relu'))
right = Sequential()
right.add(Dense(784, 50))
right.add(Activation('relu'))
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(Dense(50, 10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([X_train, X_train], Y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=0, validation_data=([X_test, X_test], Y_test))
The above is already working in a test branch. It does support recursive model composition.
For now I have only implemented support for list inputs; not list targets yet. We'll add the Fork layer a bit later (in case any API issues crop up with Merge, we might still change the architecture a bit, so let's opt for a progressive release strategy).
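To illustrate the recursive composition, here is a sketch reusing left and right from the example above; the third branch and all dimensions are made up for illustration.

# a merged model used as a branch of another Merge
merged = Sequential()
merged.add(Merge([left, right], mode='sum'))
merged.add(Dense(50, 50))
merged.add(Activation('relu'))

third = Sequential()
third.add(Dense(784, 50))
third.add(Activation('relu'))

outer = Sequential()
outer.add(Merge([merged, third], mode='concat'))
outer.add(Dense(100, 10))  # 50 + 50 concatenated features in
outer.add(Activation('softmax'))

outer.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# inputs remain a flat list: outer.fit([X_left, X_right, X_third], Y_train, ...)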
Looks great. I think this actually works in practice. The non-sequential models I've built and seen so far can be easily replicated using this.
Just a thought experiment - would implementing Fork and Merge at a layer level rather than at a model level be better?
From an architecture standpoint, keeping them at the model level would certainly be worse. From a UX standpoint, the layer level is maybe incrementally better (one line removed in the above example), but it would still be essentially the same UX.
In general, I consider that clean architecture is part of a good UX, since Keras users are very much supposed to look under the hood and extend the library to do whatever they need. So having merge/fork at the layer level does seem like the better choice.
It's on master now. Docs and tests included.
This is great work! Starting to try it out.
Hi @fchollet, I'm trying to implement a dot mode for Merge. I think it would be very useful for implementing tensor neural networks and other similar models, but a dot product isn't a good fit for multi-branch (more than 2) architectures.
Do you have a suggestion, or do you think a method like this should be implemented by users?
Thank you!
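For concreteness, here is a minimal sketch of the computation a two-branch 'dot' mode would perform, assuming the Theano backend Keras currently uses (dot is not an existing Merge mode):

import theano
import theano.tensor as T

left_out = T.matrix('left')    # branch 1 output, shape (batch, n)
right_out = T.matrix('right')  # branch 2 output, shape (batch, n)
# row-wise dot product: one scalar per sample
dot = T.sum(left_out * right_out, axis=1, keepdims=True)
f = theano.function([left_out, right_out], dot)

Since the operation pairs exactly two inputs, it does not generalize naturally to merging three or more branches, which is the design concern above.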
It would also be nice if you could train individual branches of a Fork, to manage biasing towards different parts of the dataset.
That can be done regardless of implementation, as it is a feature provided by Keras layers. Just train, then use get_weights and set_weights, as in the sketch below.
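For example, a rough sketch building on the left/right/model example above (X_a, X_b, Y are hypothetical training arrays, and it assumes Sequential models expose get_weights/set_weights): snapshot one branch's weights and restore them after fitting, so only the other branch effectively learns.

frozen = right.get_weights()           # snapshot the branch to hold fixed
model.fit([X_a, X_b], Y, batch_size=32, nb_epoch=1)
right.set_weights(frozen)              # undo any updates to the frozen branch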
I think having Caffe-like names for layers would be better from a UX perspective. Having a dictionary that maps names to layer nodes would be sufficient, I guess. We would just have to change the 'connect' method in layers to look up the node from the dictionary by its key, rather than taking the node directly.
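A rough sketch of that idea (hypothetical throughout; connect() mirrors the method mentioned above and is not how layers are actually wired today):

layer_registry = {}  # maps user-chosen names to layer objects

def add_named(model, layer, name):
    layer_registry[name] = layer
    model.add(layer)

def connect_by_name(layer, parent_name):
    # look up the parent node by name instead of passing it directly
    layer.connect(layer_registry[parent_name])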
@pranv do you mean
model.add('dense', 10, 5)
as opposed to
model.add(Dense(10, 5))
I'm not sure I agree with this. In fact, I'd much prefer that the APIs take instances as opposed to strings. E.g.,
model.compile(binary_crossentropy)
as opposed to
model.compile('binary_crossentropy')
The former avoids introducing additional constructs (like strings) and benefits from IDE features like autocompletion and go-to-definition.
Moreover, if you want to define a custom layer or loss function, you can do so without modifying the Keras source code. You need only define your own layer/objective that adheres to the Keras interface and pass the instance/function into the API, as in the sketch below.
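A minimal sketch of that workflow, assuming compile accepts a callable objective with the same (y_true, y_pred) signature as the built-in Theano objectives (the loss itself is made up for illustration):

import theano.tensor as T

def scaled_binary_crossentropy(y_true, y_pred):
    # hypothetical custom objective; same interface as the built-ins
    return 2.0 * T.nnet.binary_crossentropy(y_pred, y_true).mean(axis=-1)

model.compile(loss=scaled_binary_crossentropy, optimizer='rmsprop')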
@vzhong From what @pranv says, I imagined something like user-specified names:
model.add(Convolution2D(32, 3, 5, 5, name='conv1'))
model.add(Convolution2D(32, 32, 5, 5, name='conv2'))
model.add(Flatten())
model.add(Dense(10, 5, name='dense1'))
And I think this is a good idea if it is also reflected in the save/load functions. For example, say I have trained a convolutional NN with classification layers on top, saved its weights to a file, and later want to create a model from just the convolution layers:
another_model.add(Convolution2D(32, 3, 5, 5, name='new_conv1'))
another_model.add(Convolution2D(32, 32, 5, 5, name='new_conv2'))
another_model.load_weights('filepath', new_conv1='conv1', new_conv2='conv2')
# use the original model's 'conv1' and 'conv2' weights for the
# 'new_conv1' and 'new_conv2' layers respectively
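Until something like that exists, a workaround sketch under the current API (trained_model is a hypothetical name, and it assumes the first layers of both models line up): copy weights layer by layer with the existing per-layer get_weights/set_weights.

# copy the trained conv weights into the new model's matching layers
for src, dst in zip(trained_model.layers[:2], another_model.layers[:2]):
    dst.set_weights(src.get_weights())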
It is very convenient. I want to concatenate the feature maps of two branches of convolutional layers, with code like this:

model1 = Sequential()
model1.add(Convolution2D(2, 1, 5, 5, border_mode='valid'))
model1.add(Activation('tanh'))
model1.add(Convolution2D(4, 2, 3, 3, border_mode='valid'))
model1.add(Activation('tanh'))
model1.add(MaxPooling2D(poolsize=(2, 2)))  # feature maps: num = 4, size = 11*11

model2 = Sequential()
model2.add(Convolution2D(4, 1, 7, 7, border_mode='valid'))
model2.add(Activation('tanh'))
model2.add(MaxPooling2D(poolsize=(2, 2)))  # feature maps: num = 4, size = 11*11

model = Sequential()
model.add(Merge([model1, model2], mode='concat'))  # concatenated feature maps: num = 8, size = 11*11
model.add(Convolution2D(16, 8, 3, 3, border_mode='valid'))
model.add(Activation('tanh'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Flatten())
model.add(Dense(16*4*4, 128, init='normal'))
model.add(Activation('tanh'))
model.add(Dense(128, 10, init='normal'))
model.add(Activation('softmax'))

sgd = SGD(l2=0.0, lr=0.05, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode='categorical')  # pass the SGD instance, not the string 'sgd', so the custom parameters are used
model.fit([data, data], label, batch_size=100, nb_epoch=10, shuffle=True, verbose=1, show_accuracy=True, validation_split=0.2)

It doesn't work. If I use 'sum' mode instead, there is no problem, like this:

....
model = Sequential()
model.add(Merge([model1, model2], mode='sum'))
model.add(Convolution2D(16, 4, 3, 3, border_mode='valid'))
model.add(Activation('tanh'))
model.add(MaxPooling2D(poolsize=(2, 2)))
....
Let me know what you think. What is this missing? Any way to make it simpler and more elegant?