keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

How to implement a Residual block via Keras? #1321

Closed meanmee closed 7 years ago

meanmee commented 8 years ago

from keras.models import Graph
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D

def build_residual_block_conv(num_filters, name, input_shape, input_name='x'):
    """
    Rough sketch of building blocks of layers for residual learning.
    See http://arxiv.org/abs/1512.03385 for motivation.
    """
    block = Graph()
    block.add_input(input_name, input_shape=input_shape)

    h1 = Convolution2D(num_filters, 3, 3, activation='relu', border_mode='same')
    block.add_node(h1, name=name + 'h1', input=input_name)
    block.add_node(Dropout(0.25), name=name + 'd1', input=name + 'h1')

    h2 = Convolution2D(num_filters, 3, 3, activation='linear', border_mode='same')
    block.add_node(h2, name=name + 'h2', input=name + 'd1')

    block.add_output(name=name + 'output', inputs=[name + 'h1', name + 'h2'], merge_mode='sum')

    return block
SeverinAlexB commented 8 years ago

I believe you have to connect the input with the output. Anyway, I've already tried to implement it and I failed dramatically.

My code: https://github.com/Sebubu/mushroom_crawler/blob/master/mushroom/ResidualNet.py
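
For illustration, here is a minimal sketch of what "connecting the input with the output" could look like using the same old Graph API as the snippet above. This is not the code from the linked file, and it assumes num_filters equals the number of input channels so the 'sum' merge is shape-valid:

from keras.models import Graph
from keras.layers.core import Activation
from keras.layers.convolutional import Convolution2D

def build_residual_block_conv(num_filters, name, input_shape, input_name='x'):
    block = Graph()
    block.add_input(input_name, input_shape=input_shape)

    # residual path: two 3x3 convolutions, the second one linear
    block.add_node(Convolution2D(num_filters, 3, 3, activation='relu', border_mode='same'),
                   name=name + 'h1', input=input_name)
    block.add_node(Convolution2D(num_filters, 3, 3, activation='linear', border_mode='same'),
                   name=name + 'h2', input=name + 'h1')

    # identity shortcut: a linear Activation just re-exposes the block input as a node
    block.add_node(Activation('linear'), name=name + 'shortcut', input=input_name)

    # the output sums the shortcut (input) with the residual path, not h1 with h2
    block.add_output(name=name + 'output', inputs=[name + 'shortcut', name + 'h2'],
                     merge_mode='sum')
    return block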

meanmee commented 8 years ago

Thanks very much! Did you test it on CIFAR-10?

SeverinAlexB commented 8 years ago

No, like I said above it didn't work so far. Feel free to experiment with the code.

meanmee commented 8 years ago

I think the shortcut is not a convolutional layer but a linear layer. In your code you implemented it as a convolution. Now I want to replace it with a linear layer; what code should I modify? Sorry, I am not familiar with Keras.

Oh, I think I know how to do it.

SeverinAlexB commented 8 years ago

The paper proposes two options: either a linear projection or a convolution, and believe me, a convolution is easier. Otherwise you have to deal with reshaping and so on...
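
A rough sketch of the convolution option, assuming the Graph API from above and made-up shapes: a 1x1 convolution on the shortcut path projects the input to the right number of channels before the 'sum' merge.

from keras.models import Graph
from keras.layers.convolutional import Convolution2D

block = Graph()
block.add_input('x', input_shape=(3, 32, 32))  # hypothetical 3-channel input

# residual path with 16 filters
block.add_node(Convolution2D(16, 3, 3, activation='relu', border_mode='same'),
               name='h1', input='x')
block.add_node(Convolution2D(16, 3, 3, activation='linear', border_mode='same'),
               name='h2', input='h1')

# projection shortcut: 1x1 convolution maps 3 channels to 16 so the sum is valid
block.add_node(Convolution2D(16, 1, 1, activation='linear', border_mode='same'),
               name='shortcut', input='x')

block.add_output(name='output', inputs=['shortcut', 'h2'], merge_mode='sum')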

meanmee commented 8 years ago

@Sebubu @fchollet It seems that ZeroPadding2D, AveragePooling2D, and BatchNormalization cannot be added to a Graph as a node. I modified your code; the Graph can be generated, but it does not work when applied to real CIFAR-10 data.

The code I modified: cifar10.txt

My test code is:

input_shapes = (3, 32, 32)
print('32-layers')

model = Sequential()
model.add(create_31_layer(input_shapes))
model.add(AveragePooling2D(pool_size=(8, 8)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

print 'Not using data augmentation or normalization'
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test), show_accuracy=True)
score = model.evaluate(X_test, Y_test, batch_size=batch_size)
print 'Test score:', score

which raises the error:

Traceback (most recent call last):
  File "/home/dell/DLTest/cifar_test/Residul/residul_32layers.py", line 43, in
    model.compile(loss='categorical_crossentropy', optimizer='adam')
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 406, in compile
  File "build/bdist.linux-x86_64/egg/keras/layers/containers.py", line 128, in get_output
  File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 896, in get_output
  File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 159, in get_input
  ......................... (many frames like the above)
AssertionError

SeverinAlexB commented 8 years ago

Issue #1275 describes your exception. I have not found a solution so far.

keunwoochoi commented 8 years ago

Are you suggesting building a wrapper that would enable implementing a residual net by adding several residual blocks, i.e.

model.add(residual())
model.add(residual())

? Then it would be so cool. I think the problem with BN and graph was resolved.

courageon commented 8 years ago

I implemented a residual class based on the Regularizer class. Not sure if this is the correct way to do it or not, but it seems to work. Also slows down training to a crawl.

For the life of me this thing will not format right, so sorry for the awful formatting...

_CODE START_

from keras.layers.core import Layer
from keras.regularizers import Regularizer

class ResidualRegularizer(Regularizer):
    def __init__(self):
        pass

def set_layer(self, layer):
    self.layer = layer

#When asked for the loss, just return 0 to prevent back-prop to the previous layers
def __call__(self, loss):
    return 0

def get_config(self):
    return {"name": self.__class__.__name__}

class Residual(Layer):
    """
    Layer that passes through its input unchanged, and applies no back
    propagation. It is simply a forward-propagated link intended for
    residual linking.
    """
    def __init__(self, **kwargs):
        super(Residual, self).__init__(**kwargs)
        residual_regularizer = ResidualRegularizer()
        residual_regularizer.set_layer(self)
        self.regularizers = [residual_regularizer]

def get_output(self, train=False):
    return self.get_input(train)

def get_config(self):
    config = {"name": self.__class__.__name__}
    base_config = super(Residual, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

_CODE END_

Then you implement it like...

model.add_node(Dense(1536, activation='relu'), merge_mode='concat', concat_axis=-1,
               name='class_dense1', inputs=['flatten_embed', 'flatten'])
model.add_node(Dense(1536, activation='relu'), name='class_dense2', input='class_dense1')
model.add_node(Dense(1536, activation='relu'), name='class_dense3', input='class_dense2')
model.add_node(Residual(), name='class_residual', input='class_dense1')
model.add_node(Dense(vocab_size, activation='softmax'), name='class_softmax',
               merge_mode='sum', inputs=['class_residual', 'class_dense3'])

Again sorry for the terrible formatting...

It's not so much a residual layer as it is just a way to grab a previous layer's outputs without back-propagating. Then you can take that output and merge it, creating your 'residual'.

Also note this method only works if all the layers are the same size.

sergeyf commented 8 years ago

Has anyone given this a shot? https://github.com/ndronen/modeling/blob/master/modeling/residual.py

courageon commented 8 years ago

I've seen that code but I don't understand how it's supposed to be used. It looks like it's building the entire network, so it may be more of a factory? But, maybe we can use the Identity class in that code as a replacement for the Residual class in my example. Could be worth a shot if it's any faster. I'll give it a try and report back. Thanks sergeyf!

sergeyf commented 8 years ago

I think you can just use the blocks it returns in other models. So:

s = Sequential()
s.add( build_residual_block('resblock1', (100,), 2, n_skip=2) )
s.add( build_residual_block('resblock2', (100,), 2, n_skip=2) )

etc.

courageon commented 8 years ago

Ah, ok, I see it now. Thanks for the example! :)

keunwoochoi commented 8 years ago

It's worth checking out this comment: I think the code @sergeyf mentioned has the same mistake. ReLU() should be applied after the merge, I guess, according to that comment. What do you think?
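
As a minimal sketch of that ordering (in the Keras 1.0 functional API that appears later in this thread, with assumed channels-first shapes), the second convolution is linear and ReLU() is applied only after the merge:

from keras.layers import Input, Convolution2D, Activation, merge

x = Input(shape=(16, 32, 32))  # hypothetical: 16 channels, Theano dim ordering

# residual path: conv -> relu -> conv, no activation on the second conv
y = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
y = Convolution2D(16, 3, 3, activation='linear', border_mode='same')(y)

# sum first, then apply the nonlinearity, as the linked comment suggests
out = merge([x, y], mode='sum')
out = Activation('relu')(out)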

keunwoochoi commented 8 years ago

I opened a repo for my residual block implemented in Keras: https://github.com/keunwoochoi/residual_block_keras. I put many comments in the files so it should be easy to understand. However, I'm not sure if it's correct.

cmishra commented 8 years ago

@keunwoochoi I think your observation is correct. Additionally, I walked myself through the logic of your code and it seems to be correct (albeit I haven't used it, that's for tomorrow). I did have a couple questions:

keunwoochoi commented 8 years ago

Thanks, @cmishra.

I'll also update lines 58 to 81 to make them clearer (the shortcut convolution part). (EDIT: I updated it.)

(PS. At #1910 I added an issue for it but here people can get notifications.)

keunwoochoi commented 8 years ago

Now my network seems to be starting to learn some meaningful features, though I'm not sure yet. I originally added the usual classifier (flatten() - maxout() - BN - maxout()...), but I have now changed it to an almost fully convolutional architecture as in the original ResNet paper, removed all dropouts, and changed MaxPooling to AveragePooling in the shortcut path. I'm still not sure which change is critical and which is not, but it's worth noting. I'll post more updates.

cmishra commented 8 years ago

@Sebubu with #1387, does your code work? I'd try to run it myself, but I'm traveling and my personal laptop doesn't have the processing power.

@keunwoochoi , let's take the discussion regarding your code to the other issue you posted.

sergeyf commented 8 years ago

There's a more recent (apparently) improved version of the residual block:

http://arxiv.org/pdf/1603.05027v1.pdf
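
Roughly, that paper moves to a "full pre-activation" ordering: BN -> ReLU -> conv on the residual path, with a clean identity shortcut. A hedged sketch in the Keras 1.0 functional API, with made-up shapes:

from keras.layers import Input, Convolution2D, Activation, merge
from keras.layers.normalization import BatchNormalization

x = Input(shape=(16, 32, 32))  # hypothetical: 16 channels, channels-first

# pre-activation residual path: (BN -> ReLU -> conv) twice
y = BatchNormalization(axis=1)(x)
y = Activation('relu')(y)
y = Convolution2D(16, 3, 3, border_mode='same')(y)
y = BatchNormalization(axis=1)(y)
y = Activation('relu')(y)
y = Convolution2D(16, 3, 3, border_mode='same')(y)

# identity shortcut; nothing is applied after the addition
out = merge([x, y], mode='sum')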

raghakot commented 8 years ago

I recently implemented it in Keras using the new functional API: https://github.com/raghakot/keras-resnet

keunwoochoi commented 8 years ago

Great, I also updated my residual network implementation with Keras 1.0 API and the author's new paper that @sergeyf mentioned: https://github.com/keunwoochoi/residual_block_keras .

codingneo commented 8 years ago

@keunwoochoi, I tried to use your residual_block_keras (https://github.com/keunwoochoi/residual_block_keras), but I encountered the following error:

Traceback (most recent call last):
  File "example.py", line 151, in <module>
    model = get_residual_model()
  File "example.py", line 120, in get_residual_model
    residual_blocks = design_for_residual_blocks(num_channel_input=128)
  File "example.py", line 100, in design_for_residual_blocks
    subsample=pool_sizes[conv_idx]
  File "/usr/local/lib/python2.7/site-packages/keras/models.py", line 145, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 148, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1922, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2064, in run_internal_graph
    output_tensors = to_list(layer.call(computed_tensor, computed_mask))
  File "/usr/local/lib/python2.7/site-packages/keras/layers/normalization.py", line 116, in call
    raise Exception('You are attempting to share a '
Exception: You are attempting to share a same `BatchNormalization` layer across different data flows. This is not possible. You should use `mode=2` in `BatchNormalization`, which has a similar behavior but is shareable (see docs for a description of the behaviour).

Do you have any suggestion?
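
For context, the exception is raised when a single BatchNormalization layer instance is applied to more than one tensor. A hedged sketch of the failing pattern and the two workarounds the message hints at (tensor names are made up):

from keras.layers import Input
from keras.layers.normalization import BatchNormalization

tensor_a = Input(shape=(16, 32, 32))
tensor_b = Input(shape=(16, 32, 32))

# problematic in this Keras version: one instance shared across two data flows
bn = BatchNormalization(axis=1)
a = bn(tensor_a)
b = bn(tensor_b)  # reusing the instance like this is what triggers the exception above

# workarounds: a fresh instance per tensor, or mode=2 as the message suggests
a = BatchNormalization(axis=1)(tensor_a)
b = BatchNormalization(axis=1, mode=2)(tensor_b)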

keunwoochoi commented 8 years ago

Hi @codingneo, it was fixed yesterday, https://github.com/keunwoochoi/residual_block_keras/commit/a35fe6fb8e356cea8a3e491d24beefc8a996ce05. Pull the repo again and give it a try!

codingneo commented 8 years ago

Hi @keunwoochoi, I actually made a similar fix to your new changes by using mode=2 for the BatchNormalization layer. But with mode=2, the training procedure generates NaN loss as follows:

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
 1088/60000 [..............................] - ETA: 13519s - loss: nan - acc: 0.1158
 1152/60000 [..............................] - ETA: 13522s - loss: nan - acc: 0.1137

Is it something to be cautious about?

keunwoochoi commented 8 years ago

Is that the result of running example.py? With the Theano backend it's working well. I had an error with TensorFlow though.

Epoch 1/20
 5760/60000 [=>............................] - ETA: 349s - loss: 0.5634 - acc: 0.8210


codingneo commented 8 years ago

@keunwoochoi Yes, the result is from running example.py. I am on a Mac using the Theano backend.

Darthholi commented 7 years ago

For other Googlers like me: there IS now a way to do a residual connection in Keras, see https://keras.io/getting-started/functional-api-guide/

import keras
from keras.layers import Conv2D, Input

# input tensor for a 3-channel 256x256 image
x = Input(shape=(256, 256, 3))
# 3x3 conv with 3 output channels (same as input channels)
y = Conv2D(3, (3, 3), padding='same')(x)
# this returns x + y.
z = keras.layers.add([x, y])

x and y being the outputs of two consecutive layers

MarviB16 commented 5 years ago

But how would I add it to my model? (Sorry, I'm pretty new to Keras.) model.add(z) doesn't work, obviously.
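
For later readers, a minimal sketch (the head layers here are made up purely for illustration) of how those tensors would typically be wrapped into a trainable model with the functional API, instead of Sequential's model.add:

import keras
from keras.layers import Conv2D, Input, Flatten, Dense
from keras.models import Model

x = Input(shape=(256, 256, 3))
y = Conv2D(3, (3, 3), padding='same')(x)
z = keras.layers.add([x, y])  # the residual connection from above

# illustrative classification head on top of the merged tensor
out = Flatten()(z)
out = Dense(10, activation='softmax')(out)

model = Model(inputs=x, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')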