I believe you have to connect the input with the output. Anyway, I've already tried to implement it and I failed dramatically.
My code: https://github.com/Sebubu/mushroom_crawler/blob/master/mushroom/ResidualNet.py
Thanks very much! Did you test it on CIFAR-10?
No, like I said above it didn't work so far. Feel free to experiment with the code.
I think the shortcut is not a convolutional layer but a linear layer. In your code you implemented it as a convolution. Now I want to replace it with a linear layer; what code should I modify? Sorry, I am not familiar with Keras.
Oh, I think I know how to do it.
The paper proposes two options: either a linear projection or a convolution, and believe me, the convolution is easier. Otherwise you have to deal with reshaping and so on...
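(For later readers: a minimal sketch of the two shortcut options using today's functional API - filter counts and shapes are purely illustrative:)

from keras.layers import Input, Conv2D, Activation, add

x = Input(shape=(32, 32, 16))

# Option A: identity shortcut - shapes already match, nothing to learn
y = Conv2D(16, (3, 3), padding='same', activation='relu')(x)
y = Conv2D(16, (3, 3), padding='same')(y)
out_a = Activation('relu')(add([x, y]))

# Option B: 1x1 convolution projection on the shortcut, needed when the
# residual branch changes the filter count or the spatial size
y2 = Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu')(x)
y2 = Conv2D(32, (3, 3), padding='same')(y2)
shortcut = Conv2D(32, (1, 1), strides=(2, 2), padding='same')(x)
out_b = Activation('relu')(add([shortcut, y2]))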
@Sebubu @fchollet It seems like ZeroPadding2D, AveragePooling2D, and BatchNormalization cannot be added to a Graph as a node. I modified your code; the Graph can be generated but does not work when applied to real CIFAR-10 data.

The code I modified: cifar10.txt

My test code is:

input_shapes = (3, 32, 32)
print('32-layers')
model = Sequential()
model.add(create_31_layer(input_shapes))
model.add(AveragePooling2D(pool_size=(8, 8)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
print('Not using data augmentation or normalization')
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test), show_accuracy=True)
score = model.evaluate(X_test, Y_test, batch_size=batch_size)
print('Test score:', score)

which raises the error:
Traceback (most recent call last):
File "/home/dell/DLTest/cifar_test/Residul/residul_32layers.py", line 43, in
Issue #1275 describes your exception. I have not found a solution so far.
Is it like building a wrapper that would enable implementing a residual net by adding several residual blocks, i.e.

model.add(residual())
model.add(residual())

? Then it would be so cool. I think the problem with BN and Graph was resolved.
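(A sketch of what such a wrapper could look like with the functional API; residual here is a hypothetical helper, not part of Keras:)

from keras.layers import Conv2D, Activation, add

def residual(filters):
    # returns a callable block that can be applied to a tensor
    def block(x):
        y = Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        y = Conv2D(filters, (3, 3), padding='same')(y)
        # identity shortcut; assumes x already has `filters` channels
        return Activation('relu')(add([x, y]))
    return block

# usage on a tensor t: t = residual(64)(t), repeated once per block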
I implemented a residual class based on the Regularizer class. Not sure if this is the correct way to do it or not, but it seems to work. Also slows down training to a crawl.
For the life of me this thing will not format right, so sorry for the awful formatting...
_CODE START_
from keras.layers.core import Layer
from keras.regularizers import Regularizer

class ResidualRegularizer(Regularizer):
    def __init__(self):
        pass

    def set_layer(self, layer):
        self.layer = layer

    # When asked for the loss, just return 0 to prevent back-prop to the previous layers
    def __call__(self, loss):
        return 0

    def get_config(self):
        return {"name": self.__class__.__name__}

class Residual(Layer):
    """
    Layer that passes through its input unchanged, and applies no back
    propagation. It is simply a forward-propagated link intended for
    residual linking.
    """
    def __init__(self, **kwargs):
        super(Residual, self).__init__(**kwargs)
        residual_regularizer = ResidualRegularizer()
        residual_regularizer.set_layer(self)
        self.regularizers = [residual_regularizer]

    def get_output(self, train=False):
        return self.get_input(train)

    def get_config(self):
        config = {"name": self.__class__.__name__}
        base_config = super(Residual, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
_CODE END_
Then you implement it like...

model.add_node(Dense(1536, activation='relu'), merge_mode='concat', concat_axis=-1, name='class_dense1', inputs=['flatten_embed', 'flatten'])
model.add_node(Dense(1536, activation='relu'), name='class_dense2', input='class_dense1')
model.add_node(Dense(1536, activation='relu'), name='class_dense3', input='class_dense2')
model.add_node(Residual(), name='class_residual', input='class_dense1')
model.add_node(Dense(vocab_size, activation='softmax'), name='class_softmax', merge_mode='sum', inputs=['class_residual', 'class_dense3'])
Again sorry for the terrible formatting...
It's not so much a residual layer as it is just a way to grab a previous layer's outputs without back-propagating. Then you can take that output and merge it, creating your 'residual'.
Also note this method only works if all the layers are the same size.
Has anyone given this a shot? https://github.com/ndronen/modeling/blob/master/modeling/residual.py
I've seen that code but I don't understand how it's supposed to be used. It looks like it's building the entire network, so it may be more of a factory? But, maybe we can use the Identity class in that code as a replacement for the Residual class in my example. Could be worth a shot if it's any faster. I'll give it a try and report back. Thanks sergeyf!
I think you can just use the blocks it returns in other models. So:
s = Sequential()
s.add( build_residual_block('resblock1', (100,), 2, n_skip=2) )
s.add( build_residual_block('resblock2', (100,), 2, n_skip=2) )
etc.
Ah, ok, I see it now. Thanks for the example! :)
Worth checking out this comment; I think the code @sergeyf mentioned has the same mistake. ReLU() should be applied after the merge, I guess, according to the comment. What do you think?
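(For clarity, a sketch of the ordering in question - the point is that the activation comes after the sum, not before:)

from keras.layers import Input, Conv2D, Activation, add

x = Input(shape=(32, 32, 16))
y = Conv2D(16, (3, 3), padding='same', activation='relu')(x)
y = Conv2D(16, (3, 3), padding='same')(y)  # no activation on the last conv
z = add([x, y])                            # merge (sum) first...
z = Activation('relu')(z)                  # ...then apply ReLU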
https://github.com/keunwoochoi/residual_block_keras I opened a repo for my residual block implemented in Keras. I put many comments in the files so it would be easy to understand. However I'm not sure if it's correct.
@keunwoochoi I think your observation is correct. Additionally, I walked myself through the logic of your code and it seems to be correct (albeit I haven't used it, that's for tomorrow). I did have a couple questions:
Thanks, @cmishra. Regarding (kernel_row-1)/2: I'll update it. Regarding stride > kernel size: I chose max pooling rather than increasing the stride, as in my case stride > kernel size for some layers. I'll also update line 58 to line 81 to make it clear (the shortcut convolution part). (EDIT: I updated it.)
(PS: I added issue #1910 for this, but here people can get notifications.)
Now my network seems to be starting to learn some meaningful features - not sure yet, though. I initially added a usual classifier - flatten() - maxout() - BN - maxout().. - but now I've changed it to an almost fully convolutional architecture as in the original ResNet paper, removed all dropouts, and changed MaxPooling to AveragePooling in the shortcut path. Still not sure which changes are critical and which are not, but worth noting. I'll update more.
@Sebubu with #1387, does your code work? I'd try to run it myself, but I'm traveling and my personal laptop doesn't have the processing power.
@keunwoochoi , let's take the discussion regarding your code to the other issue you posted.
There's a more recent (apparently) improved version of the residual block:
I recently implemented it in Keras using the new functional API: https://github.com/raghakot/keras-resnet
Great, I also updated my residual network implementation with Keras 1.0 API and the author's new paper that @sergeyf mentioned: https://github.com/keunwoochoi/residual_block_keras .
@keunwoochoi, I tried to use your residual_block_keras (https://github.com/keunwoochoi/residual_block_keras), but I encountered the following error:
Traceback (most recent call last):
File "example.py", line 151, in <module>
model = get_residual_model()
File "example.py", line 120, in get_residual_model
residual_blocks = design_for_residual_blocks(num_channel_input=128)
File "example.py", line 100, in design_for_residual_blocks
subsample=pool_sizes[conv_idx]
File "/usr/local/lib/python2.7/site-packages/keras/models.py", line 145, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 485, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 543, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 148, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1922, in call
output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2064, in run_internal_graph
output_tensors = to_list(layer.call(computed_tensor, computed_mask))
File "/usr/local/lib/python2.7/site-packages/keras/layers/normalization.py", line 116, in call
raise Exception('You are attempting to share a '
Exception: You are attempting to share a same `BatchNormalization` layer across different data flows. This is not possible. You should use `mode=2` in `BatchNormalization`, which has a similar behavior but is shareable (see docs for a description of the behaviour).
Do you have any suggestion?
Hi @codingneo, it was fixed yesterday, https://github.com/keunwoochoi/residual_block_keras/commit/a35fe6fb8e356cea8a3e491d24beefc8a996ce05. Pull the repo again and give it a try!
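(For anyone hitting the same exception: it comes from applying one BatchNormalization instance to more than one tensor. A minimal sketch of the problem and the two fixes, assuming Keras 1.x as in the error message:)

from keras.layers import Input
from keras.layers.normalization import BatchNormalization

a = Input(shape=(32,))
b = Input(shape=(32,))

bn = BatchNormalization()
# bn(a) followed by bn(b) raises the "share a same BatchNormalization" exception

# Fix 1: a separate instance per data flow
out_a = BatchNormalization()(a)
out_b = BatchNormalization()(b)

# Fix 2 (Keras 1.x): mode=2 is shareable across data flows
bn_shared = BatchNormalization(mode=2)
out_a2 = bn_shared(a)
out_b2 = bn_shared(b)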
Hi @keunwoochoi, I actually made a similar fix to your new changes by using mode=2 for the BatchNormalization layer. But with mode=2 in the BatchNormalization layer, the training procedure generates NaN loss as follows:

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
1088/60000 [..............................] - ETA: 13519s - loss: nan - acc: 0.1158
1152/60000 [..............................] - ETA: 13522s - loss: nan - acc: 0.1137

Is this something to be cautious about?
Is it the result of running example.py?
With the Theano backend it's working well. Had an error with TensorFlow, though.
Epoch 1/20
5760/60000 [=>............................] - ETA: 349s - loss: 0.5634 - acc: 0.8210
@keunwoochoi Yes, the result is from running example.py. I am on a Mac using the Theano backend.
For other googlers like me - there IS now a way to do residual connections in Keras - https://keras.io/getting-started/functional-api-guide/
import keras
from keras.layers import Conv2D, Input
# input tensor for a 3-channel 256x256 image
x = Input(shape=(256, 256, 3))
# 3x3 conv with 3 output channels (same as input channels)
y = Conv2D(3, (3, 3), padding='same')(x)
# this returns x + y.
z = keras.layers.add([x, y])
x and y being two consecutive layers.
But how would I add it to my model? (Sorry, I'm pretty new to Keras.) model.add(z) doesn't work, obviously.
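(A minimal sketch of the answer; the Dense head is purely illustrative. In the functional API you don't add tensors with model.add; you keep calling layers on the tensors and then wrap the input and output in a Model:)

import keras
from keras.layers import Conv2D, Input, Dense, Flatten
from keras.models import Model

x = Input(shape=(256, 256, 3))
y = Conv2D(3, (3, 3), padding='same')(x)
z = keras.layers.add([x, y])  # the residual connection

# keep calling layers on z; this classifier head is just an example
out = Dense(10, activation='softmax')(Flatten()(z))

model = Model(inputs=x, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')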
def build_residual_block_conv(num_filters, name, input_shape, input_name='x'):
    """
    Rough sketch of building blocks of layers for residual learning.
    See http://arxiv.org/abs/1512.03385 for motivation.
    """
    block = Graph()