pluskid / Mocha.jl

Deep Learning framework for Julia

Discussion: About sparse connections #189

Closed · freddycct closed this 6 years ago

freddycct commented 8 years ago

I have some questions about the implementation of LeNet-5 as provided in the example.

Quoting from LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, last paragraph of section II.B (page 7):

Layer C3 is a convolutional layer with 16 feature maps. Each unit in each feature map is connected to several 5x5 neighborhoods at identical locations in a subset of S2's feature maps.

S2 refers to this line of code:

`pool_layer = PoolingLayer(name="pool1", kernel=(2,2), stride=(2,2), bottoms=[:conv], tops=[:pool])`

C3 refers to this line of code:

`conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,5), bottoms=[:pool], tops=[:conv2])`

In the original implementation, the connections between C3 and S2 are not fully connected but are "manually" coded to be sparse. How is that implemented in the example code? Or was it left out because such a feature is hard to support in Mocha?

pluskid commented 8 years ago

What kind of "manually coded" sparse connection is it? An easy way to implement this (if a bit of computational overhead is not a big issue) is to use an extra 0-1 mask that point-wise multiplies the weights to turn off some of the connections. Mocha does not support sparse connections, so you will need to write your own layers.
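For concreteness, here is a minimal sketch of that masking idea in plain Julia. This is not Mocha's API; the variable names, and the assumed filter-bank layout of `(kernel_h, kernel_w, n_input_maps, n_output_maps)`, are hypothetical:

```julia
# Hypothetical sketch (not Mocha's API): turn off individual input->output
# map connections in a dense convolution filter bank by point-wise
# multiplying the weights with a fixed 0-1 mask of the same shape.

weights = randn(5, 5, 6, 16)   # dense filter bank, S2 (6 maps) -> C3 (16 maps)
mask    = ones(5, 5, 6, 16)    # 1 = keep connection, 0 = drop it
mask[:, :, 3, 1] .= 0          # e.g. disconnect S2 map 3 from C3 map 1

masked_weights = weights .* mask   # use these in the forward convolution

# During back-propagation the weight gradient must be masked the same way,
# so dropped connections never receive updates:
# grad_weights .*= mask
```

The extra multiply costs a little compute, but it keeps the dense convolution kernels intact, which is why it is the easy route compared with a true sparse-connection layer.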

freddycct commented 8 years ago

I was talking about Table 1 on page 8 of the paper: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
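For reference, that table can be written down as a 6×16 binary connection matrix (rows = S2 maps, columns = C3 maps). The sketch below reconstructs it from the paper's description, so it should be checked against the PDF; the variable names are mine:

```julia
# Table 1 of LeCun et al. (1998), reconstructed from the text: the first
# six C3 maps see 3 contiguous S2 maps, the next six see 4 contiguous
# maps, the next three see 4 non-contiguous maps, the last sees all 6.
# (1-indexed here; verify against the paper before relying on it.)
connections = [
    [1,2,3], [2,3,4], [3,4,5], [4,5,6], [5,6,1], [6,1,2],
    [1,2,3,4], [2,3,4,5], [3,4,5,6], [4,5,6,1], [5,6,1,2], [6,1,2,3],
    [1,2,4,5], [2,3,5,6], [1,3,4,6],
    [1,2,3,4,5,6]
]

table = zeros(Int, 6, 16)               # rows: S2 maps, columns: C3 maps
for (j, rows) in enumerate(connections)
    table[rows, j] .= 1
end
```

Broadcast along the kernel dimensions (e.g. `weights .* reshape(table, 1, 1, 6, 16)`), this table would give exactly the kind of 0-1 mask described in the previous comment.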