microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

How to run GoogLeNet on CNTK? #246

Closed by xhzhao 7 years ago

xhzhao commented 8 years ago

I need to run GoogLeNet on CNTK, but there are no examples for it. I tried to write my own config and NDL files, but I ran into a problem: how do I do "DepthConcat" in CNTK?

The example code in Torch is at this link: https://github.com/soumith/convnet-benchmarks/blob/master/torch7/imagenet_winners/googlenet.lua Torch uses the "Concat" operator to do this. Is there a similar operator in CNTK, and how do I use it?

frankseide commented 8 years ago

Would RowStack work for you?

// RowStack (input0, input1, ...)
// stacks multiple inputs on top of each other
// The inputs will be spliced w.r.t. their first tensor dimension (the "row" dimension).

Let's say you have 4 2D tensors, each of size [w_i x h]. RowStack will give you one 2D tensor [(w_1+w_2+w_3+w_4) x h]. Currently, CNTK can only stack along the first dimension.
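To make the shape rule above concrete, here is a minimal Python sketch that only tracks shapes. It does not call CNTK; `row_stack_shape` is a hypothetical helper named for illustration, and the sizes are made up.

```python
def row_stack_shape(shapes):
    """Return the shape obtained by stacking tensors along their first dimension.

    Hypothetical helper for illustration only (not a CNTK function).
    All inputs must agree on every dimension except the first.
    """
    first = tuple(shapes[0])
    for shape in shapes[1:]:
        if tuple(shape[1:]) != first[1:]:
            raise ValueError("inputs must match in all but the first dimension")
    # The first dimensions add up; the remaining dimensions are unchanged.
    return (sum(shape[0] for shape in shapes),) + first[1:]


h = 5
inputs = [(2, h), (3, h), (4, h), (1, h)]  # four 2D tensors of size [w_i x h]
stacked = row_stack_shape(inputs)
print(stacked)  # (10, 5), i.e. [(w_1+w_2+w_3+w_4) x h]
```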

Note that "on top" here refers to a "matrix interpretation" of the tensor where the first dimension is the "tall" dimension. The "tall" dimension is the dimension with the fastest-changing index, the dimension with stride 1. For image tensors [W x H x C], the "tall" dimension is W, not H.
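The stride-1 statement can be checked with a small sketch. It assumes a dense column-major layout, where the first index changes fastest; `column_major_strides` is a hypothetical helper, not part of CNTK, and the sizes are arbitrary.

```python
def column_major_strides(shape):
    """Element strides for a dense column-major tensor (first index fastest).

    Hypothetical helper for illustration only.
    """
    strides = []
    step = 1
    for dim in shape:
        strides.append(step)  # distance in elements between neighbors along this dimension
        step *= dim
    return tuple(strides)


W, H, C = 4, 3, 2
strides = column_major_strides((W, H, C))
print(strides)  # (1, 4, 12): W has stride 1, so W is the "tall" dimension
```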

If this works for you except for the requirement of stacking along the first dimension, we can use TransposeDimensions() to work around that.

// TransposeDimensions (input, dim1, dim2)
//  - swaps index dimensions dim1 and dim2. The values are 1-based; 1 stands for the leading dimension.
//  - new dimensions can be created; e.g. a column vector can be transposed into a row vector, which is a [1 x N] tensor
//  - transposing into the time dimension is currently not supported
//  - internally implemented with tensor lib by shuffling dimensions with their strides
//  - input may be minibatch data or not
// Transpose (input) = TransposeDimensions (input, 1, 2)
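Putting the two pieces together, a depth concatenation of [W x H x C] tensors could be expressed as: swap dimensions 1 and 3, stack along the first dimension, then swap back. The sketch below only tracks shapes to show that the bookkeeping works out. It does not call CNTK, `swap_dims` and `stack_first_dim` are hypothetical stand-ins for the two operations described above, and the branch sizes are illustrative.

```python
def swap_dims(shape, dim1, dim2):
    """Swap two dimensions of a shape; dim1 and dim2 are 1-based, as in the comment above."""
    s = list(shape)
    s[dim1 - 1], s[dim2 - 1] = s[dim2 - 1], s[dim1 - 1]
    return tuple(s)


def stack_first_dim(shapes):
    """Shape after stacking along the first dimension; the other dimensions must match."""
    first = tuple(shapes[0])
    for shape in shapes[1:]:
        if tuple(shape[1:]) != first[1:]:
            raise ValueError("inputs must match in all but the first dimension")
    return (sum(shape[0] for shape in shapes),) + first[1:]


def depth_concat_shape(shapes):
    """Concatenate [W x H x C] shapes along C using only swap and first-dimension stack."""
    swapped = [swap_dims(s, 1, 3) for s in shapes]  # [W x H x C] -> [C x H x W]
    stacked = stack_first_dim(swapped)              # [(C_1+C_2+...) x H x W]
    return swap_dims(stacked, 1, 3)                 # back to [W x H x (C_1+C_2+...)]


# Four branches with the same spatial size but different channel counts.
branches = [(28, 28, 64), (28, 28, 128), (28, 28, 32), (28, 28, 32)]
result = depth_concat_shape(branches)
print(result)  # (28, 28, 256)
```

Whether this is practical in a real network depends on the cost of the two transposes, which this shape-only sketch does not model.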

Alternatively, we could change the RowStack C++ code. The node code itself can already stack along arbitrary dimensions, and it would take maybe 10 lines of code to expose that to NDL. It is not exposed at present mostly because it would require a name change (since it would no longer be stacking "rows"), so we'd have to deal with backward compatibility, and we'd need test cases.

zhaoyue-zephyrus commented 8 years ago

@frankseide

Hi Frank,

I want to use TransposeDimensions(), though not for GoogLeNet, but I get an "undefined function or macro" error. It seems to be commented out in CNTK/Source/ActionsLib/NetworkDescriptionLanguage.cpp, is that right? If I uncomment it and rebuild, I get a segmentation fault (I wrote something like "input_transpose = TransposeDimensions (input, 2,3)").

Any advice? Thanks!

wangxianliang commented 8 years ago

Hi, xhzhao,

Did you successfully run GoogLeNet on CNTK? Thanks a lot!

ldmtwo commented 8 years ago

If anyone has GoogLeNet working, I would like to try it out as well. Thanks.

xhzhao commented 8 years ago

@wangxianliang No, I tried to use TransposeDimensions(), but it failed.

pinzhang10 commented 8 years ago

Does anybody know how to do a transpose in NDL? When I use ArrayTransposeDimensions, it says "EXCEPTION occurred: Undefined function or macro 'ArrayTransposeDimensions'". The same happens for TransposeDimensions.

Please help!

frankseide commented 8 years ago

You must use BrainScriptNetworkBuilder, cf. #576.

mahilleb-msft commented 7 years ago

Please check out https://www.microsoft.com/en-us/cognitive-toolkit/features/model-gallery/?filter=Image, which has GoogLeNet. Please file a new issue if you run into problems. Thanks!