microsoft / ELL

Embedded Learning Library
https://microsoft.github.io/ELL

Will not process xy - skipping this layer as irrelevant #71

Closed h4gen closed 7 years ago

h4gen commented 7 years ago

Hi There!

So far I am happy to have a working CNTK -> Cortex-M3 pipeline. But I was wondering why my results on the hardware differ so much from the test results in the CNTK simulation. I used the simple logistic regression example and deployed it to my hardware (Bosch XDK). When I saw that the results differ, I noticed this message from the ELL compiler:

Selected CPU as the process wide default device.

Finished loading.
Pre-processing...

Will not process Times - skipping this layer as irrelevant.

Will not process Plus - skipping this layer as irrelevant.
Softmax :  1x1x2  ->  1x1x2 | padding  0

Finished pre-processing.

Constructing equivalent ELL layers from CNTK...
Converting layer  Softmax(Tensor[2]) -> Tensor[2]

Obviously, leaving out the Times and Plus layers just leads to an evaluation of the input data by the softmax alone. How can it be that these layers are marked as irrelevant? How can I influence this? Is this a bug or a feature?
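To illustrate the discrepancy, here is a minimal pure-Python sketch with made-up numbers (not the actual ELL or CNTK evaluation): dropping Times and Plus changes the model from softmax(xW + b) to softmax(x).

```python
import math

def softmax(v):
    # numerically stable softmax
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Toy logistic regression: y = softmax(x @ W + b)
x = [1.0, 2.0]                   # input (1x2)
W = [[0.5, -0.5], [0.25, 0.75]]  # weights (2x2)
b = [0.1, -0.1]                  # bias (2)

xW = [sum(x[i] * W[i][j] for i in range(2)) for j in range(2)]  # Times
z = [xW[j] + b[j] for j in range(2)]                            # Plus
full = softmax(z)     # what the CNTK simulation computes
skipped = softmax(x)  # what remains when Times and Plus are dropped

print(full)     # softmax([1.1, 0.9])
print(skipped)  # softmax([1.0, 2.0]) -- generally a different answer
```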

Best, Hagen

lisaong commented 7 years ago

Hi, great to see progress made. These layers are not yet recognized by the CNTK importer, which focuses more on convolutional networks as a starting point.

For logistic regression, you can try updating tools/importers/CNTK/cntk_to_ell.py to handle the Times and Plus op_names for your model's needs.

We are in the process of documenting how to update the importer and refactoring it to make it easier to update, but here are a few notes to get you started in the meantime:

h4gen commented 7 years ago

Okay, thanks, I will try to fix it. But I am still a little puzzled: how does the neural network implementation work if there are no Plus and Times operations? Basically, a logistic regression does the same thing as a perceptron, so I would assume it does nearly the same as your linear layer in the library. Is there maybe a workaround to create a model in CNTK that does not need Times and Plus? Unfortunately, I could not find something like a linear layer in CNTK.

I have one more question regarding the expected workflow with ELL: right now I see it more as a compiler for CNTK models, but basically I could design the whole model directly in ELL, right? Is there a recommended workflow for using ELL?

h4gen commented 7 years ago

Hi there!

I tried implementing the processing for the two layers. I added this code:

def process_plus_layer(layer, ellLayers):
    # Import a CNTK Plus op as an ELL bias layer.
    biasParameter = findParameterByName(layer.parameters, 'b', 0)
    biasVector = get_float_vector_from_cntk_trainable_parameter(biasParameter)
    layerParameters = ELL.LayerParameters(
        layer.ell_outputShapeMinusPadding, ELL.NoPadding(),
        layer.ell_outputShape, layer.ell_outputPaddingParameters)
    ellLayers.append(ELL.FloatBiasLayer(layerParameters, biasVector))

def process_times_layer(layer, ellLayers):
    # Import a CNTK Times op as an ELL fully connected layer.
    weightsParameter = findParameterByName(layer.parameters, 'W', 0)
    weightsTensor = get_float_tensor_from_cntk_dense_weight_parameter(weightsParameter)
    layerParameters = ELL.LayerParameters(
        layer.ell_inputShape, layer.ell_inputPaddingParameters,
        layer.ell_outputShapeMinusPadding, ELL.NoPadding())
    ellLayers.append(ELL.FloatFullyConnectedLayer(layerParameters, weightsTensor))

and added this to convert_cntk_layers_to_ell_layers:

        elif (cntkLayer.op_name == 'Times'):
            process_times_layer(cntkLayer, ellLayers)
        elif (cntkLayer.op_name == 'Plus'):
            process_plus_layer(cntkLayer, ellLayers)

and similarly to get_filtered_layers_list:

        elif ((currentLayer.op_name == 'Dense') or
              ...
              (currentLayer.op_name == 'Times') or
              (currentLayer.op_name == 'Plus')

After this, the ELL pre-processing and compiling work fine! This is the output:

Finished loading.
Pre-processing...
Times :  1x1x2  ->  1x1x2 | padding  0
Plus :  1x1x2  ->  1x1x2 | padding  0
Softmax :  1x1x2  ->  1x1x2 | padding  0

Finished pre-processing.

Constructing equivalent ELL layers from CNTK...
Converting layer  Times(Tensor[2]) -> Tensor[2]
Converting layer  Plus(Tensor[2]) -> Tensor[2]
Converting layer  Softmax(Tensor[2]) -> Tensor[2]

...Finished constructing ELL layers.

Compiling and cross-compiling are done via:

compile -imap mymodel.map --header --ir
llc-3.9 -mtriple=armv7m-unknown-none-eabi -march=thumb -mcpu=cortex-m3 \
    -mattr=+armv7-m,+v7 -float-abi=soft -filetype=obj mymodel.ll

Then I get this error when injecting the code into my project and compiling it:

In function `_Node__MatrixVectorMultiplyNode_float__in_4_2_out_2':
mymodel.ll:(.text+0xe4): undefined reference to `cblas_sgemv'

This is the model I export from CNTK (I added a Plus layer compared to the original from the tutorial):

weight_param = C.parameter(shape=(input_dim, output_dim),name='W')
bias_param = C.parameter(shape=(output_dim),name='b')
C.softmax(C.plus(C.times(input_var, weight_param), bias_param))

Any ideas what's going wrong?

h4gen commented 7 years ago

Okay, I immediately found my error. The compile command has to be called with --blas false so that there are no function calls to BLAS.

lisaong commented 7 years ago

Glad you found the error and were able to implement the layers you need!

lisaong commented 7 years ago

As for your questions:

How does the neural network implementation work if there is no plus and times operation?

A: We do support ElementTimes and Bias. Some CNTK layers perform similar operations (depending on whether you are calling a function or a block), and the convolutional networks we started with for vision happen not to use Times and Plus. We're actually working on adding "Plus" in the next release because we've encountered other networks that need it. The choice of CNTK layer is made at model design time, and we happened to be using ElementTimes instead of Times.
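To make the distinction concrete, here is a minimal pure-Python sketch (not ELL or CNTK API code): ElementTimes multiplies two same-shape tensors element-wise, while Times is a vector-matrix product, which is what a dense/linear layer uses.

```python
# ElementTimes: element-wise product of two same-shape vectors
def element_times(a, b):
    return [x * y for x, y in zip(a, b)]

# Times: vector-matrix product (the core of a dense/linear layer)
def times(x, W):
    rows, cols = len(W), len(W[0])
    return [sum(x[i] * W[i][j] for i in range(rows)) for j in range(cols)]

x = [1.0, 2.0]
W = [[3.0, 4.0], [5.0, 6.0]]

print(element_times(x, [3.0, 5.0]))  # [3.0, 10.0]
print(times(x, W))                   # [13.0, 16.0]
```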

I have one more question regarding the expected workflow with ELL: right now I see it more as a compiler for CNTK models, but basically I could design the whole model directly in ELL, right? Is there a recommended workflow for using ELL?

A: You can design a model with ELL from the ground up, especially if the model is fairly simple like what you tried. However, since ELL doesn't support training (yet), it is often useful to author and train more complex models using CNTK, and then import that model into ELL. The "import into ELL" workflow also applies if you have trained existing models (e.g. darknet).