microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Segmentation Fault CNTK with Python 3.6 #2002

Open antspy opened 7 years ago

antspy commented 7 years ago

Hello,

I am using CNTK 2 with Python 3.6 (with Anaconda 4.4) on Linux (Ubuntu 16.04). I tried implementing a custom function (as described here: https://www.cntk.ai/pythondocs/extend.html ) to modify layers in a model; in particular, this function does not alter the inputs or outputs of the model. The modified models work on a simple MNIST model. On the more complicated CMU seq2seq model (the one described here: https://www.microsoft.com/en-us/cognitive-toolkit/blog/2016/11/sequence-to-sequence-deep-recurrent-neural-networks-in-cntk-part-1/ using the Python code here: https://github.com/Microsoft/CNTK/tree/master/Examples/SequenceToSequence/CMUDict/Python), the model is created (I think correctly), but when trying to evaluate it I get a segfault. In particular, the code crashes in cntk.ops.functions.Function.forward, at the line "state = super(Function, self)._forward(in_var_map, output_map, device, keep_for_backward)".

Note that the crash only happens with the modified model; the original model works fine. All the variables in_var_map, output_map, device, and keep_for_backward are the same in both cases. Note also that the .forward method of the custom function I created is never reached; the crash happens somewhere before it, which is weird. The exact error message is: "bash: line 1: 25400 Segmentation fault (core dumped) env -i "PYTHONIOENCODING"="UTF-8" "LIBRARY_ROOTS"="............"" (a bunch of paths and nothing more).

Note that it also happens on Windows 10, though there I only see "the process terminated with code 255".

I tried to increase the stack size (as suggested in another issue) with "resource.setrlimit(resource.RLIMIT_STACK, (resource.RLIM_INFINITY, -1))", roughly as in the sketch below, but it does not make any difference. Any thoughts? Thank you!
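A minimal sketch of that attempt, using only the standard-library resource module (setting the soft limit to the current hard limit avoids a ValueError when the hard limit is finite; Linux only):

import resource

# Inspect and raise the stack size limit before building/evaluating the model.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print('stack limit before:', (soft, hard))
resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))  # lift the soft limit as far as allowed
print('stack limit after:', resource.getrlimit(resource.RLIMIT_STACK))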

eldakms commented 7 years ago

Could you please share the code of your user-defined function, if it is not private? Thanks!

antspy commented 7 years ago

Of course!

What I would like to achieve is a layer that can be inserted anywhere, receives a certain number of inputs, and returns the same number of outputs with the same shapes, with only the numerical values changed (how exactly the values are changed is not important). If I use the code below after (or before) a Dense layer, everything works properly; if I use it in combination with an LSTM cell, it raises a segmentation fault. In particular, if I do this

C.layers.LSTM(hidden_dim) >> functionNewLayer(numInput=2)

it appears to work: the input and output dimensions are the same as those of a normal LSTM layer, and CNTK does not complain. If I then create a layer with

C.layers.Recurrence(C.layers.LSTM(hidden_dim) >> functionNewLayer(numInput=2))

then it will raise a segmentation fault at evaluation time. This is the code for the function I defined:

import numpy as np
from cntk.ops.functions import UserFunction
from cntk import output_variable
import cntk as C

class _NewLayer(UserFunction):
    def __init__(self, *args, option1=1, name='NewLayer'):
        super(_NewLayer, self).__init__(list(args), name=name)
        self.option1 = option1
        self.numInput = len(args)

    def forward(self, arguments, device=None, as_numpy=True):
        # The objective of this layer is to modify the arguments without changing the shape
        # the actual operations performed are not essential
        return None, arguments

    def backward(self, state, root_gradients, variables=None, as_numpy=True):
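        # Pass the incoming gradients through unchanged, mirroring the pass-through forward above.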
        return root_gradients

    def infer_outputs(self):
        #shape, type and dynamic axes of inputs are not changed by this function
        outputVar = [output_variable(self.inputs[idx].shape, self.inputs[idx].dtype,
            self.inputs[idx].dynamic_axes, name='out_newLayer') for idx in range(len(self.inputs))]
        return outputVar

    def serialize(self):
        return {'option1': self.option1}

    @staticmethod
    def deserialize(inputs, name, state):
        return _NewLayer(*inputs, option1=state['option1'], name=name)  # unpack inputs, since __init__ takes *args

def functionNewLayer(numInput=1, option1=1, name='NewLayer'):
    'C.ops.Function only supports a fixed signature, so we must specify how many inputs we require'
    # Actually not sure about this..

    @C.layers.blocks.BlockFunction('NewLayer1', name)
    def newFun1(x):
        return C.user_function(_NewLayer(x, option1=option1, name=name))

    @C.layers.blocks.BlockFunction('NewLayer2', name)
    def newFun2(x, y):
        return C.user_function(_NewLayer(x, y, option1=option1, name=name))

    @C.layers.blocks.BlockFunction('NewLayer3', name)
    def newFun3(x, y, z):
        return C.user_function(_NewLayer(x, y, z, option1=option1, name=name))

    functions = [newFun1, newFun2, newFun3]

    if not 1 <= numInput <= len(functions):
        raise ValueError('numInput must be between 1 and {:d}'.format(len(functions)))

    return functions[numInput-1]
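
For context, this is roughly how the layer is wired up in the two cases described above (a minimal sketch; hidden_dim and the input variable are illustrative, and it assumes a recent CNTK 2.x API):

import cntk as C

hidden_dim = 5                     # illustrative size
x = C.sequence.input_variable(3)   # illustrative input with a sequence axis

# Works: the user function composed with a Dense layer, outside any recurrence
works = (C.layers.Dense(hidden_dim) >> functionNewLayer(numInput=1))(x)

# Segfaults at evaluation time: the user function folded into the recurrent loop
crashes = C.layers.Recurrence(C.layers.LSTM(hidden_dim) >> functionNewLayer(numInput=2))(x)
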
eldakms commented 7 years ago

I would appreciate it if you could share the whole repro so that I can run/debug it (if not publicly, please send it to eldak at microsoft com). Thanks a lot!

antspy commented 7 years ago

Hello,

here you go! https://github.com/antspy/cntkError

I copied the code from Tutorial 106 Part A (https://github.com/Microsoft/CNTK/blob/v2.0/Tutorials/CNTK_106A_LSTM_Timeseries_with_Simulated_Data.ipynb).

The only thing I changed was to combine the new layer with the LSTM layer in the createModel function. Sometimes I get a RuntimeError and other times a segmentation fault. Just today I got a segmentation fault, but successive runs resulted in "RuntimeError: No computation node mapping exists for Variable Output('out_newLayer', [#, *], [5])."

ivrodr-msft commented 7 years ago

This is similar to issue #2167. By design, user-defined functions cannot be part of the CNTK recurrent loop. However, I will address the segmentation fault so that a proper error is displayed instead. Thanks for reporting!
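A sketch of what this constraint implies for the repro above (illustrative only, reusing functionNewLayer and hidden_dim from the earlier comment): apply the user-defined function to the output of the Recurrence rather than inside the recurrent cell.

import cntk as C

# Keep the user function outside the recurrent loop: the LSTM runs inside
# Recurrence as usual, and the user-defined layer is applied to the unrolled
# sequence of hidden states afterwards.
model = C.layers.Recurrence(C.layers.LSTM(hidden_dim)) >> functionNewLayer(numInput=1)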

cha-zhang commented 7 years ago

@ivrodr-msft @eldakms Has this been resolved?