microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

TypeError: in method 'StreamInformation_m_name_set', argument 2 of type 'std::wstring const &' #3508

Open Madrawn opened 5 years ago

Madrawn commented 5 years ago

I'm trying to set up a MinibatchSourceFromData but can't figure out this error.

My input definitions:

# Input variables: two sequence inputs, each on its own dynamic axis,
# plus two fixed-size inputs
axis_dsc = C.Axis.new_unique_dynamic_axis('axis_desc')
inputDescription = C.sequence.input_variable(inputDim, sequence_axis=axis_dsc)
axis_title = C.Axis.new_unique_dynamic_axis('axis_title')
inputTitle = C.sequence.input_variable(inputDim, sequence_axis=axis_title)
inputCategories = C.input_variable(catDim)
inputFsk = C.input_variable(fskDim)

And I try to define the MinibatchSource like this:

# Assign the data fields to be read from the input
print(type(data['description']))
print(type(data['title']))
print(type(data['fsk']))
print(type(data['categories']))
print(type(data['description'][0]))
print(type(data['title'][0]))
print(type(data['fsk'][0]))
print(type(data['categories'][0]))
print(data['description'][0].shape)
print(data['title'][0].shape)
print(data['fsk'][0].shape)
print(data['categories'][0].shape)
data_map = {inputDescription: (data['description'], inputDescription._type),
            inputTitle: (data['title'], inputTitle._type),
            inputFsk: np.asarray(data['fsk']),
            inputCategories: np.asarray(data['categories'])}
import pdb; pdb.set_trace()
minibatch_source = C.io.MinibatchSourceFromData(data_map, max_samples=len(data['fsk']))

The prints output the following, in case that helps:

<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
(2702, 98)
(18, 98)
(5,)
(612,)
> <ipython-input-53-dcca98020f25>(55)train()
-> minibatch_source = C.io.MinibatchSourceFromData(data_map, max_samples=len(data['fsk']))

Stack trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-a85e415e1853> in <module>()
      2 t = create_model(inputDescription,inputTitle)
      3 
----> 4 train(data,t)

<ipython-input-48-6897362718f2> in train(data, model)
     49     data_map={inputDescription: (data['description'],inputDescription._type), inputTitle: (data['title'], inputTitle._type),inputFsk:np.asarray(data['fsk']),inputCategories:np.asarray(data['categories'])}
     50     import pdb; pdb.set_trace()
---> 51     minibatch_source = C.io.MinibatchSourceFromData(data_map, max_samples=len(data['fsk']))
     52 
     53     C.train.training_session(

~\AppData\Roaming\Python\Python36\site-packages\cntk\io\__init__.py in __init__(self, data_streams, max_samples)
    674         self._total_num_samples = 0 # total count; once the limit is reached, we stop returning data
    675 
--> 676         super(MinibatchSourceFromData, self).__init__()
    677 
    678     @staticmethod

~\AppData\Roaming\Python\Python36\site-packages\cntk\io\__init__.py in __init__(self)
    422         super(UserMinibatchSource, self).__init__()
    423 
--> 424         streams = {si.m_name: si for si in self.stream_infos()}
    425         self.streams = Record(**streams)
    426 

~\AppData\Roaming\Python\Python36\site-packages\cntk\io\__init__.py in stream_infos(self)
    686         return [StreamInformation(name, i, ['dense', 'sparse'][getattr(self._types[name], 'is_sparse', False)], 
    687                                   self._types[name].dtype, self._types[name].shape)
--> 688                 for i, name in enumerate(self._data.keys())]
    689 
    690     def next_minibatch(self, num_samples, number_of_workers=1, worker_rank=0, device=None):

~\AppData\Roaming\Python\Python36\site-packages\cntk\io\__init__.py in <listcomp>(.0)
    686         return [StreamInformation(name, i, ['dense', 'sparse'][getattr(self._types[name], 'is_sparse', False)], 
    687                                   self._types[name].dtype, self._types[name].shape)
--> 688                 for i, name in enumerate(self._data.keys())]
    689 
    690     def next_minibatch(self, num_samples, number_of_workers=1, worker_rank=0, device=None):

~\AppData\Roaming\Python\Python36\site-packages\cntk\io\__init__.py in __init__(self, name, stream_id, storage_format, dtype, shape, defines_mb_size)
    401                  shape, defines_mb_size=False):
    402         super(StreamInformation, self).__init__()
--> 403         self.m_name = name
    404         self.m_id = stream_id
    405         self.m_storage_format = StreamInformation._storage[storage_format]

~\AppData\Roaming\Python\Python36\site-packages\cntk\cntk_py.py in <lambda>(self, name, value)
    822 class StreamInformation(_object):
    823     __swig_setmethods__ = {}
--> 824     __setattr__ = lambda self, name, value: _swig_setattr(self, StreamInformation, name, value)
    825     __swig_getmethods__ = {}
    826     __getattr__ = lambda self, name: _swig_getattr(self, StreamInformation, name)

~\AppData\Roaming\Python\Python36\site-packages\cntk\cntk_py.py in _swig_setattr(self, class_type, name, value)
     72 
     73 def _swig_setattr(self, class_type, name, value):
---> 74     return _swig_setattr_nondynamic(self, class_type, name, value, 0)
     75 
     76 

~\AppData\Roaming\Python\Python36\site-packages\cntk\cntk_py.py in _swig_setattr_nondynamic(self, class_type, name, value, static)
     61     method = class_type.__swig_setmethods__.get(name, None)
     62     if method:
---> 63         return method(self, value)
     64     if (not static):
     65         if _newclass:

TypeError: in method 'StreamInformation_m_name_set', argument 2 of type 'std::wstring const &'

I have no idea what the type 'std::wstring const &' even is. My inputs are all one-hot encoded. It sounds like it has problems setting the name of an input stream, but I tried giving all my input variables names via the name='...' argument and that did not help.
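Looking at the traceback, stream_infos() seems to use the keys of the data dict directly as stream names, so maybe MinibatchSourceFromData wants plain string keys rather than the Variable objects themselves? Something like this (an untested guess on my part, reusing the variables from above):

# Untested guess: use string stream names as dict keys instead of the
# input Variables, since the keys appear to end up as
# StreamInformation.m_name (a std::wstring on the C++ side)
data_map = {'description': (data['description'], inputDescription._type),
            'title': (data['title'], inputTitle._type),
            'fsk': np.asarray(data['fsk']),
            'categories': np.asarray(data['categories'])}
minibatch_source = C.io.MinibatchSourceFromData(data_map, max_samples=len(data['fsk']))

Presumably the input variables would then get mapped to these streams later, when setting up the trainer.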

delzac commented 5 years ago

Hi,

maybe this will help?

Madrawn commented 5 years ago

Hi, I already read that page. I've tried

train_summary = loss.train((X_train_lr, Y_train_lr), parameter_learners=[learner],
                           callbacks=[progress_writer])

But I didn't manage to get it to work, as my network has two sequence inputs and two outputs, and I couldn't figure out in what way/order to feed those four variables (see my attempt below). Feeding Sequences with NumPy didn't help either. So I chose to use a MinibatchSource, as the docs recommend: you just map each input variable to its data, avoiding the guessing game of what order to pass the data in. Which brings me back to the original problem.
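My best guess was that the tuple has to follow the order of loss.arguments, roughly like this (a sketch with hypothetical array names), but I still couldn't get the sequence shapes right:

# Presumably the numpy arrays must be passed in the same order as the
# criterion function's arguments; these can be inspected first:
print([a.name for a in loss.arguments])
train_summary = loss.train((desc_data, title_data, fsk_data, cat_data),
                           parameter_learners=[learner],
                           callbacks=[progress_writer])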

I mean, I could probably write my data out in the CTF format and then read it back in with the CTFDeserializer, but... MinibatchSourceFromData should just work, since I already have all my data in numpy arrays?
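For reference, the CTF detour would look roughly like this (untried, with made-up file and stream names):

# Hypothetical CTF setup: one-hot data is written as sparse index:value
# pairs in the file, e.g. "|description 12:1 |fsk 2:1 |categories 77:1"
ctf = C.io.CTFDeserializer('data.ctf', C.io.StreamDefs(
    description=C.io.StreamDef(field='description', shape=inputDim, is_sparse=True),
    title=C.io.StreamDef(field='title', shape=inputDim, is_sparse=True),
    fsk=C.io.StreamDef(field='fsk', shape=fskDim, is_sparse=True),
    categories=C.io.StreamDef(field='categories', shape=catDim, is_sparse=True)))
minibatch_source = C.io.MinibatchSource(ctf, randomize=True)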

delzac commented 5 years ago

I see. Given that your data fits completely in RAM, why not forgo the CNTK readers and the Function.train() API altogether?

I personally prefer to instantiate a Trainer and do trainer.train_minibatch({x: X, y: Y}).

Example here.
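
Roughly this pattern (a toy sketch with a single dense layer and random data; adapt it to your two-input/two-output model):

import cntk as C
import numpy as np

# Toy model: one dense layer, cross-entropy loss
x = C.input_variable(10)
y = C.input_variable(2)
z = C.layers.Dense(2)(x)
loss = C.cross_entropy_with_softmax(z, y)
metric = C.classification_error(z, y)

lr = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, metric), [C.sgd(z.parameters, lr)])

# One minibatch of random data; loop over your real data instead
X = np.random.rand(32, 10).astype(np.float32)
Y = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, size=32)]
trainer.train_minibatch({x: X, y: Y})
print(trainer.previous_minibatch_loss_average)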

Madrawn commented 5 years ago

> I see. Given that your data fits completely in RAM, why not forgo the CNTK readers and the Function.train() API altogether?
>
> I personally prefer to instantiate a Trainer and do trainer.train_minibatch({x: X, y: Y}).
>
> Example here.

Thanks! That worked after a bit of trial and error. Now my network trains and then crashes the Jupyter Python kernel without any message after a few hundred minibatches. :D But that's probably a memory problem on my end.

I'm good for now, but I still wonder whether MinibatchSourceFromData is broken, or whether I made an error, in which case the error message is unusably cryptic.