microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Cannot feed training data manually to the model #3715

Open faruknane opened 5 years ago

faruknane commented 5 years ago

I want to make a simple RNN model like the one below: many-to-many (one output for each input).

import cntk as C
from cntk.layers import Embedding
from keras.utils import np_utils  # used only for one-hot encoding
# wordcount and x_train are defined earlier in my notebook

def create_model(inp):
    with C.layers.default_options(initial_state = 0.1):
        m = Embedding(300)(inp)
        m = C.layers.Recurrence(C.layers.LSTM(500))(m)
        m = C.layers.Dropout(0.2, seed=1)(m)
        m = C.layers.Dense(wordcount)(m)
        return m

input = C.sequence.input_variable(wordcount)
model = create_model(input)
output = C.sequence.input_variable(wordcount)
loss = C.cross_entropy_with_softmax(model, output)
lr = C.learning_parameter_schedule(0.02)
learner = C.adam(model.parameters, lr, momentum = 0.0)

# next-word prediction: targets are the inputs shifted by one position
x = x_train[0:32, :-1]
y = x_train[0:32, 1:]
print(x.shape)
print(y.shape)
x = np_utils.to_categorical(x, num_classes = wordcount)
y = np_utils.to_categorical(y, num_classes = wordcount)

print(x.shape)
print(y.shape)

h = loss.train((x, y), parameter_learners=[learner])

But it gives me an error. The output is below.

(32, 40)
(32, 40)
(32, 40, 11494)
(32, 40, 11494)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-70-2e00927baa0b> in <module>
     26 print(y.shape)
     27 
---> 28 h = loss.train((x, y), parameter_learners=[learner])
     29 
     30 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?

~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
   1484                               progress_frequency=progress_frequency, max_samples=max_samples,
   1485                               checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486         ts.train()
   1487         res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
   1488               Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
    331             device = use_default_device()
    332 
--> 333         super(TrainingSession, self).train(device)
    334 
    335     def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):

~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
   3597 
   3598     def train(self, computeDevice):
-> 3599         return _cntk_py.TrainingSession_train(self, computeDevice)
   3600 
   3601     def restore_from_checkpoint(self, checkpointFileName):

RuntimeError: The number (40) of time steps in the packed MBLayout does not match the longest sequence's length (0) in the Value object

[CALL STACK]
    > std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::  weak_from_this
    - CNTK::DictionaryValue::FreePtrAsType<CNTK::NDShape>  
    - CNTK::Internal::  UseSparseGradientAggregationInDataParallelSGD (x3)
    - CNTK::Function::  Forward
    - CNTK::  CreateTrainer
    - CNTK::Trainer::  TotalNumberOfUnitsSeen
    - CNTK::Trainer::  TrainMinibatch (x2)
    - CNTK::TrainingSession::  Train
    - PyInit__cntk_py
    - PyCFunction_FastCallDict
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyObject_GetAttr

Can you please help me fix the issue here? My whole day has gone into this and I'm feeling bad about CNTK.

delzac commented 5 years ago

Hi, I'm sorry that you are having such a difficult time. Deep learning tends to have a steep learning curve.

In CNTK, if you are using sequences, you should wrap your data as a list of 2-d numpy arrays instead of a single 3-d numpy array. With a list of 2-d numpy arrays, the sequence length can vary between samples, and that is a differentiating factor between CNTK and other frameworks.

So what you need to do is this:

x = np_utils.to_categorical(x, num_classes = wordcount)
y = np_utils.to_categorical(y, num_classes = wordcount)

x = [i for i in x]  # convert 3d np.ndarray to list of 2d np.ndarray
y = [i for i in y]  # convert 3d np.ndarray to list of 2d np.ndarray

assert all(i.shape == (40, 11494) for i in x)
assert all(i.shape == (40, 11494) for i in y)
h = loss.train((x, y), parameter_learners=[learner])

And it should work.

Moving forward, I recommend that you train using the Trainer class instead: instantiate a Trainer object and use Trainer.train_minibatch. That's my go-to method for training in CNTK.

loss.train is a convenience wrapper around Trainer.
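
For reference, a minimal sketch of that approach, reusing the names from your snippets above (model, loss, learner, input, output, x, y); the epoch count is arbitrary, so treat this as a starting point rather than a drop-in script:

trainer = C.Trainer(model, loss, [learner])

for epoch in range(10):  # arbitrary number of passes, for illustration
    # x and y are lists of 2-d numpy arrays, one array per sequence
    trainer.train_minibatch({input: x, output: y})
    print(trainer.previous_minibatch_loss_average)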

faruknane commented 5 years ago

@delzac Thank you for your reply. I completely understand what you meant, and I edited my code to pass lists. However, now it gives this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in _next_minibatch(self, info_map, mb_size_in_sequences, mb_size_in_samples, number_of_workers, worker_rank, device)
    468         # mbsize_in_sequences is ignored
    469 
--> 470         mb = self.next_minibatch(mb_size_in_samples, number_of_workers, worker_rank, device)
    471         info_map.update(mb)
    472 

~\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, num_samples, number_of_workers, worker_rank, device)
    732                     from cntk import input_variable, device
    733                     self._vars[si.name] = input_variable(**self._types[si.name])
--> 734                 value = Value.create(self._vars[si.name], mb_data)
    735             else:
    736                 value = Value(mb_data)

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\core.py in create(var, data, seq_starts, device, read_only)
    464             device,
    465             read_only,
--> 466             True)  # always create a copy in Value
    467 
    468         return value

ValueError: Value::Create:: The number of sequences must be > 0

[CALL STACK]
    > std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::  weak_from_this
    - CNTK::Value::  Create
    - PyInit__cntk_py (x2)
    - PyCFunction_FastCallDict
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PySequence_Check
    - PyUnicodeWriter_WriteSubstring
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PyObject_GetAttr

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-ff9a2bc746d1> in <module>
     27 assert all(i.shape == (40, 11493) for i in x)
     28 assert all(i.shape == (40, 11493) for i in y)
---> 29 h = loss.train((x, y), parameter_learners=[learner])
     30 
     31 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?

~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
   1484                               progress_frequency=progress_frequency, max_samples=max_samples,
   1485                               checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486         ts.train()
   1487         res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
   1488               Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
    331             device = use_default_device()
    332 
--> 333         super(TrainingSession, self).train(device)
    334 
    335     def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):

~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
   3597 
   3598     def train(self, computeDevice):
-> 3599         return _cntk_py.TrainingSession_train(self, computeDevice)
   3600 
   3601     def restore_from_checkpoint(self, checkpointFileName):

RuntimeError: SWIG director method error.

faruknane commented 5 years ago

Is there something going wrong with my model? I didn't use any sparse input. I just wanted to make a simple example first and then improve the model.

delzac commented 5 years ago

In CNTK, if your model compiles, then there's no problem with the model itself; everything you're seeing is a runtime issue.

Did you define epoch_size in train()? You need to set it to None.

faruknane commented 5 years ago

Now it works. Interesting errors :(

Can I ask you about something here an hour from now?

delzac commented 5 years ago

The docs provide helpful clues as to how to set your arguments :)

Sure, but I'll be asleep by then.

faruknane commented 5 years ago

I got it wrong. It doesn't work; it still gives the error even though I set epoch_size to None. I also tried setting it to 1, with no change. What made me think it was working is that if you run loss.train more than 5-10 times, at some point it stops raising the error. !!!!! I am so annoyed by this.

faruknane commented 5 years ago

def create_model(inp):
    with C.layers.default_options(initial_state = 0.1):
        m = Embedding(300)(inp)
        m = C.layers.Recurrence(C.layers.LSTM(500))(m)
        m = C.layers.Dropout(0.2, seed=1)(m)
        m = C.layers.Dense(wordcount)(m)
        return m

input = C.sequence.input_variable(wordcount)
model = create_model(input)
output = C.sequence.input_variable(wordcount)
loss = C.cross_entropy_with_softmax(model, output)
lr = C.learning_parameter_schedule(0.02)
learner = C.adam(model.parameters, lr, momentum = 0.0)

x = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
y = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]

h = loss.train((x, y), parameter_learners=[learner])

#h = loss.train((x, y), parameter_learners=[learner], epoch_size = None)  # this doesn't change anything

This code doesn't work either.

delzac commented 5 years ago

I'm not getting an error from my code with epoch_size=None. What's the exception that's being raised?

Are you sure that you are feeding in a list of 2-d numpy arrays?

faruknane commented 5 years ago

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in _next_minibatch(self, info_map, mb_size_in_sequences, mb_size_in_samples, number_of_workers, worker_rank, device)
    468         # mbsize_in_sequences is ignored
    469 
--> 470         mb = self.next_minibatch(mb_size_in_samples, number_of_workers, worker_rank, device)
    471         info_map.update(mb)
    472 

~\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, num_samples, number_of_workers, worker_rank, device)
    732                     from cntk import input_variable, device
    733                     self._vars[si.name] = input_variable(**self._types[si.name])
--> 734                 value = Value.create(self._vars[si.name], mb_data)
    735             else:
    736                 value = Value(mb_data)

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\core.py in create(var, data, seq_starts, device, read_only)
    464             device,
    465             read_only,
--> 466             True)  # always create a copy in Value
    467 
    468         return value

ValueError: Value::Create:: The number of sequences must be > 0

[CALL STACK]
    > std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::  weak_from_this
    - CNTK::Value::  Create
    - PyInit__cntk_py (x2)
    - PyCFunction_FastCallDict
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PySequence_Check
    - PyUnicodeWriter_WriteSubstring
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PyObject_GetAttr

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-15d0d72ae305> in <module>
     20 y = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
     21 
---> 22 h = loss.train((x, y), parameter_learners=[learner], epoch_size = None)
     23 
     24 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?

~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
   1484                               progress_frequency=progress_frequency, max_samples=max_samples,
   1485                               checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486         ts.train()
   1487         res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
   1488               Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
    331             device = use_default_device()
    332 
--> 333         super(TrainingSession, self).train(device)
    334 
    335     def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):

~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
   3597 
   3598     def train(self, computeDevice):
-> 3599         return _cntk_py.TrainingSession_train(self, computeDevice)
   3600 
   3601     def restore_from_checkpoint(self, checkpointFileName):

RuntimeError: SWIG director method error.

I am sure that I am feeding a list of 2-d numpy arrays. My CNTK version is 2.7. Which version do you have right now? I'd like to invite you to my Jupyter notebook somewhere online (maybe Azure cloud, something like that).

delzac commented 5 years ago

Yes, I'm using CNTK 2.7. Can you just run these two lines of code before the loss.train()?

assert all(i.ndim == 2 for i in x)
assert all(i.ndim == 2 for i in y)

faruknane commented 5 years ago

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in _next_minibatch(self, info_map, mb_size_in_sequences, mb_size_in_samples, number_of_workers, worker_rank, device)
    468         # mbsize_in_sequences is ignored
    469 
--> 470         mb = self.next_minibatch(mb_size_in_samples, number_of_workers, worker_rank, device)
    471         info_map.update(mb)
    472 

~\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, num_samples, number_of_workers, worker_rank, device)
    732                     from cntk import input_variable, device
    733                     self._vars[si.name] = input_variable(**self._types[si.name])
--> 734                 value = Value.create(self._vars[si.name], mb_data)
    735             else:
    736                 value = Value(mb_data)

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\core.py in create(var, data, seq_starts, device, read_only)
    464             device,
    465             read_only,
--> 466             True)  # always create a copy in Value
    467 
    468         return value

ValueError: Value::Create:: The number of sequences must be > 0

[CALL STACK]
    > std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::  weak_from_this
    - CNTK::Value::  Create
    - PyInit__cntk_py (x2)
    - PyCFunction_FastCallDict
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PySequence_Check
    - PyUnicodeWriter_WriteSubstring
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PyObject_GetAttr
    - PyEval_EvalFrameDefault
    - PyUnicode_RichCompare
    - PyObject_GetAttr

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-b228e9de4016> in <module>
     21 assert all(i.ndim == 2 for i in x)
     22 assert all(i.ndim == 2 for i in y)
---> 23 h = loss.train((x, y), parameter_learners=[learner], epoch_size = None)
     24 
     25 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?

~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
   1484                               progress_frequency=progress_frequency, max_samples=max_samples,
   1485                               checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486         ts.train()
   1487         res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
   1488               Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])

~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
    331             device = use_default_device()
    332 
--> 333         super(TrainingSession, self).train(device)
    334 
    335     def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):

~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
   3597 
   3598     def train(self, computeDevice):
-> 3599         return _cntk_py.TrainingSession_train(self, computeDevice)
   3600 
   3601     def restore_from_checkpoint(self, checkpointFileName):

RuntimeError: SWIG director method error.

I did. I think I am going to reinstall CNTK. I will inform you after that. //EDIT: I tried the code on an Azure notebook. It gave me the same error; its CNTK version is 2.5.1. I can share the notebook with you.

https://deneme-anontm8n3w.notebooks.azure.com/j/notebooks/Untitled.ipynb

delzac commented 5 years ago

Let me know if you still get the same error after upgrading.

faruknane commented 5 years ago

I reinstalled CNTK. Still the same error :///

faruknane commented 5 years ago

Did you try the Azure notebook?

delzac commented 5 years ago

Hi, I realise that it has nothing to do with whether you feed in a list of 2-d numpy arrays or a single 3-d numpy array.

The error is coming from the minibatch_size argument in loss.train(). Set minibatch_size=32 and the error will disappear. Unfortunately, I haven't found a way to operate loss.train() properly otherwise, so I suggest you stick with minibatch_size=32 to avoid the error. You can set max_samples to control how long you want training to run.

Lastly, I suggest that you use the Trainer class to train moving forward instead. Its behaviour is straightforward and easy to understand.
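
For example, a minimal sketch of that workaround, reusing the names from your earlier snippet (loss, learner, x, y):

h = loss.train((x, y),
               parameter_learners=[learner],
               minibatch_size=32,     # pinning the minibatch size avoids the error
               max_samples=32 * 32)   # bounds how many samples training runs for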

delzac commented 5 years ago

Also, I just tested: the loss.train() API seems buggy when you are using CNTK dynamic sequences. If you are training on non-sequences, minibatch_size can be set to anything at all.

faruknane commented 5 years ago

It seems I have to use the Trainer class, because now another problem has occurred.

I define the model with the code below:


def create_model(inp):
    with C.layers.default_options(initial_state = 0.1):
        m = Embedding(300)(inp)
        m = C.layers.Recurrence(C.layers.LSTM(500))(m)
        m = C.layers.Dropout(0.2, seed=1)(m)
        m = C.layers.Dense(wordcount)(m)
        return m

input = C.sequence.input_variable(wordcount)
model = create_model(input)
output = C.sequence.input_variable(wordcount)
loss = C.cross_entropy_with_softmax(model, output)
lr = C.learning_parameter_schedule(0.02)
learner = C.adam(model.parameters, lr, momentum = 0.0)

Then I train the loss function like this:

x = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
y = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
assert all(i.ndim == 2 for i in x)
assert all(i.ndim == 2 for i in y)
loss = C.cross_entropy_with_softmax(model, output)
h = loss.train((x, y), parameter_learners=[learner],minibatch_size=32, max_samples = 32*32)

The first time, loss.train works and my GPU load goes high. But the second time I try to train, the code doesn't work; it just does nothing. WOW! What buggy classes. Should someone from the CNTK team take a look at this?

delzac commented 5 years ago

Second time as in you rerun the entire code again? You should be careful when using Jupyter notebooks: always shut down the entire kernel and rerun from scratch. There is hidden state in Jupyter notebooks that affects every deep learning framework.

Also, few people use the loss.train() API; I suggest you move on to the Trainer class and the trainer.train_minibatch function. I doubt the maintainers will resolve this bug. It's not serious.

faruknane commented 5 years ago

I am now using the Trainer class. I made a mess here. Now I am able to train the model. Thank you so much for your support. I feel bad that, although it is a small issue, we spent a lot of time on it (especially me :D). With the Trainer class it's working properly. Thank you again.

faruknane commented 5 years ago

I'd like to ask you about something called sparse input in an hour or two, after making sure the normal model does its job.

delzac commented 5 years ago

I hope you find joy in using CNTK, it's really lovely. :)

faruknane commented 5 years ago

Could we have a Skype session? It may sound crazy to ask for help via Skype from someone I don't know. The model I wrote with Keras has been working properly, and my aim is to build the same model in CNTK. However, right now I can't train my model in CNTK: the loss is stuck and won't decrease. I'd like to explain it in detail if we talk. I am sorry to take up your effort, but I am starting to think there is a lot not going right in CNTK.

delzac commented 5 years ago

Hi, I don't think I have the time to sit down for a Skype session. It's better to ask here anyway, since other people who face the same problem can benefit from having a record of the answers/discussion.

Anyway, if it's working in Keras, there's no reason why it shouldn't work in CNTK. You just need to carefully transfer the Keras model and hyperparameters into CNTK.

faruknane commented 5 years ago

I figured out that in Keras I set the learning rate to 0.001 with the Adam optimizer. In CNTK, 0.001 might be too low a value; I set the learning rate to 3. The loss seems to be decreasing, but not in a healthy way. It is interesting that the learning rate behaves so differently between Keras and CNTK even though Keras is using the CNTK backend.

I created the exact same model in CNTK. Here is what I did.

My Keras model was like below:

"inp = keras.layers.Input(batch_shape=(batchsize,None))#None?\n",
    "a = Embedding(wordcount, 300, mask_zero=True)(inp)\n",
    "a = LSTM(500, return_sequences = True,stateful=True)(a)\n",
    "a = Dense(300, activation='relu')(a)\n",
    "a = Dense(wordcount, activation='softmax')(a) \n",
    "\n",
    "model = Model(inputs=[inp], outputs=[a])\n",
    "op = keras.optimizers.Adam(lr=0.001, decay=0)\n",
    "model.compile(optimizer=op, loss='categorical_crossentropy', metrics=[\"acc\"])\n",
    "model.summary()"

loss graph of the Keras model: [image]

And here is my CNTK model: [image]

training log of the CNTK model: [image] [image]

My training data is correct, as are my minibatches. I really don't know what's wrong with the CNTK model.

//btw I looked at your GitHub account. There are many things there that I loved and look forward to reading. Thanks for that opportunity.

delzac commented 5 years ago

The Keras optimizer does not use CNTK's optimizer, so you should expect different behaviour.

Also, is there any reason why you are starting the initial state of all your CNTK layers at 0.1?

faruknane commented 5 years ago

In Tutorial 202 from Microsoft CNTK they do the same, so I didn't change it. However, I tried without the initial state and the result doesn't change. I still can't train my model.

delzac commented 5 years ago

What if you use a learning rate between 0.1 and 0.5?

faruknane commented 5 years ago

When I set the learning rate to 0.3, the model becomes trainable. However, I am not getting the same performance as with Keras; it is much worse somehow. Right now my loss is 4.7 and I am waiting for the model to train. Training time is long, as you might guess. This is a word-based language model, so I don't expect zero loss, but it needs to get down to 2 or 3 at least.

delzac commented 5 years ago

There are other parameters in the adam optimiser that you can adjust too. Do those other hyperparameters match the ones you use in Keras?

faruknane commented 5 years ago

Keras doesn't use momentum with Adam; to be more precise, it doesn't let you set momentum when using Adam. However, CNTK's adam takes a momentum parameter. That's what I noticed.

delzac commented 5 years ago

There are other parameters than momentum too; you can adjust those. Or try a higher learning rate.

faruknane commented 5 years ago

I liked the functionality and flexibility of CNTK, and I started learning deep learning with it. After a while, because I got so many errors while using CNTK and there are not many platforms where I can ask questions, I switched to Keras.

I don't know about the other parameters; I will just wait, I guess. It's a ridiculous waste of effort for both you and me. I wish I could spend this effort on some other scientific topic related to deep learning, like attention mechanisms. However, I am stuck with this for now.

delzac commented 5 years ago

Tuning hyperparameters is a challenge even for experienced practitioners, so don't feel too bad about it. It takes a while to get the hang of it. It took me years to build the experience and intuition to know what to tune. There were times I spent months tuning hyperparameters only to find out the problem came from a stupid mistake.

This paper by Leslie Smith really helped me; you can read through it to accelerate your learning.

I wish you good luck! :)

delzac commented 5 years ago

Separately, the momentum parameter in CNTK's adam is actually called beta_1 in Keras and TensorFlow.
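
To illustrate, here is a rough sketch of lining up CNTK's adam with the Keras defaults. The momentum/beta_1 mapping is what I described above; mapping variance_momentum to beta_2 and the epsilon value are my reading of the docs, so double-check them:

lr = C.learning_parameter_schedule(0.001)
learner = C.adam(model.parameters, lr,
                 momentum=C.momentum_schedule(0.9),             # beta_1 in Keras
                 variance_momentum=C.momentum_schedule(0.999),  # beta_2 in Keras
                 epsilon=1e-8)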

faruknane commented 5 years ago

Hi again, thanks for the reply. I took a look at the paper you mentioned. In my opinion, the learning rate is much more sensitive in CNTK than in Keras.

I finally managed to train the model. Now I am working on an attention mechanism without using the default attention layer in CNTK. The attention layer in CNTK is designed for the encoder-decoder scenario, which is not suitable for me, so I've decided to write my own where there is only one RNN. There won't be two RNNs like encoder and decoder; it's just one RNN that predicts the probability distribution of the next word given a word history (sequence).

To be more precise, let me give an example with one RNN layer that takes N inputs and gives N outputs:

Input sequence  = [ Hi, I, am, going, to ]
Output sequence = [ I, am, going, to, school ]

I want to build an attention system on top of that, where every hidden state of the RNN looks at the previous n-1 hidden states (for current hidden state n) and gets a final attention vector as the weighted average of those hidden states.

I modeled the system in Paint: [image]

Is that attention idea writable in CNTK, and will it work correctly?

delzac commented 5 years ago

So what you want to implement is basically self-attention with temporal masking? That is, the attention is not allowed to read into the "future".

faruknane commented 5 years ago

Yes, it's not allowed to see the future. What do you mean by temporal masking? Is it to ignore some hidden states? I am planning to use the PastValueWindow or PastValue function to get the previous hidden states for a given hidden state in the LSTM cell function. I just want to step forward slowly and carefully, knowing whether it is going to work or not.

The lstm function looks like this (sample code):

@C.Function
def lstm_with_attention(h, c, x):
    # attention is inside the lstm cell as we want it to
    # attend differently at every time step (dynamic);
    # 'attended_encoded' is a weighted sum of the encoded tensor
    # (attention, encoded_tensor and lstm are defined elsewhere)
    attended_encoded = attention(encoded_tensor, h)
    xx = C.splice(attended_encoded, x)

    return lstm(h, c, xx)

delzac commented 5 years ago

Temporal masking means you zero out some of the values along the time axis, usually so that the information in the future doesn't leak into the past.
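
For intuition, the mask over time steps is just lower-triangular: position t is allowed to attend to positions <= t only. A tiny numpy illustration:

import numpy as np

T = 4
mask = np.tril(np.ones((T, T)))  # mask[t, s] == 1 iff position t may attend to position s <= t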

There is a library I maintain called cntkx that has what you need. You can check out ScaledDotProductAttention and MultiHeadAttention. Set obey_sequence_order=True to achieve the effect you want.
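
Usage would look roughly like the sketch below. obey_sequence_order is the argument I mentioned above; the import path, the (query, key, value) call convention and the max_seq_len argument are from memory, so please check the cntkx docs before relying on them:

from cntkx.layers import ScaledDotProductAttention

# self-attention that cannot peek at future time steps
attended = ScaledDotProductAttention(obey_sequence_order=True, max_seq_len=40)(query, key, value)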

faruknane commented 5 years ago

I understood the masking part. Thanks.

Yeah, I know your GitHub repo. I examined it (not in a detailed way) the day we started talking here, and I saw the cntkx library.

However, your implementation of attention might be a bit hard for me to understand. I will examine your attention implementation soon. I also want to build a model of my own, based on what I understood from the papers on attention, so that I can change the model according to my own ideas later; maybe someday it will evolve into a new model. Thank you a lot for everything!

Btw, I am a freshman studying computer science at BOUN university, and I'd like to connect with you on LinkedIn if you don't mind. Thanks and regards.

delzac commented 5 years ago

Good luck with your studies! Alas, I don't have LinkedIn, sorry about that :(

faruknane commented 5 years ago

So basically there are many attention implementations. What you did is an attention mechanism like word2vec: instead of having one vector for a word, you create Q, K, V vectors, just as word2vec has U and V vectors. However, I am going to implement another version where I take the previous hidden states, dot-product them with the current hidden state, and put those values through a softmax function. Once I have the probability vector, I calculate the weighted average of the hidden states. I hope it works :)

delzac commented 5 years ago

The attention implemented in cntkx follows this paper: Attention Is All You Need.
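
For reference, the core computation from that paper is softmax(Q·K^T / sqrt(d_k))·V; here is a plain numpy sketch of just that formula (not the cntkx implementation itself):

import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_k) arrays
    scores = q @ k.T / np.sqrt(q.shape[-1])            # query-key similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # row-wise softmax
    return weights @ v                                 # weighted average of the values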

faruknane commented 5 years ago

Hi again. I tried to implement the attention model shown in this video. It is actually working; however, I don't know whether my implementation of the attention model is better than a normal LSTM RNN, but as I said, it works properly. I'd like you to look at my attention layer in case you have a few comments for me.

def Self_Attention_LSTM(shape: int, lookback):
    lstm = LSTM(shape)

    @C.Function
    def lstm_with_attention(h, c, x):
        # score each of the previous `lookback` hidden states against the
        # current hidden state h with a dot product
        travel = h
        probabilities = None
        for i in range(lookback):
            travel = C.sequence.past_value(travel)
            dotproduct = C.reduce_sum(C.element_times(travel, h))  # dot product of h and an earlier hidden state
            if probabilities is None:
                probabilities = dotproduct
            else:
                probabilities = C.splice(probabilities, dotproduct)  # collect all dot-product scores

        probabilities = C.softmax(probabilities)  # turn the scores into probabilities

        # weighted average of the previous hidden states, using the
        # attention probabilities as weights
        travel = h
        attention = C.zeros_like(h)
        for i in range(lookback):
            travel = C.sequence.past_value(travel)
            # add the hidden state, scaled by its attention probability, to the attention vector
            attention = attention + travel * C.slice(probabilities, 0, i, i + 1)

        xx = C.splice(x, attention)  # combine the attention vector and the input as the new lstm input
        return lstm(h, c, xx)
    return lstm_with_attention

Usage of the self-attention layer:

def create_model(inp):
    m = Embedding(200)  # note: the layers below are composed and only applied to inp at the end
    m = C.layers.Recurrence(Self_Attention_LSTM(800,40))(m)
    m = C.layers.Dropout(0.3)(m)
    m = C.layers.Dense(400)(m)
    m = C.relu(m)
    m = C.layers.Dense(wordcount)(m)
    return m(inp)

input = C.sequence.input_variable(wordcount)
model = create_model(input)