Open faruknane opened 5 years ago
Hi, I'm sorry that you are having such a difficult time. Deep learning tends to have a steep learning curve.
In CNTK, if you are using sequences, you should wrap your data as a list of 2-d numpy arrays instead of a single 3-d numpy array. With a list of 2-d numpy arrays, the sequence length can vary between samples, and that is a differentiating factor between CNTK and other frameworks.
So what you need to do is this:
x = np_utils.to_categorical(x, num_classes = wordcount)
y = np_utils.to_categorical(y, num_classes = wordcount)
x = [i for i in x] # convert 3d np.ndarray to list of 2d np.ndarray
y = [i for i in y] # convert 3d np.ndarray to list of 2d np.ndarray
assert all(i.shape == (40, 11494) for i in x)
assert all(i.shape == (40, 11494) for i in y)
h = loss.train((x, y), parameter_learners=[learner])
And it should work.
Moving forward, I recommend that you train with the Trainer class instead: instantiate a Trainer object and call Trainer.train_minibatch. That's my go-to method for training in CNTK.
loss.train is just a convenience wrapper around Trainer.
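A rough sketch of what that looks like (reusing the model, loss, learner, input, output and wordcount names from your code, so treat it as a template rather than a drop-in snippet):
import cntk as C
import numpy as np

trainer = C.Trainer(model, loss, [learner])

# for sequence inputs, each minibatch is a list of 2-d arrays, one per sequence,
# shaped (sequence_length, feature_dim); lengths may differ between sequences
x = [np.eye(wordcount, dtype=np.float32)[[2, 3, 5, 6]]]
y = [np.eye(wordcount, dtype=np.float32)[[2, 3, 5, 6]]]

for _ in range(10):
    trainer.train_minibatch({input: x, output: y})
    print(trainer.previous_minibatch_loss_average)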
@delzac Thank you for your reply. I completely understand what you meant and edited my code to use a list of 2-d arrays. However, now it gives this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in _next_minibatch(self, info_map, mb_size_in_sequences, mb_size_in_samples, number_of_workers, worker_rank, device)
468 # mbsize_in_sequences is ignored
469
--> 470 mb = self.next_minibatch(mb_size_in_samples, number_of_workers, worker_rank, device)
471 info_map.update(mb)
472
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, num_samples, number_of_workers, worker_rank, device)
732 from cntk import input_variable, device
733 self._vars[si.name] = input_variable(**self._types[si.name])
--> 734 value = Value.create(self._vars[si.name], mb_data)
735 else:
736 value = Value(mb_data)
~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
67 @wraps(f)
68 def wrapper(*args, **kwds):
---> 69 result = f(*args, **kwds)
70 map_if_possible(result)
71 return result
~\Anaconda3\lib\site-packages\cntk\core.py in create(var, data, seq_starts, device, read_only)
464 device,
465 read_only,
--> 466 True) # always create a copy in Value
467
468 return value
ValueError: Value::Create:: The number of sequences must be > 0
[CALL STACK]
> std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>:: weak_from_this
- CNTK::Value:: Create
- PyInit__cntk_py (x2)
- PyCFunction_FastCallDict
- PyObject_GetAttr
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PySequence_Check
- PyUnicodeWriter_WriteSubstring
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PyObject_GetAttr
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PyObject_GetAttr
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-18-ff9a2bc746d1> in <module>
27 assert all(i.shape == (40, 11493) for i in x)
28 assert all(i.shape == (40, 11493) for i in y)
---> 29 h = loss.train((x, y), parameter_learners=[learner])
30
31 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?
~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
1484 progress_frequency=progress_frequency, max_samples=max_samples,
1485 checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486 ts.train()
1487 res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
1488 Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])
~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
67 @wraps(f)
68 def wrapper(*args, **kwds):
---> 69 result = f(*args, **kwds)
70 map_if_possible(result)
71 return result
~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
331 device = use_default_device()
332
--> 333 super(TrainingSession, self).train(device)
334
335 def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):
~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
3597
3598 def train(self, computeDevice):
-> 3599 return _cntk_py.TrainingSession_train(self, computeDevice)
3600
3601 def restore_from_checkpoint(self, checkpointFileName):
RuntimeError: SWIG director method error.
Is there something wrong with my model? I didn't use any sparse input. I just wanted to make a simple example first and then improve the model.
In CNTK, if your model compiles, there's no problem with the model itself; everything here is a runtime issue.
Did you define epoch_size in train()? You need to set it to None.
Now it works. Interesting errors :(
Can I ask you about something here an hour or so from now?
The docs provide helpful clues as to how to set your arguments :)
Sure, but I'll be asleep by then.
I got it wrong. It doesn't work; it still gives the error even though I set epoch_size to None. I also tried setting it to 1, with no change. What made me think it was working is that if you run loss.train more than 5-10 times, at some point it stops raising the error, which fooled me into thinking the code worked. I am so annoyed by this.
def create_model(inp):
    with C.layers.default_options(initial_state = 0.1):
        m = Embedding(300)(inp)
        m = C.layers.Recurrence(C.layers.LSTM(500))(m)
        m = C.layers.Dropout(0.2, seed=1)(m)
        m = C.layers.Dense(wordcount)(m)
        return m
input = C.sequence.input_variable(wordcount)
model = create_model(input)
output = C.sequence.input_variable(wordcount)
loss = cross_entropy_with_softmax(model, output)
lr = C.learning_parameter_schedule(0.02)
learner = C.adam(model.parameters, lr, momentum = 0.0)
x = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
y = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
h = loss.train((x, y), parameter_learners=[learner])
#h = loss.train((x, y), parameter_learners=[learner], epoch_size = None)  # this doesn't change anything
This code doesn't work either.
I'm not getting an error from my code with epoch_size=None. What's the exception that's being raised? Are you sure that you are feeding in a list of 2-d numpy arrays?
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in _next_minibatch(self, info_map, mb_size_in_sequences, mb_size_in_samples, number_of_workers, worker_rank, device)
468 # mbsize_in_sequences is ignored
469
--> 470 mb = self.next_minibatch(mb_size_in_samples, number_of_workers, worker_rank, device)
471 info_map.update(mb)
472
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, num_samples, number_of_workers, worker_rank, device)
732 from cntk import input_variable, device
733 self._vars[si.name] = input_variable(**self._types[si.name])
--> 734 value = Value.create(self._vars[si.name], mb_data)
735 else:
736 value = Value(mb_data)
~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
67 @wraps(f)
68 def wrapper(*args, **kwds):
---> 69 result = f(*args, **kwds)
70 map_if_possible(result)
71 return result
~\Anaconda3\lib\site-packages\cntk\core.py in create(var, data, seq_starts, device, read_only)
464 device,
465 read_only,
--> 466 True) # always create a copy in Value
467
468 return value
ValueError: Value::Create:: The number of sequences must be > 0
[CALL STACK]
> std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>:: weak_from_this
- CNTK::Value:: Create
- PyInit__cntk_py (x2)
- PyCFunction_FastCallDict
- PyObject_GetAttr
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PySequence_Check
- PyUnicodeWriter_WriteSubstring
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PyObject_GetAttr
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PyObject_GetAttr
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-15d0d72ae305> in <module>
20 y = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
21
---> 22 h = loss.train((x, y), parameter_learners=[learner], epoch_size = None)
23
24 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?
~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
1484 progress_frequency=progress_frequency, max_samples=max_samples,
1485 checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486 ts.train()
1487 res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
1488 Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])
~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
67 @wraps(f)
68 def wrapper(*args, **kwds):
---> 69 result = f(*args, **kwds)
70 map_if_possible(result)
71 return result
~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
331 device = use_default_device()
332
--> 333 super(TrainingSession, self).train(device)
334
335 def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):
~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
3597
3598 def train(self, computeDevice):
-> 3599 return _cntk_py.TrainingSession_train(self, computeDevice)
3600
3601 def restore_from_checkpoint(self, checkpointFileName):
RuntimeError: SWIG director method error.
I am sure that I am feeding a list of 2-d numpy arrays. My CNTK version is 2.7. Which version do you have right now? I'd like to invite you to my Jupyter notebook somewhere online (maybe Azure cloud, something like that).
Yes, I'm using CNTK 2.7. Can you just run these two lines of code before loss.train()?
assert all(i.ndim == 2 for i in x)
assert all(i.ndim == 2 for i in y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in _next_minibatch(self, info_map, mb_size_in_sequences, mb_size_in_samples, number_of_workers, worker_rank, device)
468 # mbsize_in_sequences is ignored
469
--> 470 mb = self.next_minibatch(mb_size_in_samples, number_of_workers, worker_rank, device)
471 info_map.update(mb)
472
~\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, num_samples, number_of_workers, worker_rank, device)
732 from cntk import input_variable, device
733 self._vars[si.name] = input_variable(**self._types[si.name])
--> 734 value = Value.create(self._vars[si.name], mb_data)
735 else:
736 value = Value(mb_data)
~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
67 @wraps(f)
68 def wrapper(*args, **kwds):
---> 69 result = f(*args, **kwds)
70 map_if_possible(result)
71 return result
~\Anaconda3\lib\site-packages\cntk\core.py in create(var, data, seq_starts, device, read_only)
464 device,
465 read_only,
--> 466 True) # always create a copy in Value
467
468 return value
ValueError: Value::Create:: The number of sequences must be > 0
[CALL STACK]
> std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>:: weak_from_this
- CNTK::Value:: Create
- PyInit__cntk_py (x2)
- PyCFunction_FastCallDict
- PyObject_GetAttr
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PySequence_Check
- PyUnicodeWriter_WriteSubstring
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PyObject_GetAttr
- PyEval_EvalFrameDefault
- PyUnicode_RichCompare
- PyObject_GetAttr
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-b228e9de4016> in <module>
21 assert all(i.ndim == 2 for i in x)
22 assert all(i.ndim == 2 for i in y)
---> 23 h = loss.train((x, y), parameter_learners=[learner], epoch_size = None)
24
25 #inp = keras.layers.Input(batch_shape=(batchsize,None))#None?
~\Anaconda3\lib\site-packages\cntk\ops\functions.py in train(self, minibatch_source, minibatch_size, streams, model_inputs_to_streams, parameter_learners, callbacks, progress_frequency, max_epochs, epoch_size, max_samples)
1484 progress_frequency=progress_frequency, max_samples=max_samples,
1485 checkpoint_config=configs.checkpoint_configs[0], cv_config=configs.cv_configs[0], test_config=configs.test_configs[0])
-> 1486 ts.train()
1487 res = Record(updates=collector.training_updates, epoch_summaries=collector.training_summaries) if len(collector.training_summaries) > 0 else \
1488 Record(updates=[Record(loss=0, metric=0, samples=0)], epoch_summaries=[Record(loss=0, metric=0, samples=0)])
~\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
67 @wraps(f)
68 def wrapper(*args, **kwds):
---> 69 result = f(*args, **kwds)
70 map_if_possible(result)
71 return result
~\Anaconda3\lib\site-packages\cntk\train\training_session.py in train(self, device)
331 device = use_default_device()
332
--> 333 super(TrainingSession, self).train(device)
334
335 def on_cross_validation_end(self, index, average_error, num_samples, num_minibatches):
~\Anaconda3\lib\site-packages\cntk\cntk_py.py in train(self, computeDevice)
3597
3598 def train(self, computeDevice):
-> 3599 return _cntk_py.TrainingSession_train(self, computeDevice)
3600
3601 def restore_from_checkpoint(self, checkpointFileName):
RuntimeError: SWIG director method error.
I did. I think I am going to reinstall CNTK. I will let you know after that. //EDIT: I tried the code on an Azure notebook and it gave me the same error. Its CNTK version is 2.5.1. I can share the notebook with you.
https://deneme-anontm8n3w.notebooks.azure.com/j/notebooks/Untitled.ipynb
Let me know if you still get the same error after upgrading.
I reinstalled CNTK. Still the same error :///
Did you try the Azure notebook?
Hi, I realise that it has nothing to do with whether you feed in a list of 2-d numpy arrays or a single 3-d numpy array. The error is coming from the minibatch_size argument in loss.train(). Set minibatch_size=32 and the error will disappear. Unfortunately, I haven't found a way to operate loss.train() properly, so I suggest you stick to minibatch_size=32 to avoid the error. You can set max_samples to control how long you want the training to run.
Lastly, I suggest that you use the Trainer class to train moving forward instead. Its behaviour is straightforward and easy to understand.
Also, I just tested: the loss.train() API seems to be buggy when you are using CNTK dynamic sequences. If you are training on non-sequences, minibatch_size can be set to anything at all.
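In other words, roughly this call shape (using your x, y and learner; the max_samples value is just an example):
h = loss.train((x, y),
               parameter_learners=[learner],
               minibatch_size=32,     # keep this at 32 to avoid the error with sequences
               max_samples=32 * 32)   # controls how long training runs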
It seems I have to use the Trainer class, because now another problem has occurred.
I define the model with the code below:
def create_model(inp):
    with C.layers.default_options(initial_state = 0.1):
        m = Embedding(300)(inp)
        m = C.layers.Recurrence(C.layers.LSTM(500))(m)
        m = C.layers.Dropout(0.2, seed=1)(m)
        m = C.layers.Dense(wordcount)(m)
        return m
input = C.sequence.input_variable(wordcount)
model = create_model(input)
output = C.sequence.input_variable(wordcount)
loss = cross_entropy_with_softmax(model, output)
lr = C.learning_parameter_schedule(0.02)
learner = C.adam(model.parameters, lr, momentum = 0.0)
Then I train the loss function like this:
x = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
y = [np_utils.to_categorical([2,3,5,6], num_classes = wordcount)]
assert all(i.ndim == 2 for i in x)
assert all(i.ndim == 2 for i in y)
loss = cross_entropy_with_softmax(model, output)
h = loss.train((x, y), parameter_learners=[learner],minibatch_size=32, max_samples = 32*32)
The first time, loss.train works and my GPU load goes up. But the second time I try to train, the code doesn't work; it just does nothing. WOW, what buggy classes. Should someone from the CNTK team take a look at this?
Second time as in you rerun the entire code again? You should be careful when using Jupyter notebooks: always shut down the entire kernel and rerun from scratch. There is hidden state in Jupyter notebooks that affects all deep learning frameworks.
Also, few people use the loss.train() API; I suggest you move on to the Trainer class and the trainer.train_minibatch function. I doubt the maintainers will resolve this bug; it's not serious.
I am now using the Trainer class. I made a mess here. Now I am able to train the model. Thank you so much for your support. I feel bad that, although it is a small issue, we spent a lot of time on it (especially me :D). With the Trainer class it's working properly. Thank you again.
I'd like to ask you about something called sparse input in an hour or two, after making sure the normal model does its job.
I hope you find joy in using CNTK, it's really lovely. :)
Could we have a Skype session? It may sound crazy to ask for help via Skype from someone I don't know. The model I wrote with Keras has been working properly, and my aim is to build the same model in CNTK. However, right now I can't train my model in CNTK: the loss is stuck and won't decrease. I'd like to explain it in detail if we talk. I am sorry to ask for so much effort, but I am starting to think there is a lot that isn't going right in CNTK.
Hi, I don't think I have the time to sit down for a Skype session. It's better to just ask here anyway, since other people who face the same problem can benefit from having a record of the answers/discussion.
Anyway, if it's working in Keras, there's no reason why it shouldn't work in CNTK. You just need to carefully transfer the Keras model and hyperparameters into CNTK.
I figured out that in Keras I set the learning rate to 0.001 with the Adam optimizer. In CNTK, 0.001 might be too low a value; I set the learning rate to 3. The loss seems to decrease, but not in a healthy way. It's interesting that the learning rate behaves differently between Keras and CNTK even though Keras is using the CNTK backend.
I created exactly the same model in CNTK; here is what I did.
My Keras model was as below:
"inp = keras.layers.Input(batch_shape=(batchsize,None))#None?\n",
"a = Embedding(wordcount, 300, mask_zero=True)(inp)\n",
"a = LSTM(500, return_sequences = True,stateful=True)(a)\n",
"a = Dense(300, activation='relu')(a)\n",
"a = Dense(wordcount, activation='softmax')(a) \n",
"\n",
"model = Model(inputs=[inp], outputs=[a])\n",
"op = keras.optimizers.Adam(lr=0.001, decay=0)\n",
"model.compile(optimizer=op, loss='categorical_crossentropy', metrics=[\"acc\"])\n",
"model.summary()"
Loss graph of the Keras model: (image)
And here is my CNTK model: (image)
Training log of the CNTK model: (image)
My training data is correct, and so are my minibatches. I really don't know what's wrong with the CNTK model.
//btw I looked into your GitHub account. There are many things there that I loved and look forward to reading. Thanks for that opportunity.
The Keras optimizer does not use CNTK's optimizer, so you should expect different behaviour.
Also, is there any reason why you are initializing the initial state of all your CNTK layers with 0.1?
In tutorial 202 from Microsoft CNTK they do the same, so I didn't change it. However, I tried without the initial state and the result doesn't change. I still can't train my model.
What if you use a learning rate between 0.1 and 0.5?
When I set the learning rate to 0.3, the model becomes trainable. However, I am not getting the same performance as with Keras; it is much worse somehow. My loss is now 4.7 and I am waiting for the model to train. Training time is long, as you might guess. This is a word-based language model, so I don't expect zero loss, but it needs to get down to 2 or 3 at least.
There are other parameters in the Adam optimizer that you can adjust too. Do those other hyperparameters match the ones you use in Keras?
Keras doesn't use momentum with Adam; to be more precise, it doesn't let you set momentum when using Adam. However, CNTK's adam takes a momentum parameter. That's what I noticed.
There are parameters other than momentum too; you can adjust those. Or try a higher learning rate.
I liked the functionality and flexibility of CNTK. I started learning deep learning with CNTK, but after a while, because I got so many errors while using it and there aren't many platforms where I can ask questions, I switched to Keras.
I don't know about the other parameters; I will just wait, I guess. It's a ridiculous waste of effort for both of us. I wish I could spend this time on another scientific topic in deep learning, like attention mechanisms, but I am stuck on this for now.
Tuning hyperparameters is a challenge even for experienced practitioners, so don't feel too bad about it. It takes a while to get the hang of it; it took me years to build the experience and intuition to know what to tune. There were times I spent months tuning hyperparameters only to find out the problem came from a stupid mistake.
This paper by Leslie Smith really helped me; you can read through it to accelerate your learning.
I wish you good luck! :)
Separately, the momentum parameter in CNTK's adam is actually called beta1 in Keras and TensorFlow.
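So if you want to mirror the Keras Adam defaults, the mapping would be roughly this (the epsilon value and unit_gain=False are my guesses at the closest equivalents, so double-check them):
lr = C.learning_parameter_schedule(0.001)    # Keras lr
beta1 = C.momentum_schedule(0.9)             # Keras beta_1 -> cntk momentum
beta2 = C.momentum_schedule(0.999)           # Keras beta_2 -> cntk variance_momentum

learner = C.adam(model.parameters, lr,
                 momentum=beta1,
                 variance_momentum=beta2,
                 epsilon=1e-7,                # assumed to match Keras' default epsilon
                 unit_gain=False)             # Keras Adam has no unit-gain scaling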
Hi again, thanks for the reply. I took a look at the paper you mentioned. I noticed that, in my opinion, the learning rate is much more sensitive in CNTK than in Keras.
I finally managed to train the model. Now I am working on an attention mechanism without using the default attention layer in CNTK. The attention layer in CNTK is designed for the encoder-decoder scenario, which is not suitable for me. So I've decided to write my own, where there is only one RNN. There won't be two RNNs like encoder and decoder; it's just one RNN that predicts the probability distribution of the next word given a word history (sequence).
To be more precise, let me give an example. One RNN layer takes N inputs and gives N outputs:
Input sequence  = [ Hi, I, am, going, to ]
Output sequence = [ I, am, going, to, school ]
I want to build an attention system on top of that, where every hidden state of the RNN looks at the previous n-1 hidden states (for the current hidden state n) and forms the final attention vector as a weighted average of those hidden states.
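In plain numpy, the per-time-step computation I have in mind is something like this (just a toy illustration, not the CNTK code):
import numpy as np

def attend(hidden_states):
    # the current hidden state attends over all previous hidden states
    # and returns their softmax-weighted average
    h_t, previous = hidden_states[-1], hidden_states[:-1]
    scores = np.array([np.dot(h_t, h_i) for h_i in previous])   # dot-product scores
    weights = np.exp(scores) / np.exp(scores).sum()             # softmax
    return sum(w * h_i for w, h_i in zip(weights, previous))    # weighted average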
I modeled the system in Paint.
Is that attention idea implementable in CNTK, and would it work correctly?
So what you want to implement is basically self-attention with temporal masking? That is, the attention is not allowed to read into the "future".
Yes, it's not allowed to see the future. What do you mean by temporal masking? Is it ignoring some hidden states? I am planning to use the PastValueWindow or PastValue function to get the previous hidden states for a given hidden state inside the LSTM cell function. I just want to step forward slowly and carefully, knowing whether it is going to work or not.
The lstm function would look like this (sample code):
@C.Function
def lstm_with_attention(h, c, x):
    # attention is inside the lstm cell as we want it to
    # attend differently at every time step (dynamic)
    # 'attended_encoded' is a weighted sum of the encoded tensor
    attended_encoded = attention(encoded_tensor, h)
    xx = C.splice(attended_encoded, x)
    return lstm(h, c, xx)
Temporal masking means you zero out some of the values along the time axis, usually so that information from the future doesn't leak into the past.
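Concretely, one common way to do this in dot-product attention (not necessarily how every implementation does it internally) is to push the scores of future positions to a very large negative number before the softmax, so their weights come out as roughly zero:
import numpy as np

T = 4
scores = np.random.randn(T, T)                        # scores[i, j]: position i attending to position j
future = np.triu(np.ones((T, T), dtype=bool), k=1)    # True where j is in the "future" of i
masked = np.where(future, -1e9, scores)               # softmax over each row now ignores the future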
There is a library I maintain called cntkx that has what you need. You can check out ScaledDotProductAttention and MultiHeadAttention. Set obey_sequence_order=True to achieve the effect you want.
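Usage looks roughly like this; check the cntkx docs for the exact import path and the (query, key, value) call order, which I'm writing from memory here:
import cntk as C
import cntkx as Cx

seq = C.sequence.input_variable(300)
attn = Cx.layers.ScaledDotProductAttention(obey_sequence_order=True, max_seq_len=100)
attended = attn(seq, seq, seq)   # self-attention: query, key and value are the same sequence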
I understood the masking part. Thanks.
Yeah, I know your GitHub repo. I examined it (not in a detailed way) the day we started talking here, and I saw the cntkx library.
However, your implementation of attention might be a bit hard for me to understand. I will examine your attention implementation soon, but I also want to build a model on my own based on what I understood from the papers on attention, so that I can change the model according to my own ideas later; maybe someday it will evolve into another new model. Thank you a lot for everything!
By the way, I am a freshman studying computer science at BOUN university and I'd like to connect with you on LinkedIn if you don't mind. Thanks and regards.
Good luck with your studies! Alas, I don't have LinkedIn, sorry about that :(
So basically there are many attention implementations. What you did is an attention mechanism like word2vec: instead of having one vector for a word, you create Q, K, V vectors, like the U and V vectors in word2vec. However, I am going to implement another version where I basically take the previous hidden states, dot product them with the current hidden state, and put those values through a softmax. Once I have the probability vector, I calculate the weighted average of the hidden states. I hope it works :)
The attention that is implemented in cntkx follows this paper: Attention Is All You Need.
Hi again, I tried to implement the attention model shown in this video. It is actually working; I don't know whether my implementation is better than a plain LSTM RNN, but as I said, it works properly. I'd like you to look at my attention layer in case you have a few comments for me.
def Self_Attention_LSTM(shape: int, lookback):
    lstm = LSTM(shape)

    @C.Function
    def lstm_with_attention(h, c, x):
        travel = h
        probabilities = None
        for i in range(0, lookback):
            travel = C.sequence.past_value(travel)
            # dot product of the current hidden state 'h' and the hidden states before 'h'
            dotproduct = C.reduce_sum(C.element_times(travel, h))
            if probabilities is None:
                probabilities = dotproduct
            else:
                probabilities = C.splice(probabilities, dotproduct)  # combine all dot product results
        probabilities = C.softmax(probabilities)  # turn into probabilities

        travel = h
        attention = C.zeros_like(h)
        for i in range(0, lookback):
            travel = C.sequence.past_value(travel)
            # add each hidden state, weighted by its attention probability, to the attention vector
            attention = attention + travel * C.slice(probabilities, 0, i, i + 1)

        xx = C.splice(x, attention)  # combine the attention vector and the input as the new lstm input
        return lstm(h, c, xx)

    return lstm_with_attention
Usage of the self-attention layer:
def create_model(inp):
    m = Embedding(200)
    m = C.layers.Recurrence(Self_Attention_LSTM(800, 40))(m)
    m = C.layers.Dropout(0.3)(m)
    m = C.layers.Dense(400)(m)
    m = C.relu(m)
    m = C.layers.Dense(wordcount)(m)
    return m(inp)

input = C.sequence.input_variable(wordcount)
model = create_model(input)
I simply want to make an RNN model like the one below, many-to-many (one output for each input).
But it gives me an error; the output is below.
Can you please help fix the issue here? My whole day has gone into this and I'm feeling bad about CNTK.