mohamad-hasan-sohan-ajini opened this issue 7 years ago
Can you share the code?
The code related to the exception is:
# Create a minibatch source for HTK features and CTF labels.
from cntk.io import (MinibatchSource, HTKFeatureDeserializer, CTFDeserializer,
                     StreamDef, StreamDefs, INFINITELY_REPEAT)

def create_mb_source(features_file, labels_file, is_training=True):
    global feature_dim, num_classes, context
    fd = HTKFeatureDeserializer(StreamDefs(
        amazing_features=StreamDef(shape=feature_dim, context=(context, context), scp=features_file)))
    ld = CTFDeserializer(labels_file, StreamDefs(
        awesome_labels=StreamDef(field='l', shape=num_classes, is_sparse=True)))
    # Enable truncated BPTT by setting truncation_length > 0.
    return MinibatchSource([fd, ld], truncation_length=20,
                           max_sweeps=INFINITELY_REPEAT if is_training else 1)
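For context, here is a minimal sketch of how such a truncated source is usually wired to the model inputs; the file names, dimensions, and minibatch size below are placeholders, not taken from the issue:

from cntk import sequence

feature_dim, num_classes, context = 80, 133, 3   # hypothetical values

train_source = create_mb_source('train.scp', 'train.ctf', is_training=True)

features = sequence.input_variable((2 * context + 1) * feature_dim, name='feature')
labels = sequence.input_variable(num_classes, name='label')

# Map the deserializer stream names to the network inputs.
input_map = {
    features: train_source.streams.amazing_features,
    labels: train_source.streams.awesome_labels,
}

# With truncation_length=20, each minibatch carries 20-frame slices of
# the underlying utterances; recurrent state is carried across slices.
mb = train_source.next_minibatch(256, input_map=input_map)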
The issue is solved by setting truncation_length to zero. I am just trying to use CTC, and here it is recommended to set truncation_length for BPTT. I don't understand why I get the exception. Did I do something wrong?
@ebarsoumMS thanks for your attention.
Please share the definition of your network. It is not about the minibatch source. With BPTT, all sequences in the minibatch will be truncated to 20 frames. It seems that in your network you are trying to access frames outside of this range. Thanks!
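To make the truncation point concrete, here is a plain-Python illustration (not CNTK API) of what truncation_length=20 does to an utterance: every sequence is chopped into fixed-length slices, so a criterion that needs the whole utterance at once cannot be evaluated on any single slice.

def truncate(sequence_frames, truncation_length=20):
    """Chop one utterance into the fixed-length slices BPTT sees."""
    return [sequence_frames[i:i + truncation_length]
            for i in range(0, len(sequence_frames), truncation_length)]

utterance = list(range(50))          # a 50-frame utterance
slices = truncate(utterance)         # -> slices of length 20, 20, 10
# A per-frame loss can be computed slice by slice, but a sequence-level
# criterion (e.g. CTC forward-backward over the full label sequence)
# has no access to frames outside the current slice.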
Here is my network definition:
from cntk import sequence, relu
from cntk.layers import Dense, Recurrence, LSTM
from cntk.ops import labels_to_graph, forward_backward
from cntk.metrics import edit_distance_error

# Input: context-stacked feature frames.
features = sequence.input_variable((2 * context + 1) * feature_dim, name='feature')
L1 = Dense(128, activation=relu, name='L1')(features)
L2 = Recurrence(LSTM(128, use_peepholes=True, name='L2_lstm'), name='L2_recurrent')(L1)
L3 = Dense(num_classes, activation=relu, name='L3')(L2)

labels = sequence.input_variable(num_classes, name='label')
# CTC criterion; 132 is the blank token id.
graph = labels_to_graph(labels)
cr = forward_backward(graph, L3, 132)
err = edit_distance_error(labels, L3, squashInputs=True, tokensToIgnore=[132])
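For completeness, a hedged sketch of how this criterion pair would typically be handed to a trainer; the learner type and hyperparameter values are illustrative, not taken from the issue:

import cntk

# Illustrative learner settings; the values are placeholders.
lr = cntk.learning_rate_schedule(0.0001, cntk.UnitType.sample)
mm = cntk.momentum_schedule(0.9)
learner = cntk.momentum_sgd(L3.parameters, lr, mm)

# cr is the CTC forward-backward loss, err the edit-distance metric.
trainer = cntk.Trainer(L3, (cr, err), [learner])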
The complete code is here.
@eldakms
Is the problem solved or not? Using truncated BPTT would greatly reduce training time. Currently, without truncated BPTT, each epoch on our dataset (108 hours of speech) takes 9 hours, which does not scale. So we really need the truncated version to speed things up.
Would the problem be eliminated by using a stateless LSTM? For example, instead of feeding a sequence to a stateful LSTM (which currently cannot be trained with truncated BPTT), feed frames with, say, a context of 10 frames to a stateless network. Would the two work the same or not?
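If the stateless route were taken, the recurrence would simply be dropped and the temporal context would come entirely from the stacked input frames; a hedged sketch of that alternative (the layer sizes mirror the network above but are otherwise arbitrary):

from cntk.layers import Dense, Sequential
from cntk import relu

# Hypothetical stateless alternative: no Recurrence, hence no BPTT and
# no truncation issue; context comes only from the stacked frames.
stateless = Sequential([
    Dense(128, activation=relu),
    Dense(128, activation=relu),
    Dense(num_classes),
])(features)

Whether this matches the stateful LSTM's accuracy is exactly the open question above; the sketch only shows that the truncation constraint disappears.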
@eldakms @ebarsoumMS
regards
Hi
I get the following error after the first minibatch of training:
Although the loss of the first minibatch is inf, the error message is completely unrelated!
Running my code on the CPU just throws a core dump with no further info. Any help?