microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

RuntimeError: GetColumnIndex #2211

Open mohamad-hasan-sohan-ajini opened 7 years ago

mohamad-hasan-sohan-ajini commented 7 years ago

Hi

I get the following error after the first minibatch of training:

RuntimeError: GetColumnIndex: Attempted to access a time step that is accessing a portion of a sequence that is not included in current minibatch.

Although the loss of the first minibatch is inf, the error seems completely unrelated!

Running my code on the CPU just throws a core dump with no further info. Any help?

ebarsoumMS commented 7 years ago

Can you share the code?

mohamad-hasan-sohan-ajini commented 7 years ago

The code related to the exception is:

# Imports assumed for this snippet.
import cntk
from cntk.io import HTKFeatureDeserializer, MinibatchSource, StreamDef, StreamDefs

# Create a minibatch source.
def create_mb_source(features_file, labels_file, is_training=True):
    global feature_dim, num_classes, context
    # HTK features with a symmetric context window around each frame.
    fd = HTKFeatureDeserializer(StreamDefs(amazing_features=StreamDef(shape=feature_dim, context=(context, context), scp=features_file)))

    # Sparse labels read from a CTF file.
    ld = cntk.io.CTFDeserializer(labels_file, StreamDefs(awesome_labels=StreamDef(field='l', shape=num_classes, is_sparse=True)))

    # Enabling truncated BPTT with truncation_length > 0
    return MinibatchSource([fd, ld], truncation_length=20, max_sweeps=cntk.io.INFINITELY_REPEAT if is_training else 1)

The issue is solved by setting truncation_length to zero. I am just trying to use CTC, and the example here recommends setting truncation_length for BPTT. I don't understand why I get the exception. Did I do something wrong?

@ebarsoumMS thanks for your attention.

eldakms commented 7 years ago

Please share the definition of your network. It is not about the minibatch source. In BPTT all sequences in the minibatch will be truncated to 20 frames. It seems in your network you are trying to access frames outside of this range. Thanks!
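
The mechanics described above can be sketched in plain Python (illustrative only, not CNTK API): truncated BPTT slices every sequence into fixed-length chunks, so any node that needs frames of the *whole* sequence — as the CTC forward-backward pass does — ends up indexing outside the current minibatch.

```python
def truncate(sequence, truncation_length):
    """Split one sequence into BPTT chunks of at most truncation_length frames."""
    return [sequence[i:i + truncation_length]
            for i in range(0, len(sequence), truncation_length)]

frames = list(range(50))        # a 50-frame utterance
chunks = truncate(frames, 20)   # chunk lengths: [20, 20, 10]

# A per-frame criterion only needs the current chunk, but a per-sequence
# criterion like CTC needs all 50 frames at once -- frames 20..49 simply
# are not present in the first chunk, which is what GetColumnIndex reports.
print([len(c) for c in chunks])  # [20, 20, 10]
```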

mohamad-hasan-sohan-ajini commented 7 years ago

Here is my network definition:

    # Imports assumed for this snippet (CNTK 2.x).
    from cntk import sequence, forward_backward, labels_to_graph
    from cntk.layers import Dense, LSTM, Recurrence
    from cntk.metrics import edit_distance_error
    from cntk.ops import relu

    features = sequence.input_variable(((2 * context + 1) * feature_dim), name='feature')
    L1 = Dense(128, activation=relu, name='L1')(features)
    L2 = Recurrence(LSTM(128, use_peepholes=True, name='L2_lstm'), name='L2_recurrent')(L1)
    L3 = Dense(num_classes, activation=relu, name='L3')(L2)
    labels = sequence.input_variable((num_classes), name='label')
    graph = labels_to_graph(labels)
    cr = forward_backward(graph, L3, 132)  # 132: index of the CTC blank token
    err = edit_distance_error(labels, L3, squashInputs=True, tokensToIgnore=[132])

The code is here

@eldakms

mohamad-hasan-sohan-ajini commented 7 years ago

Has the problem been solved or not? Using truncated BPTT would greatly reduce training time. Currently, without truncated BPTT, each epoch on our dataset (108 hours of speech) takes 9 hours, which does not scale. So we really need the truncated version to speed things up.

Would the problem be eliminated by using a stateless LSTM? For example, instead of feeding whole sequences to a stateful LSTM (which currently cannot be trained with truncated BPTT), feed frames with, say, a context of 10 frames to a stateless LSTM network. Would the two approaches work the same or not?
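
The stateless alternative suggested above can be sketched in plain Python (illustrative only, not CNTK API): classify each frame from a fixed window of surrounding context frames, so no recurrent state has to cross chunk boundaries and minibatches can be cut anywhere.

```python
def frame_windows(frames, context):
    """Pad the sequence at both ends, then yield one (2*context+1)-frame window per frame."""
    padded = [frames[0]] * context + list(frames) + [frames[-1]] * context
    return [padded[i:i + 2 * context + 1] for i in range(len(frames))]

windows = frame_windows([10, 20, 30, 40], context=2)

# Each window is an independent training example: 4 windows of 5 frames each.
print(len(windows), len(windows[0]))  # 4 5
```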

@eldakms @ebarsoumMS

regards