mpezeshki / CTC-Connectionist-Temporal-Classification

Theano implementation of CTC.
Apache License 2.0

example not running #1

Open skaae opened 9 years ago

skaae commented 9 years ago

I'm trying to run your CTC example, but I get the following error:

Building model ...
/Users/sorensonderby/Documents/phd/RNN/Theano/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
Bulding DataStream ...
Bulding training process...
INFO:blocks.algorithms:Taking the cost gradient
INFO:blocks.algorithms:The cost gradient computation graph is built
Starting training ...
INFO:blocks.main_loop:Entered the main loop
INFO:blocks.algorithms:Initializing the training algorithm
INFO:blocks.algorithms:The training algorithm is initialized
ERROR:blocks.main_loop:Error occured during training.

Blocks will attempt to run `on_error` extensions, potentially saving data, before exiting and reraising the error. Note that the usual `after_training` extensions will *not* be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.

-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     epoch_interrupt_received: False
     epoch_started: True
     epochs_done: 0
     iterations_done: 0
     received_first_batch: False
     training_started: True
Log records from the iteration 0:

Traceback (most recent call last):
  File "/Users/sorensonderby/Documents/phd/RNN/CTC-Connectionist-Temporal-Classification/test_ctc.py", line 122, in <module>
    main_loop.run()
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 192, in run
    reraise_as(e)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/utils/__init__.py", line 225, in reraise_as
    six.reraise(type(new_exc), new_exc, orig_exc_traceback)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 178, in run
    while self._run_epoch():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 227, in _run_epoch
    while self._run_iteration():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 247, in _run_iteration
    self.algorithm.process_batch(batch)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py", line 234, in process_batch
    self._function(*ordered_batch)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/compile/function_module.py", line 517, in __call__
    allow_downcast=s.allow_downcast)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/tensor/type.py", line 130, in filter
    raise TypeError(err_msg, data)
TypeError: ('Bad input argument to theano function with name "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py:224"  at index 0(0-based), TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function"., [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 0.  0.  0.  0.  1.  1.  1.  1.  1.  1.]]\n\nOriginal exception:\n\tTypeError: Bad input argument to theano function with name "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py:224"  at index 0(0-based), TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function"., [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 0.  0.  0.  0.  1.  1.  1.  1.  1.  1.]]', 'TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".', array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.]]))

I think I can work around this by setting allow_input_downcast=True at line 224 in blocks/algorithms/__init__.py.

But then I get another error:

Building model ...
/Users/sorensonderby/Documents/phd/RNN/Theano/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
Bulding DataStream ...
Bulding training process...
INFO:blocks.algorithms:Taking the cost gradient
INFO:blocks.algorithms:The cost gradient computation graph is built
INFO:blocks.main_loop:Entered the main loop
INFO:blocks.algorithms:Initializing the training algorithm
Starting training ...

INFO:blocks.algorithms:The training algorithm is initialized
-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     epoch_interrupt_received: False
     epoch_started: True
     epochs_done: 0
     iterations_done: 0
     received_first_batch: False
     training_started: True
Log records from the iteration 0:

ERROR:blocks.main_loop:Error occured during training.

Blocks will attempt to run `on_error` extensions, potentially saving data, before exiting and reraising the error. Note that the usual `after_training` extensions will *not* be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.
Traceback (most recent call last):
  File "/Users/sorensonderby/Documents/phd/RNN/CTC-Connectionist-Temporal-Classification/test_ctc.py", line 122, in <module>
    main_loop.run()
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 192, in run
    reraise_as(e)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/utils/__init__.py", line 225, in reraise_as
    six.reraise(type(new_exc), new_exc, orig_exc_traceback)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 178, in run
    while self._run_epoch():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 227, in _run_epoch
    while self._run_iteration():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 247, in _run_iteration
    self.algorithm.process_batch(batch)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py", line 234, in process_batch
    self._function(*ordered_batch)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/compile/function_module.py", line 610, in __call__
    storage_map=self.fn.storage_map)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/compile/function_module.py", line 599, in __call__
    outputs = self.fn()
TypeError: expected type_num 7 (NPY_INT64) got 12
Apply node that caused the error: Elemwise{Add}[(0, 1)](Viterbi, shared_Viterbi)
Inputs types: [TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(0,), (7,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([], dtype=float64), 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Original exception:
    TypeError: expected type_num 7 (NPY_INT64) got 12
Apply node that caused the error: Elemwise{Add}[(0, 1)](Viterbi, shared_Viterbi)
Inputs types: [TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(0,), (7,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([], dtype=float64), 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Can you add a few notes explaining what S, T, B, D, L, C and F are?

Maybe you could also explain the input format for apply(cls, y, y_hat, y_mask, y_hat_mask, scale='log_scale')?

Is it correct that:

Where INPUT_SEQUENCE_LENGTH is the length of the input sequences (30 for the example data) and LABEL_LENGTH is the length of the label sequence for each target. Is LABEL_LENGTH padded if the true label lengths vary?

-Søren

mpezeshki commented 9 years ago

Hi Soren,

ctc_test_data.pkl is a toy dataset containing S batches. Each batch contains B examples, and each example has length T and F features. The important thing about ctc_test_data.pkl, which shows the functionality of CTC, is that the lengths of the input and output sequences are different. So, following the notation above, the output has S batches of B examples of length L (different from T).
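For illustration only (the array names and axis ordering below are my guesses, not taken from the repo), the letters map to array shapes roughly like this:

import numpy as np

S, B, T, F, L = 5, 10, 30, 3, 7              # hypothetical sizes
inputs = np.zeros((S, T, B, F), 'float32')   # S batches of B sequences, each T steps of F features
targets = np.zeros((S, L, B), 'int64')       # the corresponding label sequences of length L (L != T)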

Apparently you have a casting problem. Try running it with this flag: THEANO_FLAGS='floatX=float64' python something.py
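If you would rather not change floatX globally, the generic alternative suggested by the Theano error message itself is to cast the offending arrays on the Python side before they reach the compiled function. A minimal sketch, with y_hat_mask standing in for whichever array in the batch is float64:

import numpy as np

y_hat_mask = np.asarray(y_hat_mask, dtype='float32')  # explicit downcast to match floatX=float32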


mpezeshki commented 9 years ago

y_mask: LABEL_LENGTH x BATCH_SIZE. The lengths of the label sequences in a batch may vary; in that case the sequences are zero-padded and the mask marks the real entries.

y_hat_mask: INPUT_SEQUENCE_LENGTH x BATCH_SIZE
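A minimal sketch (with made-up sizes) of how such a zero-padded label batch and its mask could be built:

import numpy as np

LABEL_LENGTH, BATCH_SIZE = 5, 3
label_lengths = [5, 3, 4]                       # true label length of each example in the batch

y = np.zeros((LABEL_LENGTH, BATCH_SIZE), dtype='int64')         # zero-padded label sequences
y_mask = np.zeros((LABEL_LENGTH, BATCH_SIZE), dtype='float32')
for b, length in enumerate(label_lengths):
    y_mask[:length, b] = 1.0                    # 1 marks real symbols, 0 marks padding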

skaae commented 9 years ago

Thanks. Your code seems to run fine without Blocks. I do have floatX=float32, but isn't that necessary when using the GPU?

What is the license on the code? I'm planning to include a CTC example, using your code, in the Theano Lasagne library.

Best regards, Søren


mpezeshki commented 9 years ago

Happy to hear that you want to use it in Lasagne. The initial code that I started from was written by Rakesh Var and Shawn Tan. Although my code is now quite different from theirs, you should send them an email and ask them as well. Since Rakesh's code has an Apache license, I added it to my repo too. One important thing to note is that the current version of the code outputs NaN for very long sequences (and the code may have other bugs too!). We have solved this problem in another private repo, but it's not clean yet. Eventually, I'll update this repo as well.

Good luck,

skaae commented 9 years ago

Good to hear. I'll probably see if I can reproduce Alex Graves' handwritten digit recognition results.

I have to admit that I haven't looked closely at the implementation yet, but I'll do that in the coming days.

Out of curiosity, have you tested your private repo code on some "real" datasets? Would you be willing to put up the unclean code? I could then clean it up in a PR to this repo.

I'll of course attribute you, Rakesh Var, Shawn Tan and other people who contributed.

Your help is appreciated :)

mpezeshki commented 9 years ago

@skaae, the new changes to the code were made by Phil Brakel. I'll put his version in another branch of the current repo tonight (Montreal time), so please attribute him as well.

skaae commented 9 years ago

Great! Thanks

skaae commented 9 years ago

Hi, thanks for sharing. I started working my way through CTC and came across some differences between the formulation in

http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf (your reference)

and in Alex Graves' book: http://www.cs.toronto.edu/~graves/preprint.pdf

The differences are in the initial states of the backward pass. In the paper, eq. 9, they are specified as the probabilities of the blank and the correct label.

But in the book, eq. 7.13 specifies them as 1. From the definition of the beta values, I believe that 1 is the correct value?

I haven't fully understood how you define the recursion with a matrix, but given that you calculate the backward pass as the reverse of the forward pass, I don't believe the initial states are handled differently?
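For reference, my reading of the two sources (please double-check): the paper's eq. 9 initialises the backward variables as

\beta_T(|l'|) = y^T_b, \quad \beta_T(|l'|-1) = y^T_{l_{|l|}}, \quad \beta_T(s) = 0 \;\; \forall s < |l'|-1

while the book's eq. 7.13 uses

\beta_T(|l'|) = \beta_T(|l'|-1) = 1, \quad \beta_T(s) = 0 \;\; \forall s < |l'|-1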

skaae commented 9 years ago

Additionally, equation 10 in the paper uses y_t while eq. 7.15 in the book uses y_{t+1}?

mpezeshki commented 9 years ago

@skaae, my pleasure :+1: Maybe I'm wrong, but I think the forward pass is enough, so my implementation is a bit different.

skaae commented 9 years ago

pseudo_cost uses both log_alpha and log_beta when calculating the marginals. That is done in the get_targets function?

Do you train with the pseudo_cost or the cost function?

Also, from eq. 15 in http://www.cs.toronto.edu/~graves/icml_2006.pdf it seems that you need both alpha and beta?

mpezeshki commented 9 years ago

I see. There are new changes in the recent version that I'm not very familiar with. You may ask @pbrakel.

skaae commented 9 years ago

I'm working on some tests for the forward and backward matrices, if you are interested.

I just need to figure out the initial states for beta, which I'm fairly sure should be 1 and not the y probabilities.


pbrakel commented 9 years ago

Hey @skaae,

More tests are always nice and if you find bugs please let us know!

First of all, not all the functions in my version of the code may still be correct, because I only focused on the higher-level log-domain ones. The tests are messy as well.

We train using the pseudo cost function because, for some reason, the gradient of the normal cost function is unstable. The pseudo cost simply computes the CTC gradient directly, without using automatic differentiation. To turn this gradient into a cost that can be used for automatic differentiation through the rest of your model, I either use the cross entropy between the output of your model and the CTC targets (i.e., the label probabilities after summing over all the paths that are compatible with the target sequence), or the sum of the element-wise product of the gradient with respect to the softmax inputs and the pre-softmax activations of your model. The latter variant is more stable because it skips the softmax gradient and prevents the computation of targets / predictions that can lead to divisions by zero. For the standard cost you only need the forward pass, but for the manual computation of the gradient you need the backward pass as well.
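If I understand the second variant correctly, it boils down to something like the following rough sketch (not the actual code in this repo; ctc_grad stands for the manually computed CTC gradient with respect to the pre-softmax activations pre_softmax, and params for the model parameters):

import theano.tensor as T
from theano.gradient import disconnected_grad

# treat the manually computed CTC gradient as a constant and pair it with the
# pre-softmax activations; differentiating this scalar w.r.t. the parameters
# then propagates the CTC gradient through the rest of the model
pseudo_cost = T.sum(disconnected_grad(ctc_grad) * pre_softmax)
grads = T.grad(pseudo_cost, params)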

For ease of implementation, I simply computed beta in exactly the same way as alpha (except for some mask-related issues). This is not the same as in some formulations of the algorithm, where beta(t) doesn't include a multiplication with the local softmax output y_hat(t). This is why in the thesis the likelihood is defined as sum_u(alpha(t, u)beta(t, u)), while in the paper it's sum_u(alpha(t, u)beta(t, u)/y_hat(t, u)). Hopefully this clarifies things a bit.

Cheers

skaae commented 9 years ago

Thanks for the reply. I have a few more questions.

"This is not the same as in some formulations of the algorithm where beta(t) doesn't include a multiplication with the local softmax output y_hat(t)."

Are you referring to the different initial states in the book and in the paper? I see that equation 7.26 in the book and equation 14 in the paper differ only by the division by y^t_{l_s}?

I don't follow your description of how to use pseudo_cost for training.

From what you write, should I use skip_softmax=True and then have a linear output from my model?

In the docs for pseudo_cost you write that

    y_hat : tensor3 (T, B, C)
        class probabily distribution sequences, potentially in log domain

Does that mean that y_hat could be in the log domain, or that it should be?

Secondly, I have no clue what you mean by this line :)

"...or the sum of the element wise product of the gradient with respect to the softmax inputs and the pre-softmax activation of your model."

Could you give an example?

Say I have the following:

model_pre_act = #model_output_including_blanks
model_softmax = softmax(model_pre_act)

How would I then get the gradients for the parameters of the model?

skaae commented 9 years ago

I tried to write an example using Lasagne. It's mostly copied from the ctc_test file.

I try to do what you described here:

...the sum of the element wise product of the gradient with respect to the softmax inputs and the pre-softmax activation of your model.

I'm not sure I correctly understood how to combine the CTC gradients and the gradients from the rest of the network.

import lasagne
from lasagne.layers import RecurrentLayer, InputLayer, DenseLayer,\
    NonlinearityLayer, ReshapeLayer, EmbeddingLayer
import theano
import theano.tensor as T
import numpy as np
import ctc_cost  # the ctc_cost module from this repository
floatX = theano.config.floatX
num_batch, input_seq_len = 10, 45
num_classes = 10
target_seq_len = 5

Y_hat = np.asarray(np.random.normal(
    0, 1, (input_seq_len, num_batch, num_classes + 1)), dtype=floatX)
Y = np.zeros((target_seq_len, num_batch), dtype='int64')
Y[25:, :] = 1
Y_hat_mask = np.ones((input_seq_len, num_batch), dtype=floatX)
Y_hat_mask[-5:] = 0
# default blank symbol is the highest class index (num_classes = 10 in this case)
Y_mask = np.asarray(np.ones_like(Y), dtype=floatX)
X = np.random.random(
    (num_batch, input_seq_len)).astype('int32')

input_mask = T.matrix('features_mask')
y_hat_mask = input_mask
y = T.lmatrix('phonemes')
y_mask = T.matrix('phonemes_mask')
x = T.imatrix()   # batchsize, input_seq_len

# setup Lasagne Recurrent network
# The output from the network is:
#  a) output_lin_ctc is the activation before softmax  (input_seq_len, batch_size, num_classes + 1)
#  b) ouput_softmax is the output after softmax  (batch_size, input_seq_len, num_classes + 1)
l_inp = InputLayer((num_batch, input_seq_len))
l_emb = EmbeddingLayer(l_inp, input_size=num_classes, output_size=15)
l_rnn = RecurrentLayer(l_emb, num_units=10)
l_rnn_shp = ReshapeLayer(l_rnn, (num_batch*input_seq_len, 10))
l_out = DenseLayer(l_rnn_shp, num_units=num_classes+1,
                   nonlinearity=lasagne.nonlinearities.identity)  # + blank

l_out_shp = ReshapeLayer(l_out, (num_batch, input_seq_len, num_classes+1))

# dimshuffle to shape format (input_seq_len, batch_size, num_classes + 1)
l_out_shp_ctc = lasagne.layers.DimshuffleLayer(l_out_shp, (1, 0, 2))

l_out_softmax = NonlinearityLayer(
    l_out, nonlinearity=lasagne.nonlinearities.softmax)
l_out_softmax_shp = ReshapeLayer(
    l_out_softmax, (num_batch, input_seq_len, num_classes+1))

output_lin_ctc = lasagne.layers.get_output(l_out_shp_ctc, x)
output_softmax = lasagne.layers.get_output(l_out_softmax_shp, x)
all_params = lasagne.layers.get_all_params(l_out_shp)

###############
#  GRADIENTS  #
###############

# the CTC cross entropy between y and linear output network
pseudo_cost = ctc_cost.pseudo_cost(
    y, output_lin_ctc, y_mask, y_hat_mask,
    skip_softmax=True)

# calculate the gradients of the CTC cost w.r.t. the linear output of the network
pseudo_cost_sum = pseudo_cost.sum()
pseudo_cost_grad = T.grad(pseudo_cost_sum, output_lin_ctc)

# multiply CTC gradients with RNN output activation before softmax
output_to_grad = T.sum(pseudo_cost_grad * output_lin_ctc)

# calculate the gradients
all_grads = T.grad(output_to_grad, all_params)

updates = lasagne.updates.rmsprop(all_grads, all_params, learning_rate=0.0001)

train = theano.function([x, y, y_hat_mask, y_mask],
                        [output_lin_ctc, output_softmax, pseudo_cost_sum],
                        updates=updates)

test_val = train(X, Y, Y_hat_mask, Y_mask)
print test_val[0].shape
print test_val[1].shape

# Create test dataset
num_samples = 1000
np.random.seed(1234)

# create simple dataset of format
# input [5,5,5,5,5,2,2,2,2,2,3,3,3,3,3,....,1,1,1,1]
# targets [5,2,3,...,1]
# etc...
input_lst, output_lst = [], []
for i in range(num_samples):
    this_input = []
    this_output = []
    prev_class = -1
    for j in range(target_seq_len):
        this_class = np.random.randint(num_classes)
        while prev_class == this_class:
            this_class = np.random.randint(num_classes)

        prev_class = this_class
        this_len = np.random.randint(1, 10)

        this_input += [this_class]*this_len
        this_output += [this_class]

    this_input += (input_seq_len - len(this_input))*[this_input[-1]]

    input_lst.append(this_input)
    output_lst.append(this_output)

input_arr = np.concatenate([input_lst]).astype('int32')
y_arr = np.concatenate([output_lst]).astype('int64')

y_mask_arr = np.ones((target_seq_len, num_batch), dtype='float32')
input_mask_arr = np.ones((input_seq_len, num_batch), dtype='float32')

for nn in range(200):
    for i in range(num_samples//num_batch):
        idx = range(i*num_batch, (i+1)*num_batch)
        _, _, cost = train(
            input_arr[idx],
            np.transpose(y_arr[idx]),
            input_mask_arr,
            y_mask_arr)
        print cost

pbrakel commented 9 years ago

Hey Søren,

While the pseudo cost is not the same as the CTC cost, it should have the same gradient. It already does the multiplication with the outputs internally, so you don't have to compute the gradient with respect to the outputs separately; you can just treat it as you would any other cost. You can use the actual CTC cost function for performance monitoring. When you use the skip_softmax option, the function expects the linear activations; I see you implemented this correctly. Internally it still computes the softmax, but it makes sure Theano doesn't try to compute its gradient. The skip_softmax variant should be far more reliable because it can deal with very large input values, and I'm guessing it might be a bit faster too, but I didn't test that.

I'll try to answer your earlier questions when I find more time.

Best, Philemon

skaae commented 9 years ago

Thanks. I think I'm getting there.

I changed these lines and printed the cost instead of the pseudo cost.

    pseudo_cost_grad = T.grad(pseudo_cost.mean(), all_params)
    true_cost = ctc_cost.cost(y, output_softmax.dimshuffle(1, 0, 2), y_mask, y_hat_mask)
    cost = T.mean(true_cost)
    updates = lasagne.updates.rmsprop(pseudo_cost_grad, all_params, learning_rate=0.0001)

The cost seems to go down on my test data.

Richi91 commented 9 years ago

Hello Søren,

Did you get the CTC code working with Lasagne (recurrent)? Could you share that code? It would save me a lot of time ;-) In the code snippet above you use "pseudo_cost"; however, this function no longer exists in the most recent version... Maybe you could just post the code that finally worked for your example?

Cheers, Richard

skaae commented 9 years ago

I did set it up, but I didn't get around to testing it on TIMIT. I can share my code on Monday; please email me if I forget :) On what dataset do you plan to use it?

Richi91 commented 9 years ago

Cool, thank you! I am also going to use it for TIMIT. I am trying to reproduce Alex Graves' results.

skaae commented 9 years ago

Awesome, I'm very interested in the results. Do you have a script for creating the input features? I have a simple Python script which I think reproduces the features.


Richi91 commented 9 years ago

Well, he does not completely specify his preprocessing in his paper. He uses HTK for his "Fourier-based filter banks", which is explained here: http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node54.html But several parameters are not explained in the paper, for example the frequency range, the analysis window length, the step size, and the window size for calculating the deltas.

For a first try, I am using the complete frequency range from 200 Hz to 8 kHz. For the other parameters I just use the standard values (25 ms, 10 ms, 2). I have just pushed my preprocessing script to my fork of 'craffel/nntools'. I use the python_speech_features package for calculating the filter-bank energies. If you'd like to discuss the preprocessing, I suggest we open another issue or continue via email, since this is still the CTC issue ;-)

skaae commented 9 years ago

I put up the code here: https://github.com/skaae/Lasagne-CTC. I'm very interested in your progress :)

Richi91 commented 8 years ago

Hello @pbrakel ,

I am still/again working with your CTC code, but I cannot get it working correctly. During training, I get both positive and negative values for the cost. This shouldn't be possible, should it?

Training my net with a cross-entropy error (at each timestep) worked fine, so the problem must be the CTC cost. Did you test the CTC code and verify that it is correct? And do you have an example of how to use it? I have trouble understanding the pseudo_cost code, so I can't tell whether something is wrong or I just don't get it.

Kind regards

pbrakel commented 8 years ago

Hey @Richi91,

I just wrote an explanation of what pseudo_cost is supposed to do at https://github.com/skaae/Lasagne-CTC/issues/1#issuecomment-131961336 . What it boils down to is that pseudo_cost should have the same gradient as CTC, but it will not give the same cost value and can be negative. Ideally I should write a Theano op that computes the cost using the cost function and the gradient using the code in pseudo_cost, to move the confusing part of the code to a lower level, but I haven't gotten around to doing so yet.

If you show me an example of your code I can look at it. Perhaps these couple of lines will be helpful as well (y_hat_o is the output activation before it goes into softmax):

ctc_cost_t = ctc_cost.pseudo_cost(y, y_hat_o, y_mask, y_hat_mask,
                                  skip_softmax=True)
ctc_cost_monitor = ctc_cost.cost(y, y_hat, y_mask, y_hat_mask)

Richi91 commented 8 years ago

Hey @pbrakel ,

Thank you for your answer; it helped me understand the need for pseudo_cost. I guess I have already used it correctly; however, I still cannot achieve any good results, and I cannot tell whether this is due to the CTC implementation or to other reasons. For some reason, my network only learns to maximize the probability of the blank symbol (>0.8 at almost every timestep). Tonight I will try to pretrain with cross-entropy, fine-tune with CTC, and see whether this helps.

Here is a snippet of my code (using Lasagne):

#************************************ input *************************************************
l_in = lasagne.layers.InputLayer(shape=(BATCH_SIZE, MAX_INPUT_SEQ_LEN, INPUT_DIM))
l_mask = lasagne.layers.InputLayer(shape=(BATCH_SIZE, MAX_INPUT_SEQ_LEN), 
                                   input_var=theano.tensor.matrix('input_mask', dtype=theano.config.floatX))
#************************************ deep BLSTM ********************************************
blstm0 = BLSTMConcatLayer(incoming=l_in, mask_input=l_mask, 
    num_units=N_LSTM_HIDDEN_UNITS[0], gradient_steps=GRADIENT_STEPS, grad_clipping=GRAD_CLIP)
blstm1 = BLSTMConcatLayer(incoming=blstm0, mask_input=l_mask,
    num_units=N_LSTM_HIDDEN_UNITS[1], gradient_steps=GRADIENT_STEPS, grad_clipping=GRAD_CLIP)
blstm2 = BLSTMConcatLayer(incoming=blstm1, mask_input=l_mask, 
    num_units=N_LSTM_HIDDEN_UNITS[2], gradient_steps=GRADIENT_STEPS, grad_clipping=GRAD_CLIP)

#************************************ fully connected ****************************************                          
l_reshape2 = lasagne.layers.ReshapeLayer(
    blstm2, (BATCH_SIZE*MAX_INPUT_SEQ_LEN, N_LSTM_HIDDEN_UNITS[2]*2))
l_out_lin = lasagne.layers.DenseLayer(
    incoming=l_reshape2, num_units=OUTPUT_DIM, nonlinearity=lasagne.nonlinearities.linear)

#************************************ linear output ******************************************
model_lin = lasagne.layers.ReshapeLayer(
    l_out_lin, (BATCH_SIZE, MAX_INPUT_SEQ_LEN, OUTPUT_DIM))

#************************************ Softmax output *****************************************
l_out_softmax = lasagne.layers.NonlinearityLayer(
    l_out_lin, nonlinearity=lasagne.nonlinearities.softmax)
model_soft = lasagne.layers.ReshapeLayer(
    l_out_softmax, (BATCH_SIZE, MAX_INPUT_SEQ_LEN, OUTPUT_DIM))  

output_lin = lasagne.layers.get_output(model_lin) 
output_softmax = lasagne.layers.get_output(model_soft) 

Y = T.matrix('target', dtype=theano.config.floatX) 
Y_mask = T.matrix('target_mask', dtype=theano.config.floatX) 

all_params = lasagne.layers.get_all_params(model_lin, trainable=True) 

# Lasagne = Batch x Time x Feature_Dim --> swap Batch and Time for CTC
ctc_cost_train = ctc_cost.pseudo_cost(y=Y.dimshuffle((1,0)), \
                       y_hat=output_lin.dimshuffle((1,0,2)), \
                       y_mask=Y_mask.dimshuffle((1,0)), \
                       y_hat_mask=(l_mask.input_var).dimshuffle((1,0)), \
                       skip_softmax=True).mean(dtype=theano.config.floatX)

ctc_cost_monitor = ctc_cost.cost(y=Y.dimshuffle((1,0)), \
                            y_hat=output_softmax.dimshuffle((1,0,2)), \
                            y_mask=Y_mask.dimshuffle((1,0)), \
                            y_hat_mask=(l_mask.input_var).dimshuffle((1,0))).mean(dtype=theano.config.floatX)                                

updates = lasagne.updates.momentum(
    ctc_cost_train, all_params, learning_rate=lasagne.utils.floatX(LEARNING_RATE))  

train = theano.function([l_in.input_var, Y, l_mask.input_var, Y_mask],
                        outputs=[output_softmax, ctc_cost_monitor],
                        updates=updates)

Michlong commented 8 years ago

@Richi91 Hi, I am running into the same problem as you... the CTC loss is negative and the trained model outputs only blanks. Did you figure out these problems?

Richi91 commented 8 years ago

@Michlong Hi, sorry for the late reply. The CTC implementation does work fine: use pseudo_cost for training and cost for display. I can't remember what went wrong; possibly I had the wrong hyper-parameters. I would suggest using a relatively high momentum and starting with a high learning rate, or using an adaptive learning rate but still starting with a high one. In the first few epochs, the net will mostly output blanks. After a few epochs you should see the net produce other outputs as well. Most will still be blanks, though.

Don't forget gradient clipping, especially with high LR ;-)
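A minimal sketch of that kind of setup in Lasagne (reusing ctc_cost_train and all_params from my snippet above; the max_norm and learning-rate values are just placeholders to tune):

import theano
import lasagne

grads = theano.grad(ctc_cost_train, all_params)
# clip the global gradient norm before the momentum update
grads = lasagne.updates.total_norm_constraint(grads, max_norm=10.0)
updates = lasagne.updates.momentum(grads, all_params,
                                   learning_rate=1e-2, momentum=0.9)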

raindeer commented 8 years ago

@Richi91 are you willing to share your code?

Richi91 commented 8 years ago

I no longer have an implementation of an RNN with CTC in Lasagne, but in https://github.com/Richi91/SpeechRecognition/blob/master/blocks/run.py I have done some experiments using Blocks.

Actually, all you need to do is use the cost functions like this:

cost_train = ctc.pseudo_cost(y, y_hat, y_m, x_m).mean()
cost_monitor = ctc.cost(y, y_hat_softmax, y_m, x_m).mean()

Then write a Theano function for the training loop with cost_train and a function for validation (without updates) with cost_monitor.

y: targets (e.g. words or phonemes; this is not frame-wise)
y_hat, y_hat_softmax: network output before and after softmax
y_m: mask for targets
x_m: mask for inputs
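A rough sketch of those two functions (x stands for whatever symbolic inputs y_hat is built from, and updates for your optimiser's update dictionary; both are assumptions on my part):

import theano

train_fn = theano.function([x, y, x_m, y_m], cost_train, updates=updates)  # one training step
valid_fn = theano.function([x, y, x_m, y_m], cost_monitor)                 # monitoring only, no updates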

Best regards