mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

ctc_loss exception thrown when using TED-LIUM release 1 corpus #433

Closed tongda closed 7 years ago

tongda commented 7 years ago

I changed the corpus from TED-LIUM release 2 to release 1. Right after training started, I got an exception like this:

+ export ds_importer=ted
+ ds_importer=ted
+ export ds_train_batch_size=16
+ ds_train_batch_size=16
+ export ds_dev_batch_size=8
+ ds_dev_batch_size=8
+ export ds_test_batch_size=8
+ ds_test_batch_size=8
+ export ds_learning_rate=0.0001
+ ds_learning_rate=0.0001
+ export ds_validation_step=20
+ ds_validation_step=20
+ export ds_epochs=150
+ ds_epochs=150
+ export ds_display_step=10
+ ds_display_step=10
+ export ds_checkpoint_step=1
+ ds_checkpoint_step=1
+ '[' '!' -f DeepSpeech.py ']'
+ python -u DeepSpeech.py
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
STARTING Optimization

STARTING Epoch 0000
Training model...
I tensorflow/core/kernels/logging_ops.cc:79] [[8 5 -56]...][16 59]
I tensorflow/core/kernels/logging_ops.cc:79] [[0 0][0 1][0 2][0 3][0 4][0 5][0 6][0 7][0 8][0 9]...]
Traceback (most recent call last):
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/Users/dtong/.pyenv/versions/3.6.0/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: label SparseTensor is not valid: indices[6] = [0,6] is out of bounds: need 0 <= index < [16,6]
     [[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/Reshape_7, tower_0/ToInt64, tower_0/Gather, tower_0/padding_fifo_queue_DequeueMany:1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 1133, in <module>
    last_train_wer, last_dev_wer, hibernation_path = train()
  File "DeepSpeech.py", line 1074, in train
    result = calculate_loss_and_report(train_context, session, epoch=epoch, query_report=is_display_step)
  File "DeepSpeech.py", line 942, in calculate_loss_and_report
    result = session.run(params, **extra_params)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: label SparseTensor is not valid: indices[6] = [0,6] is out of bounds: need 0 <= index < [16,6]
     [[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/Reshape_7, tower_0/ToInt64, tower_0/Gather, tower_0/padding_fifo_queue_DequeueMany:1)]]

Caused by op 'tower_0/CTCLoss', defined at:
  File "DeepSpeech.py", line 1133, in <module>
    last_train_wer, last_dev_wer, hibernation_path = train()
  File "DeepSpeech.py", line 1032, in train
    train_context = create_execution_context('train')
  File "DeepSpeech.py", line 783, in create_execution_context
    tower_results = get_tower_results(data_set, optimizer=optimizer)
  File "DeepSpeech.py", line 462, in get_tower_results
    calculate_accuracy_and_loss(batch_set, no_dropout if optimizer is None else dropout_rates)
  File "DeepSpeech.py", line 344, in calculate_accuracy_and_loss
    total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/ctc_ops.py", line 145, in ctc_loss
    ctc_merge_repeated=ctc_merge_repeated)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 164, in _ctc_loss
    name=name)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/dtong/.virtualenvs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): label SparseTensor is not valid: indices[6] = [0,6] is out of bounds: need 0 <= index < [16,6]
     [[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/Reshape_7, tower_0/ToInt64, tower_0/Gather, tower_0/padding_fifo_queue_DequeueMany:1)]]

As far as I can see, the batch shape is [16, 59], but the exception says the index should be less than [16, 6].
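
If I read the message right, the dense_shape of the labels SparseTensor was inferred as [16, 6] (batch of 16, longest label 6 symbols), yet one label entry sits at column index 6. A tiny sketch of how I understand that bounds check (the values are made up, this is not DeepSpeech code):

# Labels for tf.nn.ctc_loss arrive as a SparseTensor: a list of
# (batch_index, position_in_label) indices plus a dense_shape.
dense_shape = (16, 6)                 # (batch_size, max_label_length) as in the error
indices = [(0, i) for i in range(7)]  # utterance 0 apparently carries 7 label symbols

# CTCLoss rejects the SparseTensor when any index falls outside dense_shape,
# which is exactly the "indices[6] = [0,6] is out of bounds" message above.
bad = [ix for ix in indices if ix[0] >= dense_shape[0] or ix[1] >= dense_shape[1]]
print("out-of-bounds label indices:", bad)  # -> [(0, 6)]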

I am not familiar with ctc_loss; could anyone explain a little about what happened here, please?

BTW, the code runs properly on the TED-LIUM release 2 corpus.

kdavis-mozilla commented 7 years ago

TED-LIUM release 1 is not currently supported.

kdavis-mozilla commented 7 years ago

If you want to file an enhancement issue, or file an enhancement issue and make the associated pull request, both would be more than welcome. (Creating such a pull request likely will not be too difficult, as one can start with the current ted importer and make the required changes.)

tongda commented 7 years ago

@kdavis-mozilla Thanks. I am working on making DeepSpeech compatible with TED-LIUM release 1. I thought they had the same structure, but it turns out they do not. Do you have any idea what the difference between release 1 and 2 is, please?

kdavis-mozilla commented 7 years ago

@tongda Actually I don't know how the structures differ.

When I was originally working on the importer I only looked at the TED 2 data set. I assume, though it may not be the case, that they are not too different.

ApexNDSU commented 7 years ago

Hello @tongda, I am actually getting the same issue while trying to run DeepSpeech with the TED-LIUM release 1 dataset. Did you find a solution for this by any chance? Thank you.

tongda commented 7 years ago

@ApexNDSU No progress. I just switched to the release 2 dataset for lack of time, but I hope to look into it when I am free.

antho-rousseau commented 7 years ago

Hi there, author of TED-LIUM 1 & 2 here. TBH, I don't really see the point in using only release 1 when release 2 contains everything release 1 has to offer, plus more talks. To complete the answer: there is no real structural difference between releases 1 and 2 besides the additional talks, so IMHO the root cause of your issue lies elsewhere. The only difference may be the lack of fillers in release 2 compared to release 1; I don't remember exactly. BTW, if you want, you can extract the release 1 talks from release 2 and give that a try, since release 2 works with this repo. As for your issue, the example you pasted smells to me like "n_targets > n_frames".
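
Something like this quick check over the training segments would confirm it (the 10 ms frame step and the file name are assumptions, adapt them to whatever the importer actually computes):

import wave

def n_targets_exceeds_n_frames(wav_path, transcript, step_ms=10):
    # Rough check: CTC needs at least as many time frames as target symbols.
    with wave.open(wav_path) as w:
        duration_s = w.getnframes() / w.getframerate()
    n_frames = int(duration_s * 1000 / step_ms)  # assuming ~one feature vector per 10 ms step
    return len(transcript) > n_frames

# usage (file name and transcript are made up):
# if n_targets_exceeds_n_frames("talk0001-seg042.wav", "some transcript here"):
#     print("drop this sample before training")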

ApexNDSU commented 7 years ago

Thank you both for your reply. For me, the reason to prefer release 1 was its smaller size: I was getting an OOM error with release 2, and reducing the batch size would make training take even longer. I will try your option, @antho-rousseau. Thank you!

kdavis-mozilla commented 7 years ago

@ApexNDSU The OOM can be removed by using a smaller batch size. The OOM is independent of the size of the corpus, TED v1 vs TED v2.

Training time is dependent upon corpus size. However, you can use the command line parameters[1] to use only a subset of TED v2.
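
(If those parameters don't cover your case, another quick way to get a subset is to trim whatever manifest lists the training samples. The CSV name and columns in the sketch below are only an assumption for illustration, not necessarily what the ted importer produces.)

import csv
import random

N = 2000  # keep only N training samples

with open("ted-train.csv", newline="") as f:
    reader = csv.DictReader(f)
    rows = list(reader)
    fieldnames = reader.fieldnames

subset = random.sample(rows, min(N, len(rows)))

with open("ted-train-subset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(subset)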

ApexNDSU commented 7 years ago

Yes, I can limit the corpus size. Thank you so much, @kdavis-mozilla. Just a quick concern: if I do so, I won't have a WER to compare and check against, right? Also, may I ask what WER you got for TED v2?

kdavis-mozilla commented 7 years ago

@ApexNDSU In limiting the corpus size you can compute WER as before.

However, WER comparisons with other full TED v2 runs will unfortunately be an apples vs oranges comparison.

We've done preliminary TED v2 WER work, training on the full TED v2 training set, and got WERs in the mid-20% range. But we didn't take much time to tune our language model, which needs work, or to use multiple decoder results.

ApexNDSU commented 7 years ago

Agreed. Thank you so much for the clarity, @kdavis-mozilla.

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.