mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Assign requires shapes of both tensors to match. lhs shape= [41] rhs shape= [29] #2081

Closed waynetrx closed 5 years ago

waynetrx commented 5 years ago

I tried to fine-tune from the 0.4.1 pre-trained model, using the contents of deepspeech-0.4.1-checkpoint.tar.gz as the checkpoint folder, but I keep getting the following error despite meeting all the prerequisites.

python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /training_data/deepspeech-0.4.1-checkpoint --epochs -2 --train_files /training_data/sample_train.csv --dev_files /training_data/sample_dev.csv --test_files /training_data/sample_test.csv --train_batch_size 24 --dev_batch_size 24 --test_batch_size 24 --learning_rate 0.0001 --display_step 0 --validation_step 1 --dropout_rate 0.15 --checkpoint_step 1 --lm_alpha 0.75 --lm_beta 1.85
Preprocessing ['/training_data/sample_train.csv']
Preprocessing done
Preprocessing ['/training_data/sample_dev.csv']
Preprocessing done
Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [41] rhs shape= [29]
[[node save_1/Assign_15 (defined at DeepSpeech.py:544)  = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6, save_1/RestoreV2_1/_29)]]

Caused by op 'save_1/Assign_15', defined at:
File "DeepSpeech.py", line 941, in <module>
tf.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "DeepSpeech.py", line 893, in main
train()
File "DeepSpeech.py", line 544, in train
config=Config.session_config) as session:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
self._scaffold.finalize()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 213, in finalize
self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 886, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1102, in __init__
self.build()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
name="restore_shard"))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 119, in restore
self.op.get_shape().is_fully_defined())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
validate_shape=validate_shape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

E Assign requires shapes of both tensors to match. lhs shape= [41] rhs shape= [29]
[[node save_1/Assign_15 (defined at DeepSpeech.py:544)  = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6, save_1/RestoreV2_1/_29)]]

The checkpoint in /training_data/deepspeech-0.4.1-checkpoint does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /training_data/deepspeech-0.4.1-checkpoint.
root@Maeve:/DeepSpeech# python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /training_data/deepspeech-0.4.1-checkpoint --epoch -2 --train_files /training_data/sample_train.csv --dev_files /training_data/sample_dev.csv --test_files /training_data/sample_test.csv --train_batch_size 24 --dev_batch_size 24 --test_batch_size 24 --learning_rate 0.0001 --display_step 0 --validation_step 1 --dropout_rate 0.15 --checkpoint_step 1 --lm_alpha 0.75 --lm_beta 1.85
Preprocessing ['/training_data/sample_train.csv']
Preprocessing done
Preprocessing ['/training_data/sample_dev.csv']
Preprocessing done
Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [41] rhs shape= [29]
[[node save_1/Assign_15 (defined at DeepSpeech.py:544)  = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6/Adam_1, save_1/RestoreV2_1/_33)]]

Caused by op 'save_1/Assign_15', defined at:
File "DeepSpeech.py", line 941, in <module>
tf.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "DeepSpeech.py", line 893, in main
train()
File "DeepSpeech.py", line 544, in train
config=Config.session_config) as session:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
self._scaffold.finalize()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 213, in finalize
self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 886, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1102, in __init__
self.build()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
name="restore_shard"))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 119, in restore
self.op.get_shape().is_fully_defined())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
validate_shape=validate_shape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [41] rhs shape= [29]
[[node save_1/Assign_15 (defined at DeepSpeech.py:544)  = Assign[T=DT_FLOAT, _class=["loc:@b6"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](b6/Adam_1, save_1/RestoreV2_1/_33)]] 
The checkpoint in /training_data/deepspeech-0.4.1-checkpoint does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /training_data/deepspeech-0.4.1-checkpoint.

These are the characters reported by util/check_characters.py for train.csv, test.csv and dev.csv:

['/training_data/train.csv']
### The following unique characters were found in your transcripts: ###
['j', 'd', 't', 'h', 'u', 'i', 'l', 'f', 'g', 'q', 'k', 'v', 'c', 'w', 's', 'z', 'm', 'n', 'o', 'a', 'b', 'y', 'x', "'", ' ', 'p', 'r', 'e']
### All these characters should be in your data/alphabet.txt file ###
['/training_data/test.csv']
### The following unique characters were found in your transcripts: ###
['x', 'w', 'b', 'm', 'u', 'v', 'g', 'r', 'c', 'f', 'h', 'a', 'o', 't', 'p', "'", 'e', 'd', 's', 'z', 'q', 'l', 'i', ' ', 'y', 'n', 'j', 'k']
### All these characters should be in your data/alphabet.txt file ###
['/training_data/dev.csv']
### The following unique characters were found in your transcripts: ###
['g', 'c', 'o', 'm', 'l', "'", 'f', 'd', 'x', 't', 's', 'i', 'q', 'b', 'w', 'v', 'p', 'j', ' ', 'k', 'a', 'u', 'e', 'z', 'n', 'r', 'y', 'h']
### All these characters should be in your data/alphabet.txt file ###

lissyx commented 5 years ago

@waynetrx You can't continue training with changing alphabets.
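
The two shapes in the error are consistent with an alphabet change. As a rough sketch, assuming the output-layer bias b6 has one entry per alphabet label plus one extra entry for the CTC blank (an assumption about how DeepSpeech sizes that layer, not something stated in this thread), the 28 characters found in the transcripts give the 29 stored in the checkpoint, while the 41 in the current graph would correspond to a 40-label alphabet. The helper names below are hypothetical.

```python
import string

# Characters reported by util/check_characters.py in this thread:
# the 26 lowercase letters, the apostrophe, and the space.
transcript_chars = set(string.ascii_lowercase) | {"'", " "}


def output_units(n_labels: int) -> int:
    """Assumed sizing rule: one output unit per label plus one CTC blank."""
    return n_labels + 1


def labels_from_units(n_units: int) -> int:
    """Inverse of output_units(): labels implied by an output-layer size."""
    return n_units - 1


checkpoint_units = 29  # rhs shape in the error: value restored from the checkpoint
graph_units = 41       # lhs shape in the error: variable b6 in the current graph

print(len(transcript_chars))                # 28
print(output_units(len(transcript_chars)))  # 29, matches the checkpoint
print(labels_from_units(graph_units))       # 40 labels implied by the current graph
```

Under that assumption, the current alphabet.txt defines more labels than the one the checkpoint was trained with, which is why the restore fails on b6.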

waynetrx commented 5 years ago

> @waynetrx You can't continue training with changing alphabets.

You are a life-saver! I had accidentally modified my data/alphabet.txt and added digits to it. After restoring the original alphabet.txt, it works!

I didn't realise that DeepSpeech actually uses data/alphabet.txt.
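
A check along these lines could catch an edited alphabet before resuming from a checkpoint. This is a minimal sketch, not DeepSpeech's own loader: it assumes one label per line, lines starting with # treated as comments, and \# as an escaped literal #. count_labels, write_alphabet, and the temporary file names are hypothetical, and the ten digits only stand in for whatever was added to the file.

```python
import os
import tempfile


def count_labels(alphabet_path: str) -> int:
    """Count labels in an alphabet file (assumed format: one label per line)."""
    count = 0
    with open(alphabet_path, encoding="utf-8") as fin:
        for line in fin:
            if line.startswith("\\#"):
                count += 1   # escaped '#': counted as a real label
            elif line.startswith("#"):
                continue     # comment line, not a label
            else:
                count += 1   # any other line, including a lone space
    return count


def write_alphabet(labels, path):
    """Write a throwaway alphabet file with a leading comment line."""
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("# demo alphabet\n")
        for label in labels:
            fout.write(label + "\n")


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "alphabet_original.txt")
    edited = os.path.join(tmp, "alphabet_edited.txt")
    base = [" "] + list("abcdefghijklmnopqrstuvwxyz") + ["'"]
    write_alphabet(base, original)
    write_alphabet(base + list("0123456789"), edited)
    original_count = count_labels(original)
    edited_count = count_labels(edited)

print(original_count, edited_count)  # 28 38
if edited_count != original_count:
    print("alphabet changed: the checkpoint's output layer will not match")
```

Comparing the label count against the alphabet that shipped with the checkpoint is cheap and fails fast, whereas the shape mismatch only surfaces once the training session tries to restore variables.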

Is it normal to fine-tune from epoch 225336?

kdavis-mozilla commented 5 years ago

As the core question is answered and this is evolving into a discussion, I suggest moving it to our Discourse forums.
