Closed JRMeyer closed 3 years ago
Hi, thanks for showing interest! The alphabet in the release should be correct, but of course there might have been an error. It is supposed to contain the characters a-z as well as the Swedish characters å ä ö, and a whitespace.
What are the two different numbers you're seeing? Just as a sanity check 😊
Hi @hsson, thanks for the reply!
I'm trying to export the model checkpoints from this repo into a new output_graph.pb using the newest training Docker image from coqui-ai STT (a continuation of Mozilla's DeepSpeech project). The original checkpoints are from Mozilla's v0.6.1, and the new Coqui Docker image is on main.
Steps to reproduce:
1. Download model-checkpoint.tar.gz and alphabet.txt from [se-asr/model/releases]
2. Run /code/train.py --checkpoint_dir checkpoints/ --export_dir checkpoints/ --alphabet_config_path alphabet.txt
After downloading data and setting up the Docker container, here's what happens:
root@3678f66a06ff:/mnt/deepspeech-swedish# wc -l alphabet.txt
36 alphabet.txt
root@3678f66a06ff:/mnt/deepspeech-swedish# ls checkpoints/
author_model_0.0.1.md best_dev-286629.index checkpoint
best_dev-286629.data-00000-of-00001 best_dev-286629.meta
root@3678f66a06ff:/mnt/deepspeech-swedish# /code/train.py --checkpoint_dir checkpoints/ --export_dir checkpoints/ --alphabet_config_path alphabet.txt
I Exporting the model...
I Could not find best validating checkpoint.
I Loading most recent checkpoint from checkpoints/best_dev-286629
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
Traceback (most recent call last):
File "/code/train.py", line 12, in <module>
ds_train.run_script()
File "/code/training/coqui_stt_training/train.py", line 986, in run_script
absl.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/code/training/coqui_stt_training/train.py", line 966, in main
export()
File "/code/training/coqui_stt_training/train.py", line 815, in export
load_graph_for_evaluation(session)
File "/code/training/coqui_stt_training/util/checkpoints.py", line 151, in load_graph_for_evaluation
_load_or_init_impl(session, methods, allow_drop_layers=False)
File "/code/training/coqui_stt_training/util/checkpoints.py", line 106, in _load_or_init_impl
return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
File "/code/training/coqui_stt_training/util/checkpoints.py", line 71, in _load_checkpoint
v.load(ckpt.get_tensor(v.op.name), session=session)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
session.run(self.initializer, {self.initializer.inputs[1]: value})
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(31,)'
You see there's an off-by-one error:
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(31,)'
and the alphabet file only has 31 unique symbols:
root@3678f66a06ff:/mnt/deepspeech-swedish# cat alphabet.txt | grep -v "#" | wc -l
31
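For anyone wanting to cross-check that count without grep, here is a minimal Python sketch. It assumes the alphabet format works the way the comments in the file suggest: one label per line, lines starting with # are comments, and \# is an escaped literal hash. It also assumes the output layer has one unit per label plus one for the CTC blank. The sample alphabet and its character order are hypothetical, not the contents of the released file.

```python
def read_labels(lines):
    """Return the labels from alphabet lines (assumed format, see above)."""
    labels = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("\\#"):
            # Assumption: an escaped hash is a literal '#' label.
            labels.append("#")
        elif text.startswith("#"):
            # Comment line, not a label.
            continue
        else:
            # Assumption: every other line is one label (a lone space included).
            labels.append(text)
    return labels


def expected_output_units(labels):
    """Assumption: one output unit per label, plus one for the CTC blank."""
    return len(labels) + 1


# Hypothetical alphabet: space, a-z, and the Swedish letters, with comments.
sample_lines = ["# Hypothetical alphabet, one label per line\n"]
sample_lines += [ch + "\n" for ch in " abcdefghijklmnopqrstuvwxyzåäö"]
sample_lines += ["# End of alphabet\n"]

labels = read_labels(sample_lines)
print(len(labels), expected_output_units(labels))

# For a real file (path taken from the session above):
# with open("alphabet.txt", encoding="utf-8") as f:
#     labels = read_labels(f)
```

If the plus-one assumption holds, the 31-unit graph built from the released alphabet.txt would correspond to 30 labels (which matches a-z, å ä ö and a space), and the 32-unit bias in the checkpoint would correspond to 31. The grep count of 31 above might then include one line that is not a label, though that is only a guess.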
Given that the official [Mozilla alphabet] contains an apostrophe ', I thought the original Swedish alphabet.txt may have included it as well, so I included the apostrophe, and the checkpoints can be successfully exported:
root@3678f66a06ff:/mnt/deepspeech-swedish# python /code/train.py --checkpoint_dir checkpoints/ --export_dir checkpoints/ --alphabet_config_path alphabet.txt-new
I Exporting the model...
I Could not find best validating checkpoint.
I Loading most recent checkpoint from checkpoints/best_dev-286629
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
I Models exported at checkpoints/
I Model metadata file saved to checkpoints/author_model_0.0.1.md. Before submitting the exported model for publishing make sure all information in the metadata file is correct, and complete the URL fields.
The problem is, I'm not sure if there actually was a ' in the original alphabet, and even if there was, I don't know where it goes in the file.
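To illustrate why the position matters, here is a small Python sketch. It assumes labels are numbered by their line order in the alphabet file, so the network's output index i is decoded as the i-th label. Both alphabets and the position of the apostrophe are hypothetical; this is not a claim about what the original file looked like.

```python
def decode(indices, labels):
    """Map a sequence of output indices back to text using line-order labels."""
    return "".join(labels[i] for i in indices)


# Hypothetical base order: space, a-z, then the Swedish letters.
base = list(" abcdefghijklmnopqrstuvwxyzåäö")

# Hypothetical training-time alphabet: apostrophe right after the space.
trained = [base[0], "'"] + base[1:]

# Hypothetical export-time alphabet: apostrophe appended at the end.
exported = base + ["'"]

# Indices a model trained with `trained` would emit for the word "tack".
indices = [trained.index(ch) for ch in "tack"]

print(decode(indices, trained))   # same alphabet as training
print(decode(indices, exported))  # same size, different order
```

Both alphabets have 31 labels, so either one would load without a shape error, but only the first decodes the indices back to "tack". The second shifts every letter by one position and yields "ubdl". A successful export therefore only confirms the size of the alphabet, not its order.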
When I try out the exported model on some clear, basic Swedish, the model doesn't perform well:
(stt-venv) [josh@macbook stt-community-models]$ stt --model deepspeech-swedish/checkpoints/output_graph.pb --audio sample-audio/tack-sa-mycket.wav
Loading model from file deepspeech-swedish/checkpoints/output_graph.pb
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.3-50-g8a03f4bc
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2021-03-31 07:50:24.710417: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.236s.
Running inference.
2021-03-31 07:50:25.298287: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:25.434853: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:25.586481: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:25.720652: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:26.114561: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
aktienmyke
Inference took 2.859s for 1.496s audio file.
I expected tack så mycket but the model returns aktienmyke. Adding a language model (from Common Voice transcripts) doesn't help the situation:
(stt-venv) [josh@macbook deepspeech-swedish]$ stt --model model.pb --audio ../sample-audio/tack-sa-mycket.wav --scorer kenlm-common-voice.scorer
Loading model from file model.pb
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.3-50-g8a03f4bc
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2021-03-31 10:01:51.924813: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.238s.
Loading scorer from files kenlm-common-voice.scorer
Loaded scorer in 0.000323s.
Running inference.
aktier
Inference took 3.357s for 1.496s audio file.
So, my guess is still that a symbol is missing from the alphabet, but I don't know what it was or where it goes. Thoughts?
I'm afraid I won't be able to help you if you're using coqui. Try using our branch in this repo if you want to do further training.
I've double checked the alphabet and everything seems to be in order.
When you get it working, I suggest using the language model that we uploaded as part of the release.
Thanks for the reply, @hsson. I'll close the issue for now and keep you posted if I make any progress.
Hi, and thanks for open-sourcing your work!
I have a pretty strong suspicion that the alphabet.txt file in your 1.0 release isn't the one you used to train the checkpoints... is that possible? It seems that the dimensions of the output layer of the model and the number of unique characters in the alphabet file are off by one. Was there some punctuation perhaps? Do you have the original alphabet.txt file?
Thanks!
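The off-by-one comparison described above can be checked directly against the checkpoint. The following is a rough Python sketch, not a tested recipe: it relies on tf.train.list_variables, which returns (name, shape) pairs for a checkpoint, and on the assumption that layer_6/bias has one entry per alphabet label plus one for the CTC blank. The helper names are made up for illustration, and the shape used in the demo is the one reported in the export error.

```python
def output_units(variables, name="layer_6/bias"):
    """Return the first dimension of the named variable.

    `variables` is an iterable of (name, shape) pairs, the form that
    tf.train.list_variables returns.
    """
    for var_name, shape in variables:
        if var_name == name:
            return shape[0]
    raise KeyError(name)


def missing_labels(n_labels, variables):
    """How many labels the alphabet lacks relative to the checkpoint.

    Assumption: output units = number of labels + 1 (CTC blank).
    """
    return output_units(variables) - (n_labels + 1)


def checkpoint_variables(prefix):
    """List (name, shape) pairs from a checkpoint prefix; needs TensorFlow."""
    import tensorflow as tf  # imported lazily so the helpers above work without it
    return tf.train.list_variables(prefix)


# Demo using only the shape from the error message: the checkpoint holds a
# 32-entry layer_6/bias, while a 31-unit graph implies 30 labels.
reported = [("layer_6/bias", [32])]
print(missing_labels(30, reported))

# With TensorFlow installed, the real shapes could be read like this
# (prefix taken from the export log):
# reported = checkpoint_variables("checkpoints/best_dev-286629")
```

A positive result means the checkpoint was trained with more labels than the alphabet file provides. It says nothing about which symbol is missing or where it sat in the file.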