se-asr / model

DeepSpeech model for Swedish ASR, free to use

incorrect alphabet.txt in release #2

Closed JRMeyer closed 3 years ago

JRMeyer commented 3 years ago

hi, and thanks for open-sourcing your work!

I have a pretty strong suspicion that the alphabet.txt file in your 1.0 release isn't the one you used to train the checkpoints... is that possible? It seems that the dimensions of the output layer of the model and the unique characters in the alphabet file are off-by-one.

Was there perhaps some punctuation? Do you have the original alphabet.txt file?

Thanks!

hsson commented 3 years ago

Hi, thanks for showing interest! The alphabet in the release should be correct, but of course there might have been an error. It is supposed to contain the characters a-z as well as the Swedish characters å ä ö, and a whitespace.

What are the two different numbers you're seeing? Just as a sanity check 😊

JRMeyer commented 3 years ago

Hi @hsson, thanks for the reply!

I'm trying to export the model checkpoints from this repo into a new output_graph.pb using the newest training Docker image from coqui-ai STT (a continuation of Mozilla's DeepSpeech project). The original checkpoints are from Mozilla's v0.6.1, and the new Coqui Docker image tracks the main branch.

Steps to reproduce:

  1. Download model-checkpoint.tar.gz and alphabet.txt from [se-asr/model/releases], and untar the checkpoint archive
  2. Pull and run the training Docker image from Coqui [here]
  3. Mount the local working directory into the Docker container
  4. Within the Docker container, attempt to export the checkpoints as a graph via: /code/train.py --checkpoint_dir checkpoints/ --export_dir checkpoints/ --alphabet_config_path alphabet.txt

After downloading data and setting up the Docker container, here's what happens:

root@3678f66a06ff:/mnt/deepspeech-swedish# wc -l alphabet.txt
36 alphabet.txt
root@3678f66a06ff:/mnt/deepspeech-swedish# ls checkpoints/
author_model_0.0.1.md                best_dev-286629.index  checkpoint
best_dev-286629.data-00000-of-00001  best_dev-286629.meta
root@3678f66a06ff:/mnt/deepspeech-swedish# /code/train.py --checkpoint_dir checkpoints/ --export_dir checkpoints/ --alphabet_config_path alphabet.txt
I Exporting the model...
I Could not find best validating checkpoint.
I Loading most recent checkpoint from checkpoints/best_dev-286629
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
Traceback (most recent call last):
  File "/code/train.py", line 12, in <module>
    ds_train.run_script()
  File "/code/training/coqui_stt_training/train.py", line 986, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/code/training/coqui_stt_training/train.py", line 966, in main
    export()
  File "/code/training/coqui_stt_training/train.py", line 815, in export
    load_graph_for_evaluation(session)
  File "/code/training/coqui_stt_training/util/checkpoints.py", line 151, in load_graph_for_evaluation
    _load_or_init_impl(session, methods, allow_drop_layers=False)
  File "/code/training/coqui_stt_training/util/checkpoints.py", line 106, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
  File "/code/training/coqui_stt_training/util/checkpoints.py", line 71, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(31,)'

You see there's an off-by-one error:

ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(31,)'

and the alphabet file only has 31 unique symbols:

root@3678f66a06ff:/mnt/deepspeech-swedish# cat alphabet.txt | grep -v "#" | wc -l
31
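One way to cross-check the two numbers is to count labels the way the training code reads them and compare that against the checkpoint's output layer. The sketch below makes two assumptions that are not confirmed in this thread: that alphabet.txt follows the DeepSpeech alphabet format (one label per line, lines starting with `#` are comments, `\#` is a literal hash), and that the output layer has one unit per label plus one extra unit for the CTC blank. The helper names `read_alphabet_labels`, `checkpoint_shapes`, and `implied_label_count` are hypothetical and not part of either project.

```python
def read_alphabet_labels(path):
    """Return the labels in file order, skipping comment lines.

    Assumes the DeepSpeech alphabet format: one label per line,
    '#' starts a comment line, and '\\#' encodes a literal '#'.
    """
    labels = []
    with open(path, encoding="utf-8") as fin:
        for line in fin:
            if line.startswith("\\#"):
                line = "#\n"      # escaped hash becomes a real label
            elif line.startswith("#"):
                continue          # comment line, not a label
            labels.append(line.rstrip("\n"))
    return labels


def checkpoint_shapes(ckpt_prefix):
    """Map variable name -> shape for a TensorFlow checkpoint prefix."""
    # Imported lazily so that label counting works without TensorFlow.
    import tensorflow as tf
    return {name: tuple(shape)
            for name, shape in tf.train.list_variables(ckpt_prefix)}


def implied_label_count(shapes, bias_name="layer_6/bias"):
    """Labels implied by the output layer.

    Assumption: the layer has one unit per label plus one for the CTC blank.
    """
    return shapes[bias_name][0] - 1


# Possible usage inside the container (paths taken from the transcript above):
#   labels = read_alphabet_labels("alphabet.txt")
#   shapes = checkpoint_shapes("checkpoints/best_dev-286629")
#   print(len(labels), implied_label_count(shapes))
```

If the blank-label assumption holds, the `(32,)` bias in the checkpoint would imply 31 labels at training time, and the `(31,)` tensor built from the released alphabet.txt would imply only 30 parsed labels, which matches a-z (26) plus å ä ö (3) plus the space (1). The count of non-comment lines reported by `grep -v "#" | wc -l` is not necessarily the same as the number of labels the training code parses, so it is worth comparing both.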

Given that the official [Mozilla alphabet] contains an apostrophe ', I thought the original Swedish alphabet.txt may have included it as well, so I added the apostrophe (saved as alphabet.txt-new), and the checkpoints can then be exported successfully:

root@3678f66a06ff:/mnt/deepspeech-swedish# python /code/train.py --checkpoint_dir checkpoints/ --export_dir checkpoints/ --alphabet_config_path alphabet.txt-new
I Exporting the model...
I Could not find best validating checkpoint.
I Loading most recent checkpoint from checkpoints/best_dev-286629
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
I Models exported at checkpoints/
I Model metadata file saved to checkpoints/author_model_0.0.1.md. Before submitting the exported model for publishing make sure all information in the metadata file is correct, and complete the URL fields.

The problem is, I'm not sure if there actually was a ' in the original alphabet, and even if there was, I don't know where it goes in the file.
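The position matters because each output unit of the network corresponds to the label at the same index in the alphabet, so a symbol inserted at the wrong line shifts every label after it by one. A brute-force way to explore this, sketched here as a suggestion rather than anything the repo authors describe, is to generate one candidate alphabet per insertion position and export and test each. `candidate_alphabets` and `write_alphabet` are hypothetical helpers, and the apostrophe is only a guess at the missing symbol.

```python
def candidate_alphabets(labels, missing="'"):
    """Yield (position, new_labels) with `missing` inserted at each index."""
    for pos in range(len(labels) + 1):
        yield pos, labels[:pos] + [missing] + labels[pos:]


def write_alphabet(labels, path):
    """Write labels one per line, escaping a literal '#' as '\\#'.

    Assumes the DeepSpeech alphabet file format.
    """
    with open(path, "w", encoding="utf-8") as fout:
        for label in labels:
            fout.write(("\\#" if label == "#" else label) + "\n")


# Possible usage: write alphabet-00.txt, alphabet-01.txt, ... and run the
# export plus a short test transcription against each file.
#   for pos, cand in candidate_alphabets(labels):
#       write_alphabet(cand, f"alphabet-{pos:02d}.txt")
```

With 30 existing labels this yields 31 candidate files, which is few enough to try exhaustively on a handful of test clips.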

When I try out the exported model on some clear, basic Swedish, the model doesn't perform well:

(stt-venv) [josh@macbook stt-community-models]$ stt --model deepspeech-swedish/checkpoints/output_graph.pb --audio sample-audio/tack-sa-mycket.wav 
Loading model from file deepspeech-swedish/checkpoints/output_graph.pb
TensorFlow: v2.3.0-6-g23ad988fcde
 Coqui STT: v0.10.0-alpha.3-50-g8a03f4bc
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2021-03-31 07:50:24.710417: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.236s.
Running inference.
2021-03-31 07:50:25.298287: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:25.434853: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:25.586481: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:25.720652: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
2021-03-31 07:50:26.114561: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 134217728 exceeds 10% of free system memory.
aktienmyke
Inference took 2.859s for 1.496s audio file.

I expected tack så mycket but the model returns aktienmyke. Adding a language model (from Common Voice transcripts) doesn't help the situation:

(stt-venv) [josh@macbook deepspeech-swedish]$ stt --model model.pb --audio ../sample-audio/tack-sa-mycket.wav --scorer kenlm-common-voice.scorer 
Loading model from file model.pb
TensorFlow: v2.3.0-6-g23ad988fcde
 Coqui STT: v0.10.0-alpha.3-50-g8a03f4bc
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2021-03-31 10:01:51.924813: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.238s.
Loading scorer from files kenlm-common-voice.scorer
Loaded scorer in 0.000323s.
Running inference.
aktier
Inference took 3.357s for 1.496s audio file.
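To put a number on how far each transcript is from the expected phrase, a character error rate (CER) can be computed from the Levenshtein edit distance. This is a generic sketch, not a metric used in the thread; `edit_distance` and `cer` are hypothetical helper names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


# Possible usage with the transcripts above:
#   cer("tack så mycket", "aktienmyke")
#   cer("tack så mycket", "aktier")
```

Comparing the CER across differently ordered alphabets would give a rough signal for which ordering is closest to the one used in training.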

So, my guess is still that a symbol is missing from the alphabet, but I don't know what it was or where it goes. Thoughts?

hsson commented 3 years ago

I'm afraid I won't be able to help you if you're using Coqui. Try using our branch in this repo if you want to do further training.

I've double checked the alphabet and everything seems to be in order.

When you get it working, I suggest using the language model that we uploaded as part of the release.

JRMeyer commented 3 years ago

thanks for the reply @hsson -- I'll close the issue for now, I'll keep you posted if I make any progress