mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Train DeepSpeech Model for Brazilian Portuguese #2627

Closed Edresson closed 4 years ago

Edresson commented 4 years ago

Hi,

I am trying to train the DeepSpeech model for Brazilian Portuguese. Few datasets are available for Brazilian Portuguese (here is a work that used 14 hours of speech). In 2017 @reuben ran some experiments with DeepSpeech on the LapsBM dataset (a small Portuguese dataset), apparently without success. Could we get an update on this, @reuben?

I was able to obtain a 109-hour Brazilian Portuguese dataset and I am trying to train DeepSpeech on it (the dataset is spontaneous speech, collected from sociolinguistic interviews and fully transcribed manually by humans).

To create the LM and trie I followed the documentation's recommendations. I created words.arpa with the following command (RawText.txt contains all the transcripts, with the wav file paths removed):

./lmplz --text ../../datasets/ASR-Portuguese-Corpus-V1/RawText.txt --arpa /tmp/words.arpa --order 5 --temp_prefix /tmp/
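For reference, a rough sketch of how RawText.txt can be produced from the training CSV, assuming the CSV follows the standard DeepSpeech layout (wav_filename, wav_filesize, transcript); the column name used below is an assumption, not copied from my files:

# Sketch: write only the transcript column of a DeepSpeech-style CSV to a
# plain-text file so KenLM never sees the wav paths. "transcript" is assumed.
import csv

with open("metadata_train.csv", newline="", encoding="utf-8") as src, \
        open("RawText.txt", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        dst.write(row["transcript"].strip() + "\n")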

I generated lm.binary: kenlm/build/bin/build_binary -a 255 -q 8 trie lm.arpa lm.binary
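A quick way to sanity-check lm.binary (a sketch, assuming the kenlm Python bindings are installed with pip install kenlm) is to load it and score a couple of sentences; a plausible Portuguese sentence should get a much higher (less negative) log10 score than gibberish:

# Sketch: sanity-check the binary LM by scoring sentences with the kenlm bindings.
import kenlm

lm = kenlm.Model("lm.binary")
print(lm.score("eu não sei o que dizer", bos=True, eos=True))  # plausible sentence
print(lm.score("xqz wvk pfft jjj", bos=True, eos=True))        # gibberish, should score far lower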

I installed the native client: python util/taskcluster.py --arch gpu --target native_client --branch v0.6.0

I created the file alphabet.txt with the following labels (each one on its own line in the actual file):

# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.

a b c d e f g h i j k l m n o p q r s t u v w x y z ç ã à á â ê é í ó ô õ ú û
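As a sanity check (a sketch, not something from my setup), every character that appears in the transcripts should be covered by alphabet.txt; this assumes the file has one label per line, with '#' lines as comments and a line containing a single space as the word separator:

# Sketch: verify that alphabet.txt covers every character used in the transcripts.
with open("alphabet.txt", encoding="utf-8") as f:
    labels = {line.rstrip("\n") for line in f if not line.startswith("#")}

with open("RawText.txt", encoding="utf-8") as f:
    used = set(f.read().replace("\n", ""))

print("characters missing from alphabet.txt:", sorted(used - labels))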

Then I generated the trie: DeepSpeech/native_client/generate_trie ../datasets/ASR-Portuguese-Corpus-V1/alphabet.txt lm.binary trie

Then I trained the model with the following command:

python DeepSpeech.py \
  --train_files ../../datasets/ASR-Portuguese-Corpus-V1/metadata_train.csv \
  --checkpoint_dir ../deepspeech_v6-0-0/checkpoints/ \
  --test_files ../../datasets/ASR-Portuguese-Corpus-V1/metadata_test_200.csv \
  --alphabet_config_path ../../datasets/ASR-Portuguese-Corpus-V1/alphabet.txt \
  --lm_binary_path ../../datasets/deepspeech-data/lm.binary \
  --lm_trie_path ../../datasets/deepspeech-data/trie \
  --train_batch_size 2 \
  --test_batch_size 2 \
  --dev_batch_size 2 \
  --export_batch_size 2 \
  --epochs 200 \
  --early_stop False
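Before training it can also help to confirm that the CSVs match what DeepSpeech.py expects (columns wav_filename, wav_filesize, transcript, with paths that resolve); a minimal sketch, where the path and column names are assumptions:

# Sketch: validate a DeepSpeech-style CSV before training.
import csv, os

with open("metadata_train.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    assert {"wav_filename", "wav_filesize", "transcript"} <= set(reader.fieldnames)
    for row in reader:
        if not os.path.isfile(row["wav_filename"]):
            print("missing wav:", row["wav_filename"])
        if not row["transcript"].strip():
            print("empty transcript:", row["wav_filename"])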

Previously I trained the model with early stopping enabled (specifying dev_files), but training stopped after 4 epochs, so I removed the early stop. The 50-epoch and 4-epoch models give the same results. I ran the test with the following command:

python evaluate.py \
  --checkpoint_dir ../deepspeech_v6-0-0/checkpoints/ \
  --test_files ../../datasets/ASR-Portuguese-Corpus-V1/metadata_test_200.csv \
  --alphabet_config_path ../../datasets/ASR-Portuguese-Corpus-V1/alphabet.txt \
  --lm_binary_path  ../../datasets/deepspeech-data/lm.binary \
  --lm_trie_path ../../datasets/deepspeech-data/trie 

The result was:

INFO:tensorflow:Restoring parameters from ../deepspeech_v6-0-0/checkpoints/train-2796891
I0102 07:28:01.144442 139713243842368 saver.py:1280] Restoring parameters from ../deepspeech_v6-0-0/checkpoints/train-2796891
I Restored variables from most recent checkpoint at ../deepspeech_v6-0-0/checkpoints/train-2796891, step 2796891
Testing model on ../../datasets/ASR-Portuguese-Corpus-V1/metadata_test_200.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00                                  2020-01-02 07:28:12.516808: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2020-01-02 07:28:12.713953: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-01-02 07:28:14.148053: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
Test epoch | Steps: 199 | Elapsed Time: 0:01:35                                
Test on ../../datasets/ASR-Portuguese-Corpus-V1/metadata_test_200.csv - WER: 0.956973, CER: 0.852231, loss: 101.685509
--------------------------------------------------------------------------------
WER: 4.000000, CER: 2.333333, loss: 54.671597
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/53999_nurc_.wav
 - src: "lá "
 - res: "e a e a "
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.666667, loss: 32.827530
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/17216_nurc_.wav
 - src: "revistas "
 - res: "e a "
--------------------------------------------------------------------------------
WER: 1.200000, CER: 0.739130, loss: 79.709518
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/60600_nurc_.wav
 - src: "num não me animo muito "
 - res: "e a a a a a "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.500000, loss: 8.319281
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/33267_sp_.wav
 - src: "é "
 - res: "e "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 11.219957
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/37622_sp_.wav
 - src: "né "
 - res: "e"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.500000, loss: 11.632010
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/29378_nurc_.wav
 - src: "é "
 - res: "e "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.500000, loss: 12.242241
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/37172_nurc_.wav
 - src: "é "
 - res: "e "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 13.220651
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/62827_sp_.wav
 - src: "não "
 - res: "e"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.750000, loss: 14.941595
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/844_nurc_.wav
 - src: "mas "
 - res: "e "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.750000, loss: 14.989404
 - wav: file:///media/edresson/5bef138d-5bcc-41af-a3f0-67c9bd0032c4/edresson/DD/datasets/ASR-Portuguese-Corpus-V1/data/22739_sp_.wav
 - src: "uhn "
 - res: "e "
--------------------------------------------------------------------------------

The model often transcribes just the letter "e"; this letter is very frequent in the dataset.

Am I doing something wrong?

How can I check whether my lm.binary and trie are correct?

Does anyone have any suggestions?

Best Regards,

lissyx commented 4 years ago

This is not a DeepSpeech bug; please use Discourse for support-related questions.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.