noahchalifour / rnnt-speech-recognition

End-to-end speech recognition using RNN Transducers in Tensorflow 2.0
MIT License

EPOCH RESULTS: Loss: 0.0000 #49

Open weiwenying opened 3 years ago

weiwenying commented 3 years ago

When I run:

python run_rnnt.py \
    --mode train \
    --data_dir <path to data directory>

It prints:

EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
EPOCH RESULTS: Loss: 0.0000
Performing evaluation.
VALIDATION RESULTS: Time: 0.0656, Loss: 0.0000, Accuracy: 0.0000, WER: 0.0000
Saving checkpoint ./model/checkpoint_0_0.0000.hdf5

My system is Ubuntu 18.04 LTS, and this is the output of conda list:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_tflow_select             2.1.0                       gpu    defaults
absl-py                   0.10.0                   pypi_0    pypi
aiohttp                   3.6.3            py37h7b6447c_0    defaults
appdirs                   1.4.4                    pypi_0    pypi
astunparse                1.6.3                      py_0    defaults
async-timeout             3.0.1            py37h06a4308_0    defaults
attrs                     20.3.0             pyhd3eb1b0_0    defaults
audioread                 2.1.9                    pypi_0    pypi
blas                      1.0                         mkl    defaults
blinker                   1.4              py37h06a4308_0    defaults
brotlipy                  0.7.0           py37h27cfd23_1003    defaults
c-ares                    1.17.1               h27cfd23_0    defaults
ca-certificates           2021.1.19            h06a4308_0    defaults
cachetools                4.2.1              pyhd3eb1b0_0    defaults
certifi                   2020.12.5        py37h06a4308_0    defaults
cffi                      1.14.4           py37h261ae71_0    defaults
chardet                   4.0.0                    pypi_0    pypi
click                     7.1.2              pyhd3eb1b0_0    defaults
cryptography              3.3.1            py37h3c74f83_0    defaults
cudatoolkit               10.1.243             h6bb024c_0    defaults
cudnn                     7.6.5                cuda10.1_0    defaults
cupti                     10.1.168                      0    defaults
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
dill                      0.3.3                    pypi_0    pypi
future                    0.18.2                   pypi_0    pypi
gast                      0.3.3                      py_0    defaults
google-auth               1.24.0             pyhd3eb1b0_0    defaults
google-auth-oauthlib      0.4.2              pyhd3eb1b0_2    defaults
google-pasta              0.2.0                      py_0    defaults
googleapis-common-protos  1.52.0                   pypi_0    pypi
grpcio                    1.35.0                   pypi_0    pypi
h5py                      2.10.0           py37hd6299e0_1    defaults
hdf5                      1.10.6               hb1b8bf9_0    defaults
idna                      2.10               pyhd3eb1b0_0    defaults
importlib-metadata        3.4.0                    pypi_0    pypi
importlib-resources       5.1.0                    pypi_0    pypi
intel-openmp              2020.2                      254    defaults
joblib                    1.0.0                    pypi_0    pypi
keras-preprocessing       1.1.2              pyhd3eb1b0_0    defaults
kiwisolver                1.3.1                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7    defaults
libedit                   3.1.20191231         h14c3975_1    defaults
libffi                    3.3                  he6710b0_2    defaults
libgcc-ng                 9.1.0                hdf63c60_0    defaults
libgfortran-ng            7.3.0                hdf63c60_0    defaults
libprotobuf               3.14.0               h8c45485_0    defaults
librosa                   0.8.0                    pypi_0    pypi
libstdcxx-ng              9.1.0                hdf63c60_0    defaults
llvmlite                  0.35.0                   pypi_0    pypi
markdown                  3.3.3            py37h06a4308_0    defaults
matplotlib                3.3.4                    pypi_0    pypi
mkl                       2020.2                      256    defaults
mkl-service               2.3.0            py37he8ac12f_0    defaults
mkl_fft                   1.2.0            py37h23d657b_0    defaults
mkl_random                1.1.1            py37h0573a6f_0    defaults
multidict                 4.7.6            py37h7b6447c_1    defaults
ncurses                   6.2                  he6710b0_1    defaults
numba                     0.52.0                   pypi_0    pypi
numpy                     1.20.0                   pypi_0    pypi
numpy-base                1.19.2           py37hfa32c7d_0    defaults
oauthlib                  3.1.0                      py_0    defaults
openssl                   1.1.1i               h27cfd23_0    defaults
opt-einsum                3.3.0                    pypi_0    pypi
opt_einsum                3.1.0                      py_0    defaults
packaging                 20.9                     pypi_0    pypi
pillow                    8.1.0                    pypi_0    pypi
pip                       20.3.3           py37h06a4308_0    defaults
pooch                     1.3.0                    pypi_0    pypi
promise                   2.3                      pypi_0    pypi
protobuf                  3.14.0                   pypi_0    pypi
pyasn1                    0.4.8                      py_0    defaults
pyasn1-modules            0.2.8                    pypi_0    pypi
pycparser                 2.20                       py_2    defaults
pydub                     0.24.1                   pypi_0    pypi
pyjwt                     1.7.1                    py37_0    defaults
pyopenssl                 20.0.1             pyhd3eb1b0_1    defaults
pyparsing                 2.4.7                    pypi_0    pypi
pysocks                   1.7.1                    py37_1    defaults
python                    3.7.9                h7579374_0    defaults
python-dateutil           2.8.1                    pypi_0    pypi
readline                  8.1                  h27cfd23_0    defaults
requests                  2.25.1             pyhd3eb1b0_0    defaults
requests-oauthlib         1.3.0                      py_0    defaults
resampy                   0.2.2                    pypi_0    pypi
rsa                       4.7                pyhd3eb1b0_1    defaults
scikit-learn              0.24.1                   pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
setuptools                52.0.0           py37h06a4308_0    defaults
six                       1.15.0           py37h06a4308_0    defaults
soundfile                 0.10.3.post1             pypi_0    pypi
sqlite                    3.33.0               h62c20be_0    defaults
tensorboard               2.2.2                    pypi_0    pypi
tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
tensorflow                2.2.0           gpu_py37h1a511ff_0    defaults
tensorflow-base           2.2.0           gpu_py37h8a81be8_0    defaults
tensorflow-datasets       4.2.0                    pypi_0    pypi
tensorflow-estimator      2.2.0              pyh208ff02_0    defaults
tensorflow-gpu            2.2.0                h0d30ee6_0    defaults
tensorflow-metadata       0.27.0                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.10               hbc83047_0    defaults
tqdm                      4.56.0                   pypi_0    pypi
typing-extensions         3.7.4.3                  pypi_0    pypi
urllib3                   1.26.3             pyhd3eb1b0_0    defaults
warprnnt-tensorflow       0.1                      pypi_0    pypi
werkzeug                  1.0.1              pyhd3eb1b0_0    defaults
wheel                     0.36.2             pyhd3eb1b0_0    defaults
wrapt                     1.12.1           py37h7b6447c_1    defaults
xz                        5.2.5                h7b6447c_0    defaults
yarl                      1.6.3            py37h27cfd23_0    defaults
zipp                      3.4.0              pyhd3eb1b0_0    defaults
zlib                      1.2.11               h7b6447c_3    defaults

EPOCH RESULTS always shows Loss: 0.0000. It seems that training is not working?

weiwenying commented 3 years ago

I've solved the problem. It happens when the dataset is empty. When you run:

./scripts/common_voice_convert.sh <data_dir> <# of threads>

you want the MP3 to WAV conversion to produce file names with the last four characters of the stem dropped:

# convert before
290a72db5e6654c2fcfcf3ff37c455264d4d598dadb1a5bfeb7c268f075894fff7cf31dafec97af3720ff178f0.mp3

# convert after
290a72db5e6654c2fcfcf3ff37c455264d4d598dadb1a5bfeb7c268f075894fff7cf31dafec97af3720ff1.wav

But ./scripts/common_voice_convert.sh actually keeps the full stem:

# convert before
290a72db5e6654c2fcfcf3ff37c455264d4d598dadb1a5bfeb7c268f075894fff7cf31dafec97af3720ff178f0.mp3

# convert after
290a72db5e6654c2fcfcf3ff37c455264d4d598dadb1a5bfeb7c268f075894fff7cf31dafec97af3720ff178f0.wav

This is not what you want. So, after converting, rename the WAV files with a Python script:

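import pathlib

# Path to your dataset's clips directory
src_dir = "/home/weiwenying/projects/Celex/rnnt-speech-recognition/user_opt/zh-TW/clips"

# Run this only once: every run strips four more characters from each name.
# sorted() snapshots the file list so renamed files are not visited again.
for path in sorted(pathlib.Path(src_dir).glob("*.wav")):
    # Drop the last four characters of the stem, e.g. "...ff178f0" -> "...ff1".
    new_name = path.stem[:-4] + ".wav"
    path.rename(path.with_name(new_name))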
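
If you would rather preview the renames before touching any files, the same idea can be wrapped in a function with a dry-run switch. This is only a sketch: strip_stem_suffix and its n_chars and dry_run parameters are hypothetical names, not part of the repository, and it assumes that every .wav file in the directory needs the same number of characters removed.

```python
import pathlib


def strip_stem_suffix(src_dir, n_chars=4, dry_run=True):
    """Drop the last n_chars of each .wav stem in src_dir.

    Returns the list of (old_name, new_name) pairs. With dry_run=True
    nothing on disk is changed; the pairs are only reported.
    """
    planned = []
    seen_targets = set()
    # sorted() snapshots the directory listing before any rename happens.
    for path in sorted(pathlib.Path(src_dir).glob("*.wav")):
        new_stem = path.stem[:-n_chars] if n_chars > 0 else ""
        new_path = path.with_name(new_stem + ".wav")
        # Skip names that would become empty, already exist on disk,
        # or were already claimed by an earlier file in this run.
        if not new_stem or new_path.exists() or new_path.name in seen_targets:
            continue
        seen_targets.add(new_path.name)
        planned.append((path.name, new_path.name))
        if not dry_run:
            path.rename(new_path)
    return planned
```

Call strip_stem_suffix(src_dir) first and look at the returned pairs, then call it again with dry_run=False to apply them. Files whose new name would collide with another file are left untouched instead of being overwritten.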

Now, train.tfrecord is not empty:


total 328M
drwxrwxr-x 2 weiwenying weiwenying 4.0K Feb   4 10:33 .
drwxrwxr-x 5 weiwenying weiwenying 4.0K Feb   4 10:32 ..
-rw-rw-r-- 1 weiwenying weiwenying 105M Feb   4 11:47 dev.tfrecord
-rw-rw-r-- 1 weiwenying weiwenying  33K Feb   4 10:33 encoder.subwords
-rw-rw-r-- 1 weiwenying weiwenying 118M Feb   4 11:47 test.tfrecord
-rw-rw-r-- 1 weiwenying weiwenying 106M Feb   4 11:47 train.tfrecord
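
The same check can be done from Python before starting a training run. This is a minimal sketch, assuming that an empty dataset shows up as a zero-byte .tfrecord file in the data directory; find_empty_tfrecords is a hypothetical helper and is not part of the repository.

```python
import pathlib


def find_empty_tfrecords(data_dir):
    """Return the sorted names of zero-byte *.tfrecord files in data_dir."""
    return sorted(
        p.name
        for p in pathlib.Path(data_dir).glob("*.tfrecord")
        if p.stat().st_size == 0
    )
```

If find_empty_tfrecords returns a non-empty list for your data directory, preprocessing probably did not find the audio files, and training on that data is likely to report Loss: 0.0000 again.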

Then run training again:

python run_rnnt.py \
    --mode train \
    --data_dir <path to data directory>

When training works normally, the output looks like this:

Epoch: 4, Batch: 34, Global Step: 190, Step Time: 1.0850, Loss: 8.8683
Epoch: 4, Batch: 35, Global Step: 191, Step Time: 1.0003, Loss: 8.8519
Epoch: 4, Batch: 36, Global Step: 192, Step Time: 0.9517, Loss: 8.8606
Epoch: 4, Batch: 37, Global Step: 193, Step Time: 0.7079, Loss: 8.8660
Epoch: 4, Batch: 38, Global Step: 194, Step Time: 0.8511, Loss: 8.8473
EPOCH RESULTS: Loss: 8.8473

That's all!