vrenkens / nabu

Code for end-to-end ASR with neural networks, built with TensorFlow
MIT License

issue with training (LAS TIMIT) #32

Closed rzcwade closed 6 years ago

rzcwade commented 6 years ago

Hi,

As I was running the training script:

run train --expdir=/home/zichengr/nabu/expdir --recipe=/home/zichengr/nabu/config/recipes/LAS/TIMIT --mode=non_distributed --computing=standard

and I ran into this issue:

2018-06-23 06:27:09.878995: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-23 06:27:09.886528: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job local -> {0 -> localhost:32884}
2018-06-23 06:27:09.896088: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:332] Started server with target: grpc://localhost:32884
starting training
2018-06-23 06:27:18.753425: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 561ab41479266096 with config: gpu_options { allow_growth: true } allow_soft_placement: true
WORKER 0: validating model
Traceback (most recent call last):
  File "nabu/scripts/prepare_train.py", line 365, in <module>
    main(FLAGS.expdir, FLAGS.recipe, FLAGS.mode, FLAGS.computing)
  File "nabu/scripts/prepare_train.py", line 90, in main
    expdir=expdir)
  File "/home/zichengr/nabu/nabu/scripts/train.py", line 85, in train
    tr.train(testing)
  File "/home/zichengr/nabu/nabu/neuralnetworks/trainers/trainer.py", line 776, in train
    outputs['increment_step'].run(session=sess)
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 679, in __exit__
    self._close_internal(exception_type)
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 716, in _close_internal
    self._sess.close()
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 964, in close
    self._sess.close()
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1108, in close
    ignore_live_threads=True)
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1244, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/zichengr/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [sil m ow s cl p r iy s ih ng cl k s hh eh dx ax th er dx ix v dh ax v ow cl t s cl k aw n ix vcl sil]
  [[Node: train/get_batch/input_pipeline/read_data/reader_1/StringReader/assert_equal/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:local/replica:0/task:0/device:CPU:0"](train/get_batch/input_pipeline/read_data/reader_1/StringReader/assert_equal/Equal, train/get_batch/input_pipeline/read_data/reader_1/StringReader/ParseSingleExample/ParseSingleExample)]]

I am not sure how to fix the issue. I hope anyone can help me here.

Thanks.

vrenkens commented 6 years ago

This error usually occurs when the sequence contains a symbol that is not in the alphabet. Is there a symbol in "sil m ow s cl p r iy s ih ng cl k s hh eh dx ax th er dx ix v dh ax v ow cl t s cl k aw n ix vcl sil" that is not defined in the alphabet?

rzcwade commented 6 years ago

Hi,

Thank you for your response. I tried adding all the unknown symbols that appear in the error message to test_evaluator.cfg and validation_evaluator.cfg in recipe/LAS/TIMIT, but I still get this error. Did I add the symbols in the wrong place? What is the correct way to define symbols in the alphabet?

Thanks!

vrenkens commented 6 years ago

Hi,

You should add them in the text_processor: config/recipes/LAS/TIMIT/text_processor.cfg.

The alphabet is defined in multiple config files. I know this is a bit confusing and it's not very well structured. Sorry about that!

rzcwade commented 6 years ago

Hi @vrenkens ,

Thanks for your response. I added the symbols in text_processor.cfg and re-ran the data script, but I still ran into the same error when running the prepare_train script.

Thanks.

vrenkens commented 6 years ago

Hi,

Could you perhaps give me a zip of your recipe? I will check whether I can find anything.

Cheers

rzcwade commented 6 years ago

Here's my recipe: TIMIT.zip Thanks!

vrenkens commented 6 years ago

Hi,

The issue is still in the alphabet. TIMIT has 3 phonetic alphabets, one with 61 phonemes, and 2 reduced alphabets with 39 and 48 phonemes where multiple similar phonemes of the 61 set are mapped to a single phoneme.

I see you are using the 61 set, but your alphabet defines only 60 symbols and is a mix of symbols from the 39 set and the 61 set. So several symbols are missing, which causes the error.

In our recipe we use the 39 set, so the alphabet contains only those phonemes. I recommend you switch to that set. If you want to switch to the 61 set you should create your own alphabet containing all 61 phonemes :).

Here is the file containing the mapping from the phonemes in the 61 set (first column) to the 48 set (second column) and the 39 set (third column). Note that q has no entry in the reduced sets:

aa    aa    aa
ae    ae    ae
ah    ah    ah
ao    ao    aa
aw    aw    aw
ax    ax    ah
ax-h  ax    ah
axr   er    er
ay    ay    ay
b     b     b
bcl   vcl   sil
ch    ch    ch
d     d     d
dcl   vcl   sil
dh    dh    dh
dx    dx    dx
eh    eh    eh
el    el    l
em    m     m
en    en    n
eng   ng    ng
epi   epi   sil
er    er    er
ey    ey    ey
f     f     f
g     g     g
gcl   vcl   sil
h#    sil   sil
hh    hh    hh
hv    hh    hh
ih    ih    ih
ix    ix    ih
iy    iy    iy
jh    jh    jh
k     k     k
kcl   cl    sil
l     l     l
m     m     m
n     n     n
ng    ng    ng
nx    n     n
ow    ow    ow
oy    oy    oy
p     p     p
pau   sil   sil
pcl   cl    sil
q
r     r     r
s     s     s
sh    sh    sh
t     t     t
tcl   cl    sil
th    th    th
uh    uh    uh
uw    uw    uw
ux    uw    uw
v     v     v
w     w     w
y     y     y
z     z     z
zh    zh    sh
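
For reference, a mapping file like this can be applied to a transcription in a few lines of Python. This is only a sketch, not code from nabu or Kaldi: the names `load_phone_map` and `map_transcription` are hypothetical, the embedded excerpt holds just a handful of rows from the file, and it assumes that a phoneme with no target columns (q) is simply dropped from the transcription.

```python
# Hypothetical helpers for illustration; not part of nabu or Kaldi.
# Columns of the mapping: 61-set symbol, 48-set symbol, 39-set symbol.
MAPPING_EXCERPT = """\
ax  ax  ah
bcl vcl sil
h#  sil sil
ix  ix  ih
kcl cl  sil
m   m   m
ow  ow  ow
q
s   s   s
"""


def load_phone_map(text, target_column):
    """Map 61-set symbols to the chosen column (1 = 48 set, 2 = 39 set)."""
    phone_map = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A row with a single field (such as q) has no target symbol.
        if len(fields) > target_column:
            phone_map[fields[0]] = fields[target_column]
        else:
            phone_map[fields[0]] = None
    return phone_map


def map_transcription(transcription, phone_map):
    """Map a space-separated transcription; symbols mapped to None are dropped.

    A symbol that is missing from phone_map raises KeyError, so an
    incomplete mapping is caught here rather than during training.
    """
    mapped = [phone_map[symbol] for symbol in transcription.split()]
    return " ".join(symbol for symbol in mapped if symbol is not None)


to_39 = load_phone_map(MAPPING_EXCERPT, target_column=2)
print(map_transcription("h# m ow s kcl q ix bcl h#", to_39))
# prints: sil m ow s sil ih sil sil
```

Passing `target_column=1` instead would produce the 48-set symbols, for example `cl` for `kcl` and `vcl` for `bcl`.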

Cheers

rzcwade commented 6 years ago

Hi Vincent,

Thank you for putting so much effort into helping me resolve the issue. I still get the same "assertion failed" error after switching to the 39-phoneme set as you recommended. Could it be a TIMIT version issue? Which version of the TIMIT set do you use?

Thanks.

vrenkens commented 6 years ago

If you get the same assertion error, you have not actually switched to the 39 set: the error contains a vcl symbol, which is not in the 39 set, as you can see in the phoneme mapping.
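
A quick way to confirm this is to compare the symbols in the assertion message against the 39 set. The snippet below is an illustrative sketch only: `PHONES_39` is read off the third column of the mapping posted earlier in this thread, and `symbols_outside` is a hypothetical helper, not a nabu function.

```python
# The 39 set, taken from the third column of the phoneme mapping in this thread.
PHONES_39 = set(
    "aa ae ah aw ay b ch d dh dx eh er ey f g hh ih iy jh k l m n ng "
    "ow oy p r s sh sil t th uh uw v w y z".split()
)


def symbols_outside(transcription, alphabet):
    """Return the symbols missing from alphabet, in order of first appearance."""
    missing = []
    for symbol in transcription.split():
        if symbol not in alphabet and symbol not in missing:
            missing.append(symbol)
    return missing


# Transcription copied from the assertion message in the original report.
failed = (
    "sil m ow s cl p r iy s ih ng cl k s hh eh dx ax th er dx ix v dh ax "
    "v ow cl t s cl k aw n ix vcl sil"
)
print(symbols_outside(failed, PHONES_39))
# prints: ['cl', 'ax', 'ix', 'vcl']
```

All four reported symbols appear in the second (48-set) column of the mapping, which suggests the data was prepared with the 48 set rather than the 39 set.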

rzcwade commented 6 years ago

Hi Vincent,

You are right. I still have the vcl symbol in the error. Is this related to the Kaldi TIMIT data preparation? I still see vcl and other symbols from the 61 set in kaldi/egs/timit/s5/data/lang/phone.txt. I simply followed the Kaldi script to process the raw TIMIT data. How would you actually switch to the 39 set?

Thank you so much for your patience :)

vrenkens commented 6 years ago

It's been a very long time since I did the TIMIT data prep, so I don't remember :s. You can probably find something in the Kaldi data prep file or on the forums.

rzcwade commented 6 years ago

Thanks. I believe the data prep was the issue.