Open zackkui opened 6 years ago
I also ran into this problem. The speech_recognition problem sets the inputs' feature_encoders to None, which causes log_decode_results to return None for decoded_inputs. As a result, this line breaks because d_input is None.
The simplest workaround is to modify this line to:
    if d_input is not None and re.match("^({})+$".format(text_encoder.PAD), d_input):
      continue
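To see the guard from the workaround in isolation, here is a minimal sketch that runs without tensor2tensor. The names PAD, is_all_padding and unguarded_raises_type_error are made up for illustration, and the value "<pad>" is only an assumed stand-in for text_encoder.PAD.

```python
import re

# Assumption: stand-in for text_encoder.PAD so the sketch runs without tensor2tensor.
PAD = "<pad>"


def is_all_padding(d_input, pad=PAD):
    """Return True only when d_input is a string made up entirely of pad tokens."""
    # Speech problems can yield d_input = None; treat that as "not padding"
    # instead of handing None to re.match.
    if d_input is None:
        return False
    return re.match("^({})+$".format(pad), d_input) is not None


def unguarded_raises_type_error():
    """Show that the unguarded call fails when d_input is None."""
    try:
        re.match("^({})+$".format(PAD), None)
    except TypeError:
        return True
    return False
```

With this guard a None input simply skips the padding check, so decoding continues rather than stopping with a TypeError.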
@WindQAQ What is your best WER with transformer? I got a WER of 18% on LibriSpeech test-clean with 1000h of training data.
Sorry, but I have never run transformer on LibriSpeech. I ran into this problem because I use SpeechRecognitionModality and SpeechRecognitionProblem in my code.
@zackkui What's your training batch_size? I trained on a 1080 with batch_size=2.3M3 and got a WER close to yours. It seems that batch=8M8 would be better, but I don't have that kind of device.
Description
I got an unexpected error when using t2t-decoder. It worked well before I updated the tensor2tensor code.
Environment information
For bugs: reproduction and error logs
Steps to reproduce:
Here is my decoder script:
#!/bin/bash
DATA_DIR=/tensor2tensor/librispeech/data
OUT_DIR=/tensor2tensor/librispeech/train
mkdir -p $OUT_DIR
CHECK_POINT=$(head -n 1 $OUT_DIR/checkpoint | tr -cd 0-9)
DECODE_FILE_PREFIX="step-$CHECK_POINT"
echo $DECODE_FILE_PREFIX
MODEL=transformer
HPRAMS=transformer_librispeech_v1
PROBLEM=librispeech_train_full_test_clean
./tensor2tensor/bin/t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPRAMS \
  --output_dir=$OUT_DIR \
  --decode_to_file=$DECODE_FILE_PREFIX \
  --eval_use_test_set=True
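The script takes the checkpoint step by keeping every digit on the first line of the checkpoint file. Here is a rough Python sketch of that step, assuming the first line looks like model_checkpoint_path: "model.ckpt-250000". That format and the helper names digits_only and checkpoint_step are assumptions for illustration, not part of the original script.

```python
import re


def digits_only(line):
    """Mirror `tr -cd 0-9`: keep every ASCII digit on the line, drop the rest."""
    return "".join(ch for ch in line if ch in "0123456789")


def checkpoint_step(line):
    """Return only the trailing step number, or None if the line has none.

    Assumes the line ends in something like `model.ckpt-250000"`.
    """
    match = re.search(r'-(\d+)"?\s*$', line)
    return match.group(1) if match else None
```

The two helpers agree when the path contains no other digits. If the checkpoint path itself has digits in it (a directory such as run2, for example), keeping all digits would fold them into the step number, while matching only the trailing number would not.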
Error logs:
Traceback (most recent call last):
  File "./tensor2tensor/bin/t2t-decoder", line 17, in <module>
    tf.app.run()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./tensor2tensor/bin/t2t-decoder", line 12, in main
    t2t_decoder.main(argv)
  File "./tensor2tensor/bin/t2t_decoder.py", line 194, in main
    decode(estimator, hp, decode_hp)
  File "./tensor2tensor/bin/t2t_decoder.py", line 105, in decode
    dataset_split="test" if FLAGS.eval_use_test_set else None)
  File "./tensor2tensor/utils/decoding.py", line 196, in decode_from_dataset
    checkpoint_path=checkpoint_path)
  File "./tensor2tensor/utils/decoding.py", line 314, in decode_once
    if re.match("^({})+$".format(text_encoder.PAD), d_input):
  File "/usr/lib64/python2.7/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or buffer