Your loss is too high (loss = 1.373911). In my experience, you won't get any acceptable predictions with a loss higher than 1. I have never tried the transformer_librispeech_tpu hparams. If the loss is still dropping, you can try training for more steps, but after 120k steps with such a high loss I wouldn't expect a good result. Maybe the key reason is that, I suppose, you are training on a single local GPU, so why are you using the TPU hparams? Try transformer_librispeech_v1; after only 20k~30k steps you should get a good result.
@Qiaoxl Thanks for your help. How many GPUs are you using? Is that a key point?
@zh794390558 It doesn't matter very much how many GPUs you are using, but with more GPUs you can use a larger batch_size, which will help (see Training Tips for the Transformer Model).
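For what it's worth, you can raise batch_size either directly on the command line (t2t-trainer accepts --hparams="batch_size=...") or by registering your own hparams set. A minimal sketch of the latter, assuming the librispeech hparams sets live in tensor2tensor.models.transformer in your installed version; the new hparams name below is made up:

# Hypothetical hparams set: a copy of transformer_librispeech_v1 with a
# different batch size. Put this in a module that gets imported before
# training (e.g. via --t2t_usr_dir) so the registration runs.
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def transformer_librispeech_v1_big_batch():
  hparams = transformer.transformer_librispeech_v1()
  hparams.batch_size = 3000000  # example value only; tune to what fits your GPUs
  return hparams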
How can you get librispeech_train_full_test_clean data with t2t-datagen?
I don't quite understand your question. With --problem=librispeech_train_full_test_clean, t2t-datagen will first download the data if it is not found. If the program has trouble downloading, you can pre-download the files yourself. See the details in https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/librispeech.py#L26-L59
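In case it helps, a rough sketch of the pre-download idea: as far as I can tell, the generator skips the download when a tarball with the expected name is already sitting in --tmp_dir, so fetching the archives yourself into tmp_dir (keeping the original file names) should be enough. TMP_DIR and the URL list below are assumptions copied from the linked lines of librispeech.py; double-check them against the version you have installed.

# Sketch: pre-fetch the LibriSpeech tarballs into the directory you will
# pass as --tmp_dir so that t2t-datagen finds them and skips the download.
import os
import urllib.request

TMP_DIR = "data/librispeech"  # must match --tmp_dir
URLS = [
    "http://www.openslr.org/resources/12/train-clean-100.tar.gz",
    "http://www.openslr.org/resources/12/train-clean-360.tar.gz",
    "http://www.openslr.org/resources/12/train-other-500.tar.gz",
    "http://www.openslr.org/resources/12/dev-clean.tar.gz",
    "http://www.openslr.org/resources/12/dev-other.tar.gz",
    "http://www.openslr.org/resources/12/test-clean.tar.gz",
    "http://www.openslr.org/resources/12/test-other.tar.gz",
]

for url in URLS:
  target = os.path.join(TMP_DIR, os.path.basename(url))
  if not os.path.exists(target):
    print("downloading", url)
    urllib.request.urlretrieve(url, target)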
@Qiaoxl Thanks for replying, and I know how now. I rewrote librispeech.py with a new problem name, but I forgot to change the hard-coded file name prefix, e.g. "librispeech", in the code below.
def filepattern(self, data_dir, mode, shard=None):
  # filepattern() override as found in LibrispeechTrainFullTestClean
  # (tensor2tensor/data_generators/librispeech.py): train files are expected
  # under the hard-coded "librispeech" prefix, eval/test files under
  # "librispeech_clean".
  shard_str = "-%05d" % shard if shard is not None else ""
  if mode == problem.DatasetSplit.TRAIN:
    path = os.path.join(data_dir, "librispeech")
    suffix = "train"
  elif mode in [problem.DatasetSplit.EVAL, tf.estimator.ModeKeys.PREDICT]:
    path = os.path.join(data_dir, "librispeech_clean")
    suffix = "dev"
  else:
    assert mode == problem.DatasetSplit.TEST
    path = os.path.join(data_dir, "librispeech_clean")
    suffix = "test"
  return "%s-%s%s*" % (path, suffix, shard_str)
@zackkui
If you need to add a new librispeech problem, subclass and register it the same way the existing problems are defined (see the sketch below).
Now you have a new librispeech problem.
If you added new datasets to _LIBRISPEECH_TRAIN_DATASETS, _LIBRISPEECH_DEV_DATASETS or _LIBRISPEECH_TEST_DATASETS, you also need to adjust the slices used by the original problems (LibrispeechTrainFullTestClean, LibrispeechCleanSmall, LibrispeechClean, LibrispeechNoisy) so that their datasets stay unchanged.
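A minimal sketch of that pattern, mirroring how LibrispeechCleanSmall is defined; the class name and the slice indices below are placeholders:

# Sketch: register a new problem by subclassing Librispeech and choosing
# dataset slices. This would show up as --problem=librispeech_my_custom.
from tensor2tensor.data_generators import librispeech
from tensor2tensor.utils import registry

@registry.register_problem
class LibrispeechMyCustom(librispeech.Librispeech):
  """Example: train on train-clean-100 only, eval/test on clean data only."""
  TRAIN_DATASETS = librispeech._LIBRISPEECH_TRAIN_DATASETS[:1]
  DEV_DATASETS = librispeech._LIBRISPEECH_DEV_DATASETS[:1]
  TEST_DATASETS = librispeech._LIBRISPEECH_TEST_DATASETS[:1]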
That's a good approach. Thanks a lot again!
Have you resolved this issue? I implemented an ASR transformer myself based on the official transformer code and I'm facing the same issue as you. During training it outputs quite reasonable logits, but at test time it spits out a wrong sentence, the same sentence for every audio clip. I have no idea why this happens at all.
Any progress on this issue?
No progress on this. I do not have the time or a machine to test it.
Any progress on this issue?
PROBLEM=librispeech_train_full_test_clean
MODEL=transformer
HPARAMS_SET=transformer_librispeech_v1
And the training loss: [training loss plot]
I didn't test the WER, because I just need the trained model for transfer learning. But the result should be good.
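For anyone who does want to check WER on the decoded output, a minimal, self-contained sketch (word-level edit distance, independent of tensor2tensor):

# Word error rate = word-level edit distance / number of reference words.
def wer(ref, hyp):
  r, h = ref.split(), hyp.split()
  # dp[i][j] = edit distance between the first i ref words and first j hyp words
  dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
  for i in range(len(r) + 1):
    dp[i][0] = i
  for j in range(len(h) + 1):
    dp[0][j] = j
  for i in range(1, len(r) + 1):
    for j in range(1, len(h) + 1):
      cost = 0 if r[i - 1] == h[j - 1] else 1
      dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                     dp[i][j - 1] + 1,         # insertion
                     dp[i - 1][j - 1] + cost)  # substitution
  return dp[-1][-1] / max(len(r), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 error / 6 words = 0.17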
This issue ought to be closed.
@Qiaoxl, I do not know how you got this problem to start training; there are several weird settings:
After fixing the problem, I could generate the data:
PYTHONPATH=. ./tensor2tensor/bin/t2t-datagen --data_dir=data/librispeech/ --tmp_dir=data/librispeech/ --problem=librispeech_train_full_test_clean
Then I started training:
CUDA_VISIBLE_DEVICES=4,5,6,7 PYTHONPATH=. nohup ./tensor2tensor/bin/t2t-trainer --model=transformer --hparams_set=transformer_librispeech_v2 --problem=librispeech_train_full_test_clean --train_steps=120000 --local_eval_frequency=5000 --eval_steps=50 --data_dir data/librispeech/ --output_dir=./librispeech_output --worker_gpu=4
The training process converges very slowly.
I have the exact same problem. No matter what training set I use (100h or 960h), what batch_size, or what optimizer setup, I never get those nice "two step" loss curves for librispeech. The "knee" around 20-30k steps doesn't occur; all plots show just a straight exponential decay.
Needless to say, inference is completely useless at the final checkpoint (loss barely below 1.0), outputting just the same character or nothing in my case.
=== My setup ===
elementary OS (Ubuntu 18.04)
NVIDIA driver 440.64.00 (GTX1080 8GB)
CUDA 10.0 (10.0.130-1)
CUDNN 7.6.5.32-1+cuda10.0
$ pip3 freeze|grep tensor
mesh-tensorflow==0.0.5
tensor2tensor==1.13.4
tensorboard==1.14.0
tensorflow-datasets==1.0.2
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
tensorflow-metadata==0.14.0
tensorflow-probability==0.7.0
=== My train.sh ===
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
export TF_FORCE_GPU_ALLOW_GROWTH=true
~/.local/bin/t2t-trainer \
--generate_data \
--problem=librispeech_clean_small \
--model=transformer \
--hparams_set=transformer_librispeech \
--hparams="batch_size=2100000" \
--train_steps=500000 \
--eval_steps=3 \
--local_eval_frequency=100 \
--worker_gpu=1 \
--data_dir=./data \
--output_dir=./output-train \
--tmp_dir=./tmp
I used TF_FORCE_GPU_ALLOW_GROWTH to avoid sudden OOM issues as I use the same machine occasionally for light-weight desktop tasks. I also tried different hparams_sets (transformer_librispeech_v1), but even that didn't change the trend.
@Qiaoxl: I think it would be useful to share the environment that you used to generate your plots.
Any indicative help would be useful. I've also found ticket #1245 mentioning the same issue, but I only found a convergence plot there, which resulted in the ticket being closed. I assumed the setup would be a no-brainer, but even setting up CUDA correctly was a major hassle, and now those training scripts also don't go in the expected direction. I would like to understand what exactly I'm doing wrong here.
By "accident" I left a training running over a couple of days and I finally got some convergence, but the loss only started to drop after some 140k steps. That's by no means in the range of reported 20-30k steps here. And I even used a smaller batch_size than the default setting of 6M. And with larger batch sizes I would expect only slower, but less noisy convergence. The loss also never reached those low values we see in Qiaoxl's plots, even after some 300k steps.
I re-ran the same training script (from scratch) and this time it didn't even converge below 1.0 after 300k steps :-(. A very stochastic outcome, and quite annoying to waste that much energy on it. Is there a trick that I'm missing? Is it my setup? I'm using a single GTX 1080 (non-Ti) with 8 GB; is it due to limited hardware?
Description
I have used T2T to test Librispeech performance before, and I remember the WER on test-clean was almost 7%, but now the WER is very poor. Also, the decoding result is the same thing for every input. Can anybody give some advice?
Environment information
I used this option for the test, but the result was also poor.
Training output:
Decoding output:
Dataset