tensorflow / lingvo

Lingvo
Apache License 2.0

Did I do something wrong? #60

Closed: minkyu119 closed this issue 5 years ago

minkyu119 commented 5 years ago

Hi. I am attempting to reproduce the ASR Librispeech task using Lingvo.

My hardware consists of 16 GPUs (a cluster of 1080 Ti cards, 4 GPUs per node), and I share my storage over NFS.

I changed the batch size from 96,48 to 32 (because of OOM).

I have been training the Librispeech 960 grapheme baseline for 5 days. (I have now turned off variational noise.)

I read your report, which says training needs about 11 days, but that does not look achievable in my case.

(three images attached)

After about 5 days it is still under 40k steps, and WER also stays at about 11%.

Is this a normal speed for my cluster, or do I have a problem with the network or something else?

Thanks for your insight.
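
For reference, the figures quoted above (under 40k steps after about 5 days) can be turned into an average throughput and a rough projection. This is only a back-of-the-envelope sketch: `HYPOTHETICAL_TARGET_STEP` is a placeholder and not a number from the Lingvo report, and the projection assumes the step rate stays constant.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(steps_done: int, days_elapsed: float) -> float:
    """Average training throughput over the elapsed wall-clock time."""
    return steps_done / (days_elapsed * SECONDS_PER_DAY)


def days_to_reach(target_step: int, steps_done: int, days_elapsed: float) -> float:
    """Remaining days to reach target_step, assuming constant throughput."""
    rate_per_day = steps_done / days_elapsed
    return max(target_step - steps_done, 0) / rate_per_day


# Figures from the post: just under 40k steps after about 5 days,
# so 40_000 is an upper bound on the steps actually completed.
rate = steps_per_second(40_000, 5.0)
print(f"{rate:.3f} steps/sec")  # prints "0.093 steps/sec"

# Placeholder target; substitute the step count you actually plan to train to.
HYPOTHETICAL_TARGET_STEP = 100_000
remaining = days_to_reach(HYPOTHETICAL_TARGET_STEP, 40_000, 5.0)
print(f"{remaining:.1f} more days at the current rate")
```

Comparing this average rate against the rate logged with a single node, or with the data on local disk instead of NFS, would help separate a network or storage bottleneck from plain compute limits.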

drpngx commented 5 years ago

CC @rprabhavalkar

Unfortunately we don't run this on GPUs, so I don't have guidance here. We're planning on releasing the numbers on a smaller recipe that would train in reasonable time, but for now, you are more or less on your own. The hardware you're using seems right. It looks like the variational noise perturbed the models too much; I would try reducing p.train.vn_std. Also, the grapheme model is probably slower than the WPM model.
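
To illustrate why reducing p.train.vn_std matters, here is a minimal sketch of the general idea behind variational (weight) noise: zero-mean Gaussian noise with a given standard deviation is added to the weights during training. This is not Lingvo's implementation. The function names and the std values below are illustrative assumptions, and only the standard library is used.

```python
import random
from typing import List


def add_weight_noise(weights: List[float], vn_std: float,
                     rng: random.Random) -> List[float]:
    """Return a copy of weights with zero-mean Gaussian noise of std vn_std."""
    return [w + rng.gauss(0.0, vn_std) for w in weights]


def mean_abs_perturbation(weights: List[float], vn_std: float,
                          seed: int = 0) -> float:
    """Average absolute change to the weights for a given noise std."""
    rng = random.Random(seed)
    noisy = add_weight_noise(weights, vn_std, rng)
    return sum(abs(n - w) for n, w in zip(noisy, weights)) / len(weights)


# Toy weights and std values, chosen only for illustration.
toy_weights = [0.5, -0.2, 0.1, 0.9]
for std in (0.1, 0.05, 0.0):
    print(std, mean_abs_perturbation(toy_weights, std))
```

With the same random seed the perturbation scales linearly with the std, so halving the value halves how far each weight is pushed, and a std of 0 leaves the weights untouched.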

manish-kumar-garg commented 4 years ago

@minkyu119 can you tell me how you decoded the results and calculated WER? I am unable to find the scripts for inference on the test and dev sets.
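
On the WER part of the question, the metric itself is standard: word-level edit distance (substitutions, insertions, deletions) summed over the set, divided by the total number of reference words. The sketch below is a generic implementation of that definition, not Lingvo's decoder or scoring script. The function names are hypothetical, and it assumes transcripts are already normalized and whitespace-tokenized.

```python
from typing import List, Sequence


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total word errors divided by total reference words over a test set."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


# One substitution (sat -> sit) and one deletion (the) against 6 reference words.
print(corpus_wer(["the cat sat on the mat"], ["the cat sit on mat"]))
```

Summing errors over the whole set before dividing weights long utterances more heavily than averaging per-utterance WER, so the two conventions give different numbers and should not be mixed when comparing against published results.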