srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Is the TIMIT fbank result OK, and how do I add features such as delta-delta? #59

Open zhangjiulong opened 8 years ago

zhangjiulong commented 8 years ago

Hi, I tested TIMIT data using Eesen, but the results are not good, as shown below:

training process

EPOCH 11 RUNNING ... ENDS [2016-Jun-6 17:02:47]: lrate 4e-05, TRAIN ACCURACY 23.4300%, VALID ACCURACY 17.3147%
EPOCH 12 RUNNING ... ENDS [2016-Jun-6 17:07:02]: lrate 4e-05, TRAIN ACCURACY 25.2924%, VALID ACCURACY 16.1223%
EPOCH 13 RUNNING ... ENDS [2016-Jun-6 17:11:18]: lrate 4e-05, TRAIN ACCURACY 26.1150%, VALID ACCURACY 18.4033%
EPOCH 14 RUNNING ... ENDS [2016-Jun-6 17:15:33]: lrate 4e-05, TRAIN ACCURACY 26.6806%, VALID ACCURACY 19.5179%
EPOCH 15 RUNNING ... ENDS [2016-Jun-6 17:19:51]: lrate 4e-05, TRAIN ACCURACY 27.1350%, VALID ACCURACY 18.6625%
EPOCH 16 RUNNING ... ENDS [2016-Jun-6 17:24:07]: lrate 2e-05, TRAIN ACCURACY 27.4092%, VALID ACCURACY 20.1400%
EPOCH 17 RUNNING ... ENDS [2016-Jun-6 17:28:23]: lrate 1e-05, TRAIN ACCURACY 27.5363%, VALID ACCURACY 20.2177%
finished, too small rel. improvement .0777
Training succeeded. The final model exp/train_phn_l5_c320/final.nnet
Removing features tmpdir exp/train_phn_l5_c320/ptrXL @ pingan-nlp-001
cv.ark  train.ark

testing process

mrjb1_sx64-0000000-0000248 out-moded 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjb1_sx64-0000000-0000248 is 0.454562 over 246 frames.
mrjh0_sa1-0000000-0000385 she had your dark suit in greasy wash water all 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa1-0000000-0000385 is 0.577131 over 383 frames.
mrjh0_sa2-0000000-0000317 how ask me to carry an oily rag like 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa2-0000000-0000317 is 0.483511 over 315 frames.
mrjh0_si1145-0000000-0000487 how unauthentic 
LOG (latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:294) Rebuilding repository.
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1145-0000000-0000487 is 0.258022 over 485 frames.
mrjh0_si1775-0000000-0000306 how unauthentic 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1775-0000000-0000306 is 0.384129 over 304 frames.
mrjh0_si515-0000000-0000296 out-moded 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si515-0000000-0000296 is 0.429838 over 294 frames.
mrjh0_sx155-0000000-0000394 how unauthentic

I checked the ark files of the TIMIT and TEDLIUM data and found some differences, but I do not know where the differences come from. The TEDLIUM ark file looks like this:

AlGore_2009  [
  510340.6 586395.1 608272.1 621239.9 642546.4 653072.2 651401.9 651305.8 653922.6 659371.4 654681.1 652654.5 646230.6 645681.9 650887.6 655483.5 666377.6 671666.1 672115.6 669366.7 669373.2 681050.7 703447.4 715073.2 709013.8 702928.3 713154.4 718430.6 711170 688705.3 658752.9 641324.2 630078.5 628411.7 623944.6 627934.9 639849.6 641777.4 643522.4 627100.5 39020
  6946354 9087419 9763794 1.018412e+07 1.091917e+07 1.127568e+07 1.123372e+07 1.124698e+07 1.134869e+07 1.154412e+07 1.137156e+07 1.1279e+07 1.104712e+07 1.103819e+07 1.121266e+07 1.137482e+07 1.174376e+07 1.193318e+07 1.195034e+07 1.184173e+07 1.18378e+07 1.224044e+07 1.303755e+07 1.345864e+07 1.321482e+07 1.298168e+07 1.337751e+07 1.358918e+07 1.332344e+07 1.25133e+07 1.146994e+07 1.091216e+07 1.056395e+07 1.05361e+07 1.041939e+07 1.053006e+07 1.088328e+07 1.093435e+07 1.097803e+07 1.044419e+07 0 ]

And the TIMIT one looks like this:

fadg0_sa1  [
  3077.437 3576.837 3893.808 4497.17 4646.433 4888.595 5084.933 5245.375 5266.312 5316.513 5304.906 5279.905 5159.947 5092.513 5093.656 5096.891 5198.106 5342.096 5525.816 5622.102 5590.077 5587.714 5621.955 5658.111 5640.733 5684.978 5922.412 6028.531 5843.909 5494.285 5123.665 4873.254 4768.456 4619.075 4454.212 4446.68 4533.783 4809.863 5073.438 5097.519 372
  28369.65 38061.98 44509.9 59787.96 63547.87 70383.9 75846.95 80695.33 82071.57 83632.43 82730.72 81498.48 78174.86 76341.12 75977.55 75682.39 78059.57 82118.61 87383.28 90191.3 89340.34 89230.34 90614.35 91722.68 90768.06 91814.1 99787.2 103876.6 97762.07 85880.71 74550.43 67565.18 64682.43 60528.35 56254.4 56227.67 58352.52 65413.38 72421.16 72856.04 0 ]
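
For context: dumps like these look like per-speaker CMVN statistics rather than raw fbank features. The first row holds the per-dimension feature sums plus the frame count as the last entry (39020 vs. 372 above), and the second row holds the sums of squares. A Kaldi-style pipeline would typically produce them with something like the following sketch (the exact paths are assumptions):

# sketch: accumulate per-speaker mean/variance stats over the features
compute-cmvn-stats --spk2utt=ark:data/train/spk2utt \
  scp:data/train/feats.scp ark:data/train/cmvn.ark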

The script, however, is the same as TEDLIUM's (I only modified existing code); the diff looks like this:

91c91
<      || exit 208;

---
>      || exit 1;
106c106
<      || exit 209;

---
>      || exit 1;

and the scripts is like this:

#!/bin/bash 

# Copyright 2012  Karel Vesely  Johns Hopkins University (Author: Daniel Povey)
# Apache 2.0
# To be run from .. (one directory up from here)
# see ../run.sh for example

# Begin configuration section.
nj=4
cmd=run.pl
fbank_config=conf/fbank.conf
compress=true
# End configuration section.

echo "$0 $@"  # Print the command line for logging

if [ -f path.sh ]; then . ./path.sh; fi
. parse_options.sh || exit 1;

if [ $# != 3 ]; then
   echo "usage: make_fbank.sh [options] <data-dir> <log-dir> <path-to-fbankdir>";
   echo "options: "
   echo "  --fbank-config <config-file>                      # config passed to compute-fbank-feats "
   echo "  --nj <nj>                                        # number of parallel jobs"
   echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
   exit 1;
fi

data=$1
logdir=$2
fbankdir=$3

# make $fbankdir an absolute pathname.
fbankdir=`perl -e '($dir,$pwd)= @ARGV; if($dir!~m:^/:) { $dir = "$pwd/$dir"; } print $dir; ' $fbankdir ${PWD}`

# use "name" as part of name of the archive.
name=`basename $data`

mkdir -p $fbankdir || exit 1;
mkdir -p $logdir || exit 1;

if [ -f $data/feats.scp ]; then
  mkdir -p $data/.backup
  echo "$0: moving $data/feats.scp to $data/.backup"
  mv $data/feats.scp $data/.backup
fi

scp=$data/wav.scp

required="$scp $fbank_config"

for f in $required; do
  if [ ! -f $f ]; then
    echo "make_fbank.sh: no such file $f"
    exit 1;
  fi
done

utils/validate_data_dir.sh --no-text --no-feats $data || exit 1;

if [ -f $data/spk2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/spk2warp"
  vtln_opts="--vtln-map=ark:$data/spk2warp --utt2spk=ark:$data/utt2spk"
elif [ -f $data/utt2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/utt2warp"
  vtln_opts="--vtln-map=ark:$data/utt2warp"
fi

for n in $(seq $nj); do
  # the next command does nothing unless $fbankdir/storage/ exists, see
  # utils/create_data_link.pl for more info.
  utils/create_data_link.pl $fbankdir/raw_fbank_$name.$n.ark  
done

if [ -f $data/segments ]; then
  echo "$0 [info]: segments file exists: using that."
  split_segments=""
  for n in $(seq $nj); do
    split_segments="$split_segments $logdir/segments.$n"
  done

  utils/split_scp.pl $data/segments $split_segments || exit 1;
  rm $logdir/.error 2>/dev/null

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    extract-segments scp,p:$scp $logdir/segments.JOB ark:- \| \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config ark:- ark:- \| \
    copy-feats --compress=$compress ark:- \
     ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
     || exit 208;

else
  echo "$0: [info]: no segments file exists: assuming wav.scp indexed by utterance."
  split_scps=""
  for n in $(seq $nj); do
    split_scps="$split_scps $logdir/wav.$n.scp"
  done

  utils/split_scp.pl $scp $split_scps || exit 1;

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config scp,p:$logdir/wav.JOB.scp ark:- \| \
    copy-feats --compress=$compress ark:- \
     ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
     || exit 209;

fi

if [ -f $logdir/.error.$name ]; then
  echo "Error producing fbank features for $name:"
  tail $logdir/make_fbank_${name}.1.log
  exit 1;
fi

# concatenate the .scp files together.
for n in $(seq $nj); do
  cat $fbankdir/raw_fbank_$name.$n.scp || exit 1;
done > $data/feats.scp

rm $logdir/wav.*.scp  $logdir/segments.* 2>/dev/null

nf=`cat $data/feats.scp | wc -l` 
nu=`cat $data/utt2spk | wc -l` 
if [ $nf -ne $nu ]; then
  echo "It seems not all of the feature files were successfully ($nf != $nu);"
  echo "consider using utils/fix_data_dir.sh $data"
fi

echo "Succeeded creating filterbank features for $name"

Is there something wrong? And what do the outputs "out-moded" and "journalese" mean?

LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx340-0000000-0000242 is 0.466467 over 240 frames.
mbns0_sx430-0000000-0000343 out-moded 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx430-0000000-0000343 is 0.430763 over 341 frames.
mbns0_sx70-0000000-0000119 journalese 
fmetze commented 8 years ago

Hi,

my guess is that you will need to reduce the number of parameters in the model: l=5 and c=320 are good settings for Switchboard and TEDLIUM, with hundreds of hours of training data, but not for TIMIT, with just a few. The difference in the ark files shows this (somewhat): the TIMIT speakers have much less audio than the TEDLIUM speakers, and therefore the per-speaker sum and sum-of-squares of the data are much smaller (which is what I think you're showing). Finally, during decoding, you can see that the network likes to output "out-moded" and "journalese" for some reason. Presumably you are still using the TEDLIUM language model?
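
As a concrete (hypothetical) illustration of that suggestion: in the TEDLIUM recipe the network structure is set near the top of run_ctc_phn.sh. The variable names below are assumed from the experiment directory name train_phn_l5_c320, and the smaller values are just a starting point for a few hours of TIMIT audio, not a tuned configuration:

# in run_ctc_phn.sh (variable names assumed); shrink the network for TIMIT
lstm_layer_num=3     # was 5
lstm_cell_dim=160    # was 320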

Do you know someone who is familiar with the Kaldi TIMIT recipe? I think you need to adapt the Eesen recipe a bit more for it to give good results; the Kaldi TIMIT recipe would probably be a good starting point to see what is being done.

Florian


zhangjiulong commented 8 years ago

Hi @fmetze, thanks for your suggestion; I will try it. But the language model I used was built from the TIMIT text, and the result is still very strange.

yajiemiao commented 8 years ago

A word-based language model built on TIMIT is relatively weak. I recommend composing a phone language model instead. You plug in a fake dictionary that simply maps each phone to itself: A A, B B, ...
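
A sketch of building such a fake dictionary (the paths follow the usual Eesen lang-dir layout and are assumptions):

# map every phone to itself, so each "word" in the LM is a phone;
# units.txt lines look like "aa 1", so $1 is the phone symbol
awk '{print $1, $1}' data/lang_phn/units.txt > data/local/dict_phn/lexicon.txt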

zhangjiulong commented 8 years ago

@yajiemiao do you mean scoring the phones Eesen recognized, not the words?

yajiemiao commented 8 years ago

Yep. My very first verification of EESEN was done on TIMIT; I was able to get reasonable (if not state-of-the-art) phone error rates.

zhangjiulong commented 8 years ago

@yajiemiao OK, thanks very much.

double22a commented 8 years ago

@fmetze Hi, have you run any experiments with EESEN based on uni-directional LSTMs? My uni-LSTM results are terrible.

fmetze commented 8 years ago

We have not run such experiments. I think there is some work on how to build uni-directional LSTMs that work for speech (mainly by stacking future frames rather than relying on the RNN to learn them), or on decomposing the full-sentence BiLSTM into a series of shorter BiLSTMs that can be evaluated quickly, but we have not implemented any of this in Eesen. It would be a great feature, though ;-)
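
For what it's worth, a minimal illustration of the frame-stacking idea using Kaldi's splice-feats (whether this helps a uni-directional CTC model here is untested; the archive names are placeholders):

# give a uni-directional model future context by splicing 5 future frames
# onto every current frame; the input dimension grows by a factor of 6
splice-feats --left-context=0 --right-context=5 \
  ark:feats.ark ark:feats_spliced.ark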


yajiemiao commented 8 years ago

In general, CTC depends heavily on BiLSTMs for reasonable performance. As shown in http://www.cs.cmu.edu/~ymiao/pub/icassp2016_ctc.pdf, on Switchboard, uni-directional models perform >15% worse than bi-directional models with the same number of model parameters.

Aasimrafique commented 8 years ago

@yajiemiao @zhangjiulong can you please share the example you tested with the TIMIT dataset?

zhangjiulong commented 8 years ago

I just converted the TIMIT data to STM format and ran it using the TEDLIUM scripts.
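
For anyone attempting the same conversion, a rough, untested sketch. An STM line is "<waveform> <channel> <speaker> <start> <end> <transcript>", and each TIMIT .TXT transcript starts with begin/end sample indices at 16 kHz:

# hypothetical TIMIT -> STM conversion; $TIMIT points at the corpus root
for txt in $TIMIT/*/*/*/*.TXT; do
  utt=$(basename $txt .TXT); spk=$(basename $(dirname $txt))
  # convert sample indices to seconds and lowercase the transcript
  awk -v id="${spk}_${utt}" -v spk=$spk \
    '{printf "%s 1 %s %.2f %.2f", id, spk, $1/16000, $2/16000;
      for (i = 3; i <= NF; i++) printf " %s", tolower($i); print ""}' $txt
done > timit.stm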

razor1179 commented 7 years ago

@Aasimrafique, were you able to convert the TIMIT format to STM format as instructed by @zhangjiulong? If so, could you please share exactly how you did it? @yajiemiao, @fmetze, it would be very helpful if you could share TIMIT dataset test.

Thanks.

riebling commented 7 years ago

it would be very helpful if you could share TIMIT dataset test.

Unfortunately, as mentioned in Wikipedia: TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset. We are not permitted to distribute TIMIT data.

razor1179 commented 7 years ago

@riebling I forgot to add "scripts" at the end. I do have access to the TIMIT dataset, and what I meant to ask was whether the TIMIT test scripts could be shared.

riebling commented 7 years ago

Oops, my misunderstanding. My best guess is that, at least here at CMU, there is no TIMIT Eesen experiment to share. The only person who seems to have tried this (aside from Yajie, who is no longer with us) is @zhangjiulong.

Florian suggests people try adapting the Kaldi TIMIT experiment. This does not imply he has done so ("We have not run such experiments.") or that he has any scripts to share.

razor1179 commented 7 years ago

@riebling Okay, I see. But I did create a new issue here https://github.com/srvk/eesen/issues/128, describing what I've done and the issues I am facing. Could you please suggest how I could move forward?