srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Gentle + Eesen for forced alignment #88

Closed migueljette closed 8 years ago

migueljette commented 8 years ago

Hi @fmetze and @yajiemiao, I was wondering if you know of any project that combines Eesen with Gentle for forced alignment. Gentle seems very cool, but I really want to use Eesen for training my models: https://lowerquality.com/gentle/

Thanks for your guidance! Love your scripts! Keep up the amazing work!

fmetze commented 8 years ago

Interesting use case.

We do have code for generating alignments with Eesen (we can check it in if there is interest), but the problem (and the beauty) is that CTC does not produce a dense partitioning of the audio, only peaks. So classical “alignment” with begin and end times is hard to do; at a minimum, the timings will be wildly inexact. But you should still be able to replace some parts of Kaldi in Gentle with Eesen, if you like, and work that way, no?
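To illustrate the point about peaks, here is a minimal sketch (not Eesen's code) of what a CTC output looks like once per-frame best labels are collapsed, and of one possible heuristic for turning peaks into begin/end times. The blank index, the 30 ms frame shift, and both function names are assumptions made for the example.

```python
from itertools import groupby

BLANK = 0           # assumed index of the CTC blank label
FRAME_SHIFT = 0.03  # assumed seconds per frame; adjust to the actual model


def ctc_peaks(frame_labels):
    """Collapse per-frame best labels into (label, first_frame, last_frame) peaks."""
    peaks = []
    frame = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != BLANK:
            peaks.append((label, frame, frame + length - 1))
        frame += length
    return peaks


def peaks_to_segments(peaks, num_frames):
    """Heuristic only: place each boundary halfway between neighbouring peak centres."""
    centers = [(first + last + 1) / 2 for _, first, last in peaks]
    midpoints = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    bounds = [0.0] + midpoints + [float(num_frames)]
    return [
        (label, bounds[i] * FRAME_SHIFT, bounds[i + 1] * FRAME_SHIFT)
        for i, (label, _, _) in enumerate(peaks)
    ]


if __name__ == "__main__":
    labels = [0, 0, 5, 0, 0, 0, 0, 7, 7, 0, 0, 0]  # mostly blank, two short peaks
    peaks = ctc_peaks(labels)
    print(peaks)
    print(peaks_to_segments(peaks, len(labels)))
```

Most frames carry the blank label, so the boundaries come from the midpoint heuristic rather than from the model, which is why the resulting timings can be quite inexact.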


Florian Metze http://www.cs.cmu.edu/directory/florian-metze Associate Research Professor Carnegie Mellon University

migueljette commented 8 years ago

Hi! Yes, I could certainly replace some parts of Kaldi in Gentle with Eesen. I am not super familiar with all the code (yet!), so it would take quite a while, I am sure. I'll keep exploring, but for the time being, I think I will use Kaldi for my forced alignment. But I will make sure to give Eesen a try for the offline transcriber we are writing (I know you published a nice one, so I'll have a look!).

For fun, how could I get my hands on the code for generating alignments with Eesen? While digging I found this code from @yajiemiao: https://github.com/yajiemiao/eesen/commit/5e7880f1f15d5c50899785419ecb7e26a786db3a

Is that what you mean?

fmetze commented 8 years ago

Great, yes - that is the code. Eric (in cc) is also working with it right now and could maybe help, if needed. Let us know if you need anything else.

F.


migueljette commented 8 years ago

Hi @fmetze! Thank you for your responses. I will take another look as soon as I have all my data preparation done. I have a lot of work ahead of me. :)

riebling commented 8 years ago

There is some new code in the Eesen transcriber VM that extends Yajie's code, above. It is designed to operate on data containing several utterances, aligning them one by one. Think of it as a wrapper script for align_ctc_single_utt.sh within the framework of the Eesen offline transcriber: https://github.com/srvk/srvk-eesen-offline-transcriber/blob/master/align.sh

migueljette commented 7 years ago

Hi @riebling, I'm just getting back to this now. Do you have an example of this working? I'm trying to set it up with my own data and I haven't been able to yet. I probably have all sorts of problems with my setup, which is why I'm wondering if you have a "toy example" I could use. Thanks! This is promising! In fact, a "before" and "after" would be amazing (I want to see the alignment it produces for a short example). Cheers,

riebling commented 7 years ago

You need test2.txt in the start folder (which appears as /vagrant/test2.txt inside the VM). I just added one to GitHub, but you could also create it by pasting this text:

things will change in ways that their fragile environment simply can't support and that leads to starvation it leads to uncertainty it leads to unrest so the the climate changes will be terrible for them

then from the VM, you could do something like this

cd ~/bin
./align.sh /vagrant/test2.mp3

That should produce (in addition to a bunch of log output) a file in build/output/test2.ali:

test2-1---0000.000-0006.460 1 0 0.81 things
test2-1---0000.000-0006.460 1 0.81 0.21 will
test2-1---0000.000-0006.460 1 1.02 0.48 change
test2-1---0000.000-0006.460 1 1.50 0.15 in
test2-1---0000.000-0006.460 1 1.65 0.84 ways
test2-1---0000.000-0006.460 1 2.49 0.15 that
test2-1---0000.000-0006.460 1 2.64 0.27 their
test2-1---0000.000-0006.460 1 2.91 0.57 fragile
test2-1---0000.000-0006.460 1 3.48 1.02 environment
test2-1---0000.000-0006.460 1 4.50 0.42 simply
test2-1---0000.000-0006.460 1 4.92 0.33 can't
test2-1---0000.000-0006.460 1 5.25 1.20 support
test2-1---0006.460-0010.690 1 6.46 0.12 and
test2-1---0006.460-0010.690 1 6.58 0.18 that
test2-1---0006.460-0010.690 1 6.76 0.21 leads
test2-1---0006.460-0010.690 1 6.97 0.18 to
test2-1---0006.460-0010.690 1 7.15 0.72 starvation
test2-1---0006.460-0010.690 1 7.87 0.12 it
test2-1---0006.460-0010.690 1 7.99 0.27 leads
test2-1---0006.460-0010.690 1 8.26 0.06 to
test2-1---0006.460-0010.690 1 8.32 0.66 uncertainty
test2-1---0006.460-0010.690 1 8.98 0.12 it
test2-1---0006.460-0010.690 1 9.10 0.27 leads
test2-1---0006.460-0010.690 1 9.37 0.03 to
test2-1---0006.460-0010.690 1 9.40 1.29 unrest
test2-1---0010.690-0014.340 1 10.69 0.27 so
test2-1---0010.690-0014.340 1 10.96 0.27 the
test2-1---0010.690-0014.340 1 11.23 0.18 the
test2-1---0010.690-0014.340 1 11.41 0.45 climate
test2-1---0010.690-0014.340 1 11.86 0.63 changes
test2-1---0010.690-0014.340 1 12.49 0.18 will
test2-1---0010.690-0014.340 1 12.67 0.18 be
test2-1---0010.690-0014.340 1 12.85 0.39 terrible
test2-1---0010.690-0014.340 1 13.24 0.18 for
test2-1---0010.690-0014.340 1 13.42 0.90 them
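For anyone who wants to post-process this output, the following is a small sketch of a reader for it. It assumes each record has five whitespace-separated fields in the order utterance ID, channel, start time in seconds, duration in seconds, word; that layout is inferred from the sample above rather than from any documentation, and the names `WordAlignment` and `parse_ali` are made up for the example.

```python
from typing import List, NamedTuple


class WordAlignment(NamedTuple):
    utt_id: str      # e.g. "test2-1---0000.000-0006.460"
    channel: str     # always "1" in the sample output
    start: float     # seconds from the start of the recording (inferred)
    duration: float  # seconds (inferred)
    word: str

    @property
    def end(self) -> float:
        # Rounded to centiseconds, the precision used in the sample.
        return round(self.start + self.duration, 2)


def parse_ali(text: str) -> List[WordAlignment]:
    """Parse alignment records, five whitespace-separated fields per record."""
    tokens = text.split()
    if len(tokens) % 5 != 0:
        raise ValueError("expected a multiple of 5 fields, got %d" % len(tokens))
    records = []
    for i in range(0, len(tokens), 5):
        utt_id, channel, start, duration, word = tokens[i:i + 5]
        records.append(WordAlignment(utt_id, channel, float(start), float(duration), word))
    return records


if __name__ == "__main__":
    sample = (
        "test2-1---0000.000-0006.460 1 0 0.81 things\n"
        "test2-1---0000.000-0006.460 1 0.81 0.21 will\n"
    )
    for rec in parse_ali(sample):
        print(rec.word, rec.start, rec.end)
```

Reading the whole file as one token stream means the parser does not care whether the records are separated by newlines or spaces.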


Eric Riebling Interactive Systems Lab er1k@cs.cmu.edu 407 South Craig St.

migueljette commented 7 years ago

Hi @riebling, thanks for the help and for showing me the output. That's very helpful. I really want to use Eesen in my project, so I will try to make this work on my end in the next few days. For the moment, I updated and got your "test2.txt" file, but I get a few errors when testing your command.

First, it complains about some files being modified in the future (a VM issue, I guess...), and it also complains that test2.stm doesn't exist.

bash align.sh /vagrant/test2.mp3
make: Warning: File `src-audio/test2.mp3' has modification time 0.0029 s in the future
mkdir -p `dirname build/audio/base/test2.wav`
sox src-audio/test2.mp3 -c 1 build/audio/base/test2.wav rate -v 16k
make: warning:  Clock skew detected.  Your build may be incomplete.
cat: /vagrant/test2.stm: No such file or directory
rm -rf build/audio/segmented/test2
mkdir -p build/audio/segmented/test2
cat build/diarization/test2/show.seg | cut -f 3,4,8 -d " " | \
        while read LINE ; do \
            start=`echo $LINE | cut -f 1 -d " " | perl -npe '$_=$_/100.0'`; \
            len=`echo $LINE | cut -f 2 -d " " | perl -npe '$_=$_/100.0'`; \
            sp_id=`echo $LINE | cut -f 3 -d " "`; \
            timeformatted=`echo "$start $len" | perl -ne '@t=split(); $start=$t[0]; $len=$t[1]; $end=$start+$len; printf("%08.3f-%08.3f\n", $start,$end);'` ; \
            if [ ${sp_id} == 'A' ]; then \
                sox build/audio/base/test2.wav -c 1  build/audio/segmented/test2/test2_${timeformatted}_${sp_id}.wav trim $start $len remix 1; \
            elif [ ${sp_id} == 'B' ]; then \
                sox build/audio/base/test2.wav -c 1  build/audio/segmented/test2/test2_${timeformatted}_${sp_id}.wav trim $start $len remix 2; \
            else \
                sox build/audio/base/test2.wav --norm build/audio/segmented/test2/test2_${timeformatted}_${sp_id}.wav trim $start $len; \
            fi \
        done
mkdir -p `dirname build/trans/test2/wav.scp`
/bin/ls build/audio/segmented/test2/*.wav  | \
            perl -npe 'chomp; $orig=$_; s/.*\/(.*)_(\d+\.\d+-\d+\.\d+)_(.*)\.wav/\1-\3---\2/; $_=$_ .  " $orig\n";' | LC_ALL=C sort > build/trans/test2/wav.scp
/bin/ls: cannot access build/audio/segmented/test2/*.wav: No such file or directory
cat build/trans/test2/wav.scp | perl -npe 's/\s+.*//; s/((.*)---.*)/\1 \2/' > build/trans/test2/utt2spk
utils/utt2spk_to_spk2utt.pl build/trans/test2/utt2spk > build/trans/test2/spk2utt
rm -rf build/trans/test2/fbank
steps/make_fbank.sh --fbank-config conf/fbank.16k.conf --cmd "$train_cmd" --nj 1 \
            build/trans/test2 build/trans/test2/exp/make_fbank build/trans/test2/fbank || exit 1
steps/make_fbank.sh --fbank-config conf/fbank.16k.conf --cmd run.pl --nj 1 build/trans/test2 build/trans/test2/exp/make_fbank build/trans/test2/fbank
utils/validate_data_dir.sh: empty file spk2utt
make: *** [build/trans/test2/fbank] Error 1
Aligning text found at /vagrant/test2.txt
local/align_ctc_multi_utts.sh --acoustic_scale 0.8 /home/vagrant/eesen/asr_egs/tedlium/v2-30ms/data/lang_phn_test_test_newlm /home/vagrant/eesen/asr_egs/tedlium/v2-30ms/data/lang_phn_test_test_newlm build/trans/test2 /home/vagrant/eesen/asr_egs/tedlium/v2-30ms/exp/train_phn_l5_c320_v1s build/trans/test2/align
local/align_ctc_multi_utts.sh: no such file build/trans/test2/feats.scp
cp: cannot stat 'build/trans/test2/align/ali': No such file or directory

Probably something wrong with my setup. I'm relatively new to vagrant and VMs, so maybe I should start from scratch to make sure everything is properly up-to-date.

Thanks for your guidance!
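As a reading aid for the segmentation and wav.scp steps in the log above, here is a sketch that mirrors, in Python, how the recipe appears to name segment files and derive utterance IDs: start and length are divided by 100 to get seconds, formatted as `%08.3f-%08.3f`, and the Perl substitution then rearranges the file name into `<base>-<speaker>---<times>`. The function names are hypothetical, and unlike the Perl pattern this version also accepts a path with no directory part.

```python
import re


def segment_wav_name(base, start_cs, len_cs, speaker):
    """Build a segment file name from start/length given in hundredths of a second."""
    start = start_cs / 100.0
    end = start + len_cs / 100.0
    return f"{base}_{start:08.3f}-{end:08.3f}_{speaker}.wav"


# Same capture groups as the Perl regex in the log; the directory part is optional here.
_SEGMENT_RE = re.compile(r"(?:.*/)?(.*)_(\d+\.\d+-\d+\.\d+)_(.*)\.wav")


def utterance_id(wav_path):
    """Turn .../<base>_<times>_<speaker>.wav into <base>-<speaker>---<times>."""
    match = _SEGMENT_RE.fullmatch(wav_path)
    if match is None:
        raise ValueError(f"not a segment file name: {wav_path}")
    base, times, speaker = match.groups()
    return f"{base}-{speaker}---{times}"


if __name__ == "__main__":
    name = segment_wav_name("test2", 646, 423, "1")
    print(name)
    print(utterance_id("build/audio/segmented/test2/" + name))
```

With a start of 646 and a length of 423 this reproduces the second utterance ID seen in the example alignment earlier in the thread, test2-1---0006.460-0010.690.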

riebling commented 7 years ago

I think that's probably a fault in the align.sh script, namely that it was only (initially) written to be run from the working directory where Eesen transcriber resides. When running from outside the VM via "vagrant ssh -c" the paths are different. I'll try and submit a fix when I get into work :)

Thanks for helping test this out
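In the log above, each step fails because an earlier one left no output, so it can help to find the first intermediate file that is missing or empty. The sketch below does that from a given working directory. The file list is taken from the paths in the log, but the order is an assumption about the pipeline, and `first_missing` is a hypothetical helper, not part of the transcriber.

```python
from pathlib import Path

# Intermediate files mentioned in the log, in the assumed order of creation.
EXPECTED = [
    "build/audio/base/{name}.wav",
    "build/diarization/{name}/show.seg",
    "build/trans/{name}/wav.scp",
    "build/trans/{name}/spk2utt",
    "build/trans/{name}/feats.scp",
    "build/trans/{name}/align/ali",
]


def first_missing(workdir, name):
    """Return the first expected file that is absent or empty, or None if all exist."""
    for template in EXPECTED:
        relative = template.format(name=name)
        path = Path(workdir) / relative
        if not path.is_file() or path.stat().st_size == 0:
            return relative
    return None


if __name__ == "__main__":
    # Run from the directory where the transcriber resides.
    print(first_missing(".", "test2"))
```

Passing the working directory explicitly makes the check independent of where the shell happens to be when it runs.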

On 11/03/2016 10:02 PM, Mig wrote:

Hi @riebling, thanks for the help and for showing me the output. That's very helpful. I really want to use Eesen in my project, so I will try to make this work on my end in the next few days. For the moment, I updated and got your "test2.txt" file, but I get an error regarding "test2.stm":

bin/align.sh: line 28: utils/parse_options.sh: No such file or directory
make: *** No rule to make target `build/audio/base/test2.wav'.  Stop.
cat: /vagrant/test2.stm: No such file or directory
make: *** No rule to make target `build/trans/test2/fbank'.  Stop.
Aligning text found at /vagrant/test2.txt
bin/align.sh: line 78: build/trans/test2/text: No such file or directory
cp: cannot create regular file 'build/trans/test2': No such file or directory
bin/align.sh: line 88: local/align_ctc_multi_utts.sh: No such file or directory
cp: cannot stat 'build/trans/test2/align/ali': No such file or directory
Connection to 127.0.0.1 closed.


migueljette commented 7 years ago

Hi, I'm happy to test this out! I'll wait and see what you think. Like I said, I might re-install everything, following the instructions more carefully. That might help. I installed everything a few months ago, so there might be a mismatch now. Thanks for your help too!

riebling commented 7 years ago

This has now been fixed: align.sh works from outside the VM (and from other paths within the VM), and the aligner now produces complete outputs for more utterances. (A beam setting had been pruning some, resulting in partial transcripts and alignment outputs.)