aolney opened 5 years ago
The problem seemed to be in the Makefile: instead of using the segments in the STM, it was running LIUM diarization. Below is my align.sh, which seems to have fixed this problem:
#!/bin/bash
# Copyright 2016 er1k
# Apache 2.0
# Prepare data for, and run, the align_ctc_multi_utts.sh script that generates word-level alignments
# in an "Eesen Transcriber-centric" way; output is written to build/output/<basename>.ali
# Required inputs:
#
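# * a 'hypothesis' text file for which to compute alignments, extension .txt,
# one utterance per line. If no hypothesis text is found, text
# is obtained from the STM file below (in this version the .txt branch
# is commented out, so the text always comes from the STM)
# * an STM file with utterance/segment timings ('perfect' transcription), or a
# .cha file that local/cha2stm.sh converts into one
# * an audio file; the extension can vary (.mp3, .wav, .mp4, etc.)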
BASEDIR=$(dirname $0)
EESEN_ROOT=~/eesen
# Change these if you're using different models
#GRAPH_DIR=$EESEN_ROOT/asr_egs/tedlium/v2-30ms/data/lang_phn_test_test_newlm
GRAPH_DIR=$EESEN_ROOT/asr_egs/tedlium/v2-30ms/data/lang_phn_test
MODEL_DIR=$EESEN_ROOT/asr_egs/tedlium/v2-30ms/exp/train_phn_l5_c320_v1s
# Defaults
frame_shift=0.03 # 30 ms frames
lm_weight=0.8 # same as best setting for 30ms eesen tedlium transcriber
. path.sh
. $BASEDIR/utils/parse_options.sh
# Check the arguments before deriving names from $1
if [ $# -ne 1 ]; then
echo "Usage: align.sh <basename>.{wav,mp3,mp4,sph}"
echo " expects, in the same folder, an STM file named <basename>.stm (for segments)"
echo " or a .cha file named <basename>.cha to generate it from"
echo " e.g. ./align.sh /vagrant/GaryFlake_2010.wav"
echo " output is build/output/<basename>.ali"
exit 1;
fi
filename=$(basename "$1")
basename="${filename%.*}"
dirname=$(dirname "$1")
extension="${filename##*.}"
cd $BASEDIR
echo "In $BASEDIR"
mkdir -p $BASEDIR/build/audio/base $BASEDIR/build/output
# un-shorten-ify SPH files
#if [ $extension == "sph" ]; then
# sph2pipe $1 > build/audio/base/$basename.unshorten
# sox build/audio/base/$basename.unshorten -c 1 build/audio/base/$basename.wav rate -v 16k
#fi
mkdir -p $BASEDIR/src-audio
cp $1 $BASEDIR/src-audio
#prefixing with BASEDIR throws off make rule?
#make $BASEDIR/build/audio/base/$basename.wav
make build/audio/base/$basename.wav
# 8k
# sox $1 -c 1 -e signed-integer build/audio/base/$basename.wav rate -v 8k
mkdir -p $BASEDIR/build/diarization/$basename
# make STM from cha
if [ -f $dirname/$basename.cha -a ! -f $dirname/$basename.stm ]; then
local/cha2stm.sh $dirname/$basename.cha | sed 's/xxx/\<unk\>/g' > build/output/$basename.stm
elif [ -f $dirname/$basename.stm ]; then
cp $dirname/$basename.stm build/output/
else
echo "Needs either a .cha or .stm file to get utterances"
exit 1
fi
#if [ ! -f $dirname/$basename.txt ]; then
# echo "Needs .txt file with utterance per line as reference text to align"
# exit 1
#fi
# Make LIUM-style segments (show.seg) from the STM, skipping comments and ignored segments;
# start and length are written in 10 ms units (hence the *100)
grep -v -e ';;' -e 'inter_segment_gap' -e 'ignore_time_segment_in_scoring' build/output/$basename.stm \
| awk '{OFMT = "%.0f"; print $1,$2,$4*100,($5-$4)*100,"M S U",$2}' > build/diarization/$basename/show.seg
# Generate features
cd $BASEDIR
rm -rf build/trans/$basename
make SEGMENTS=show.seg build/trans/$basename/fbank
# The test text needs one utterance per line, each prefixed with an utterance ID
uttdata=build/trans/$basename
#if [ -f $dirname/$basename.txt ];
# then
# echo "Aligning text found at $dirname/$basename.txt"
# cat $dirname/$basename.txt | awk '{print NR" "$0}' > $uttdata/text
# else
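echo "Aligning text found in build/output/$basename.stm"
# Apply the same filters as for show.seg so text lines and segments stay in step,
# then drop the six STM header fields and prefix each line with its number as utterance ID
grep -v -e ';;' -e 'inter_segment_gap' -e 'ignore_time_segment_in_scoring' build/output/$basename.stm \
| awk '{$1="";$2="";$3="";$4="";$5="";$6=""; print NR$0}' \
| sed 's/ \+/ /' > $uttdata/text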
#fi
cp build/diarization/$basename/show.seg $uttdata
#local/align_ctc_multi_utts.sh --acoustic_scale 0.8 $GRAPH_DIR $GRAPH_DIR $uttdata $MODEL_DIR $uttdata/align
# <langdir> <data> <uttdata> <mdldir> <dir>
# lm_weight (0.8 by default) is passed as the acoustic scale, as in the commented-out call above
local/align_ctc_multi_utts.sh --acoustic_scale $lm_weight $GRAPH_DIR $GRAPH_DIR $uttdata $MODEL_DIR $uttdata/align
# Copy results to someplace useful
cp $uttdata/align/ali build/output/$basename.ali
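To make the two STM-derived files easier to check, the segment and text extraction in the script above can be factored into small functions that share one filter. With a shared filter, line N of show.seg and line N of the text file describe the same utterance. This is a minimal sketch. Like the script's awk commands, it assumes STM lines of the form file channel speaker start end <label> words. The helper names stm_filter, stm_to_seg and stm_to_text are hypothetical. Times are rounded to the nearest 10 ms unit explicitly rather than through OFMT.

```bash
#!/usr/bin/env bash
# Sketch: build show.seg and the Kaldi-style text from the same filtered STM lines.
# Assumes STM lines: file channel speaker start end <label> words...

# Drop STM comments and the segment types align.sh skips.
stm_filter() {
  grep -v -e ';;' -e 'inter_segment_gap' -e 'ignore_time_segment_in_scoring'
}

# LIUM-style seg line: show channel start length gender band env speaker.
# Start and length are in 10 ms units, rounded to the nearest unit.
stm_to_seg() {
  stm_filter | awk '{
    start = int($4 * 100 + 0.5)
    len = int(($5 - $4) * 100 + 0.5)
    print $1, $2, start, len, "M", "S", "U", $2
  }'
}

# Text line: running line number as utterance ID, then the words after field 6.
stm_to_text() {
  stm_filter | awk '{
    line = NR
    for (i = 7; i <= NF; i++) line = line " " $i
    print line
  }'
}
```

For example, stm_to_seg < build/output/$basename.stm > build/diarization/$basename/show.seg.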
Will need to look into this some other time. Please let me know if you have other information or updates.
Only that once the STM was properly used, the doubling issue went away. However, the alignments still seemed off.
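If the alignments are still off, two inexpensive checks may help. The first is whether the text file and show.seg contain the same number of utterances. The second is whether any STM segments are empty, reversed, or start before the previous one ends (subtitle cues can overlap). The sketch below uses the hypothetical helpers check_counts and stm_bad_segments. It assumes the file layout produced by align.sh and is only a diagnostic, not a confirmed cause of the problem.

```bash
#!/usr/bin/env bash
# Diagnostic sketch; check_counts and stm_bad_segments are hypothetical helpers.

# Return non-zero if the text file and show.seg hold different numbers of utterances.
check_counts() {  # usage: check_counts <text-file> <seg-file>
  local n_text n_seg
  n_text=$(grep -c . "$1")
  n_seg=$(grep -c . "$2")
  if [ "$n_text" -ne "$n_seg" ]; then
    echo "mismatch: $n_text text lines vs $n_seg segments" >&2
    return 1
  fi
}

# Print the STM line numbers of segments that are empty or reversed (end <= start)
# or that start before the previous kept segment ends. Reads the STM on stdin.
stm_bad_segments() {
  awk '/;;/ || /inter_segment_gap/ || /ignore_time_segment_in_scoring/ { next }
       {
         s = $4 + 0; e = $5 + 0
         if (e <= s || (seen && s < prev_end)) print NR
         prev_end = e; seen = 1
       }'
}
```

For example, check_counts build/trans/$basename/text build/diarization/$basename/show.seg, and stm_bad_segments < build/output/$basename.stm.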
align.sh is doubling output, and the times are way off. Here is the STM, which was generated from the SRT subtitles (CC) extracted with FFmpeg:
Here is the ali file:
Any suggestions would be appreciated. Regular ASR functionality (with Kaldi) is working fine. FWIW, my steps and utils are linked to Kaldi and not to Eesen.
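Since the STM was generated from SRT subtitles, it may also be worth checking the timestamp conversion, because SRT writes times as HH:MM:SS,mmm. The following is a minimal SRT-to-STM sketch. It assumes each cue is an index line, a timing line, text lines, and a blank line, and it writes the six header fields the script's awk command drops. The name srt_to_stm is hypothetical. The channel "1", the speaker (set to the file ID) and the <o,f0,unknown> label are placeholders, not values any tool requires.

```bash
#!/usr/bin/env bash
# Sketch: convert SRT cues on stdin to STM lines (srt_to_stm is a hypothetical helper).
# Channel "1", speaker = file ID and the <o,f0,unknown> label are placeholders.
srt_to_stm() {  # usage: srt_to_stm <file-id> < input.srt
  awk -v id="$1" '
    # "HH:MM:SS,mmm" (or with ".") -> seconds
    function sec(t,    p) {
      split(t, p, /[:,.]/)
      return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
    }
    # Print the cue collected so far, if any, and reset.
    function emit() {
      if (intext && words != "")
        printf "%s 1 %s %.3f %.3f <o,f0,unknown> %s\n", id, id, t0, t1, words
      intext = 0; words = ""
    }
    { sub(/\r$/, "") }                     # tolerate Windows line endings
    /-->/ { emit(); t0 = sec($1); t1 = sec($3); intext = 1; next }
    intext && NF == 0 { emit(); next }     # blank line ends a cue
    intext { words = (words == "" ? $0 : words " " $0) }
    END { emit() }
  '
}
```

For example, srt_to_stm GaryFlake_2010 < GaryFlake_2010.srt > GaryFlake_2010.stm. The resulting times can then be compared against the audio to see whether the offsets are already wrong in the STM.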