Closed Tortoise17 closed 2 years ago
kaldi_lm should be on your PATH if you source path.sh (run.sh does this too). Run:
source path.sh
or
. path.sh
in the s5_r2 directory if you are copy-pasting the commands manually.
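To confirm that sourcing had the intended effect, you can check whether the kaldi_lm directory is really an entry in PATH. Note that exporting a variable such as KALDI_LM on its own does not change PATH. The following is a minimal sketch, assuming kaldi_lm is installed under $KALDI_ROOT/tools/kaldi_lm; the helper name `dir_on_path` and the fallback path are made up for illustration:

```shell
# Hypothetical helper: succeeds if directory $1 is an entry of PATH.
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Assumed install location of kaldi_lm; adjust to your setup.
lm_dir="${KALDI_ROOT:-/path/to/kaldi}/tools/kaldi_lm"

if dir_on_path "$lm_dir"; then
  echo "kaldi_lm directory is on PATH"
else
  echo "kaldi_lm directory is NOT on PATH; try: export PATH=\$PATH:$lm_dir"
fi
```

Run this in the same shell in which you sourced path.sh, since a PATH change made in one shell is not visible in another.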
On Thu, Apr 1, 2021 at 16:15, Tortoise17 < @.***> wrote:
I am now facing an error.
local/build_lm.sh --srcdir data/local/lang_std_big_v5 --dir data/local/lm_std_big_v5 --lmstage 2
Not installing the kaldi_lm toolkit since it is already there.
You need to have kaldi_lm on your path
Can you tell me which path this is? I have already set the path in s5/path.sh as:
export KALDI_LM=$KALDI_ROOT/tools/kaldi_lm
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/uhh-lt/kaldi-tuda-de/issues/53, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKGA6XDFYJSKXY5VU2TYZTTGR5ZFANCNFSM42HFXUXA .
Note that the s5 directory contains a super old recipe. You should use everything from s5_r2 and ignore s5.
That is still there and sourced as well. I assume this is the folder that contains the language model builder executables? If so, they are there. I used all files from s5_r2 and just renamed the folder to s5. The error still occurs.
It is halting at this stage. Does this mean the run.sh process finished, or something else? If something else, how can I fix it?
This is probably related to the mp3 plugin that you need for sox. Please check whether your sox supports mp3.
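One way to run that check is to look at the "AUDIO FILE FORMATS" line printed by `sox --help`, which lists the formats the installed build can handle. A minimal sketch; the helper name `sox_help_lists_mp3` is hypothetical, and the exact package that provides mp3 support depends on your distribution:

```shell
# Hypothetical helper: reads `sox --help` output on stdin and succeeds
# if the "AUDIO FILE FORMATS" line lists mp3.
sox_help_lists_mp3() {
  grep -i '^AUDIO FILE FORMATS:' | grep -qw 'mp3'
}

if command -v sox >/dev/null 2>&1; then
  if sox --help 2>&1 | sox_help_lists_mp3; then
    echo "sox reports mp3 support"
  else
    echo "sox does not list mp3; install the mp3 format handler for sox"
  fi
else
  echo "sox not found on PATH"
fi
```

If mp3 is missing, every Common Voice clip fails to decode, which would explain why no utterances remain in the data directory.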
On Thu, Apr 8, 2021, 11:27 AM Tortoise17 @.***> wrote:
+ x=commonvoice_train
+ utils/fix_data_dir.sh data/commonvoice_train
fix_data_dir.sh: no utterances remained: not proceeding further.
Is there any other way, like ffmpeg or another command, that can be used instead of sox and handled at data prep or at MFCC extraction time? I tried it in one place but was not successful. Any hint?
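For context, Kaldi's wav.scp accepts a command ending in `|` in place of a file path, so in principle any decoder that writes WAV to standard output can take the place of sox. The following is only a sketch of that idea, not what the recipe does: it assumes ffmpeg is installed with mp3 decoding and that 16 kHz mono 16-bit audio is wanted. The helper name, utterance ID and file path are hypothetical.

```shell
# Hypothetical helper: print one wav.scp line that decodes an audio file
# with ffmpeg instead of sox. Kaldi runs the command and reads WAV from the pipe.
# Assumes paths without spaces, as is usual for Kaldi data directories.
ffmpeg_wav_scp_line() {
  utt_id="$1"
  audio_path="$2"
  printf '%s ffmpeg -nostdin -loglevel error -i %s -f wav -acodec pcm_s16le -ar 16000 -ac 1 - |\n' \
    "$utt_id" "$audio_path"
}

# Example with made-up names:
ffmpeg_wav_scp_line utt0001 /data/clips/utt0001.mp3
```

The data preparation script that currently writes the sox command into wav.scp would be the place to swap in a line like this.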
I managed with your help. I am now stuck with this error.
++ wc -l
+ n=851122
+ utils/subset_data_dir.sh --last data/train 851122 data/train_nodev
utils/subset_data_dir.sh: reducing #utt from 855122 to 851122
+ utils/subset_data_dir.sh --shortest data/train_nodev 150000 data/train_100kshort
feat-to-len scp:data/train_nodev/feats.scp ark,t:data/train_100kshort/tmp.len
ERROR (feat-to-len[5.5.903~1-6260b]:Read():kaldi-matrix.cc:1620) Failed to read matrix from stream. : Expected "[", got "�����������ФS4..." File position at start is 10702, currently 10757
[ Stack-Trace: ]
feat-to-len(kaldi::MessageLogger::LogMessage() const+0x76b) [0x4a739f]
feat-to-len(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x42c9d3]
feat-to-len(kaldi::Matrix<float>::Read(std::istream&, bool, bool)+0x1eb2) [0x473e78]
feat-to-len(kaldi::SequentialTableReaderScriptImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Value()+0x15c) [0x4304a0]
feat-to-len(kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Value()+0x12) [0x4311dc]
feat-to-len(main+0x128) [0x42bd4a]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fb131b30555]
feat-to-len() [0x42bb79]
WARNING (feat-to-len[5.5.903~1-6260b]:Read():util/kaldi-holder-inl.h:84) Exception caught reading Table object. kaldi::KaldiFatalError
WARNING (feat-to-len[5.5.903~1-6260b]:EnsureObjectLoaded():util/kaldi-table-inl.h:317) Failed to load object from /home/user/Desktop/workshop/lab_work/stt/asr/kaldi/egs/csj/s5/mfcc/raw_mfcc_swc_train.19.ark:10702
ERROR (feat-to-len[5.5.903~1-6260b]:Value():util/kaldi-table-inl.h:164) Failed to load object from /home/user/Desktop/workshop/lab_work/stt/asr/kaldi/egs/csj/s5/mfcc/raw_mfcc_swc_train.19.ark:10702 (to suppress this error, add the permissive (p, ) option to the rspecifier.
[ Stack-Trace: ]
feat-to-len(kaldi::MessageLogger::LogMessage() const+0x76b) [0x4a739f]
feat-to-len(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x42c9d3]
feat-to-len(kaldi::SequentialTableReaderScriptImpl<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Value()+0x90f) [0x430c53]
feat-to-len(kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<float> > >::Value()+0x12) [0x4311dc]
feat-to-len(main+0x128) [0x42bd4a]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fb131b30555]
feat-to-len() [0x42bb79]
I have gcc 8.3, CUDA 10.2, CentOS 7.6.
Can you tell me what this is, why it happens, and how to resolve it?
Something went wrong, and you probably have feats.scp files and .ark files that don't match because they come from different feature extraction runs. I suggest deleting all MFCC features and regenerating them.
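A cleanup along those lines could look like the sketch below. It assumes the usual Kaldi layout, where steps/make_mfcc.sh writes raw_mfcc_*.ark/.scp and steps/compute_cmvn_stats.sh writes cmvn_*.ark/.scp into the feature directory; the helper name and the example directories are hypothetical, so double-check the paths before deleting anything.

```shell
# Hypothetical helper: remove stale feature indexes from a data directory
# and the matching archives from the feature directory.
# $1 = data dir (e.g. data/train), $2 = feature dir (e.g. mfcc), both assumed.
clean_features() {
  data_dir="$1"
  feat_dir="$2"
  rm -f "$data_dir/feats.scp" "$data_dir/cmvn.scp"
  rm -f "$feat_dir"/raw_mfcc_*.ark "$feat_dir"/raw_mfcc_*.scp \
        "$feat_dir"/cmvn_*.ark "$feat_dir"/cmvn_*.scp
}

# Example (run from the recipe directory, then rerun feature extraction):
# clean_features data/train mfcc
```

Other files in the data directory, such as wav.scp and text, are left untouched, so only the feature extraction stage has to be rerun.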
Thank you. I think I am facing the same issue as https://github.com/uhh-lt/kaldi-tuda-de/issues/43
# utils/mkgraph.sh data/lang_std_big_v5_test exp/tri1 exp/tri1/graph_nosp
# Started at Wed Apr 14 02:33:23 CEST 2021
#
tree-info exp/tri1/tree
tree-info exp/tri1/tree
fsttablecompose data/lang_std_big_v5_test/L_disambig.fst data/lang_std_big_v5_test/G.fst
fstpushspecial
fstminimizeencoded
fstdeterminizestar --use-log=true
ERROR: FstHeader::Read: Bad FST header: data/lang_std_big_v5_test/G.fst
ERROR (fsttablecompose[5.5.903~1-6260b]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from data/lang_std_big_v5_test/G.fst
[ Stack-Trace: ]
fsttablecompose(kaldi::MessageLogger::LogMessage() const+0x76b) [0x4f2953]
fsttablecompose(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x45a373]
fsttablecompose(fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x198) [0x47f13e]
fsttablecompose(main+0x6ed) [0x456f6f]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f55111e7555]
fsttablecompose() [0x4567d9]
kaldi::KaldiFatalErrorERROR: FstHeader::Read: Bad FST header: -
ERROR (fstdeterminizestar[5.5.903~1-6260b]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstdeterminizestar(kaldi::MessageLogger::LogMessage() const+0x76b) [0x4e381b]
fstdeterminizestar(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x449041]
fstdeterminizestar(fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x198) [0x470e7d]
fstdeterminizestar(main+0x2b9) [0x447596]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f43f393b555]
fstdeterminizestar() [0x447229]
kaldi::KaldiFatalErrorERROR: FstHeader::Read: Bad FST header: -
ERROR (fstminimizeencoded[5.5.903~1-6260b]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstminimizeencoded(kaldi::MessageLogger::LogMessage() const+0x76b) [0x4cbdf5]
fstminimizeencoded(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x455701]
fstminimizeencoded(fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x198) [0x4540d1]
fstminimizeencoded(main+0x125) [0x43f2d7]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fe721ae4555]
fstminimizeencoded() [0x43f109]
kaldi::KaldiFatalErrorERROR: FstHeader::Read: Bad FST header: -
ERROR (fstpushspecial[5.5.903~1-6260b]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstpushspecial(kaldi::MessageLogger::LogMessage() const+0x76b) [0x4b3a9d]
fstpushspecial(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x43a78f]
fstpushspecial(fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x198) [0x43980d]
fstpushspecial(main+0x125) [0x4354b7]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff2d5bbf555]
fstpushspecial() [0x4352e9]
kaldi::KaldiFatalError# Accounting: time=1 threads=1
# Ended (code 1) at Wed Apr 14 02:33:24 CEST 2021, elapsed time 1 seconds
Do I have to redo the steps with your new method list?
My guess is that data/lang_std_big_v5_test/G.fst is empty and 0 bytes, can you check?
Yes, it is empty. I am confused. Why is it empty?
That means something went wrong in the FST generation and/or the ARPA LM training. You should first check whether an ARPA LM file has been created successfully. Unfortunately, the error handling for failures isn't good; maybe we can think of ways to improve this.
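A quick way to do this kind of check before calling utils/mkgraph.sh is to test that each file exists and has a non-zero size. A minimal sketch; the helper name `check_nonempty` is hypothetical, the G.fst path is taken from the log above, and the ARPA LM path is not shown in this thread, so it has to be filled in by hand:

```shell
# Hypothetical helper: report whether each given file exists and is non-empty.
# Returns non-zero if any file is missing or has 0 bytes.
check_nonempty() {
  status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}

# G.fst path from the log above; pass the path of your ARPA LM file as well.
check_nonempty data/lang_std_big_v5_test/G.fst \
  || echo "rebuild the LM and G.fst before running mkgraph.sh"
```

A check like this placed right after the LM and G.fst stages would make the recipe stop at the step that actually failed instead of at graph compilation.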
Maybe @Alienmaster can comment on this, since he had the same problem. What was the solution?
FYI: We have merged the new recipe with 1700h of audio data. I recommend upgrading, but unfortunately you will probably need to start from a fresh copy.