open-speech / speech-aligner

speech-aligner is a tool that generates phoneme-level time alignments between human speech and its transcript.

Did not successfully decode file BAC009S0002W0125, len = 629 #8

Open haha010508 opened 5 years ago

haha010508 commented 5 years ago

What is this problem?

lidianxiang commented 4 years ago

Same question; I ran into this problem too.

HW140701 commented 4 years ago

I ran into this problem too.

lidianxiang commented 4 years ago

@HW140701 For this problem, I saw a workaround in montreal-forced-aligner: gradually increase the beam value until it is large enough, and then the TextGrid file can be produced.
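The "increase the beam until it works" workaround can be sketched as a small retry loop. This is only an illustration under stated assumptions: `beam_schedule`, `align_with_growing_beam` and the `run_once` callback are hypothetical names, the doubling schedule and the "retry beam is twice the beam" rule are assumptions, and failure is detected by looking for the "Did not successfully decode" text from this issue's title in whatever log `run_once` returns.

```python
from typing import Callable, Iterator, Optional, Tuple

# Text taken from the error in this issue's title; used to detect a failed run.
FAILURE_MARKER = "Did not successfully decode"


def beam_schedule(start: int = 10, limit: int = 10240) -> Iterator[Tuple[int, int]]:
    """Yield (beam, retry_beam) pairs, doubling the beam each time.

    The start value, the limit and the 2x retry beam are assumptions.
    """
    beam = start
    while beam <= limit:
        yield beam, beam * 2
        beam *= 2


def align_with_growing_beam(
    run_once: Callable[[int, int], str],
    start: int = 10,
    limit: int = 10240,
) -> Optional[Tuple[int, int]]:
    """Return the first (beam, retry_beam) whose log lacks the failure marker.

    run_once is a caller-supplied (hypothetical) function that runs the aligner
    with the given beams and returns its log output as text.
    Returns None if every beam up to the limit fails.
    """
    for beam, retry_beam in beam_schedule(start, limit):
        log = run_once(beam, retry_beam)
        if FAILURE_MARKER not in log:
            return beam, retry_beam
    return None
```

In practice `run_once` would write the two values into the config, invoke the aligner, and return its captured output; how the aligner reports failure beyond the logged message is not confirmed here.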

WhiteFu commented 4 years ago

I installed the repo successfully, but I get the following error when I use it. Do you know how to solve it?

/bin/speech-aligner --config=egs/cn_phn/conf/align.conf egs/cn_phn/data/wav.scp egs/cn_phn/data/text egs/cn_phn/data/out.ali
ERROR (speech-aligner[5.4.215~4-f2b7]:Input():util/kaldi-io.cc:756) Error opening input stream res/tree

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const)
kaldi::MessageLogger::~MessageLogger()
kaldi::Input::Input(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool)
main
__libc_start_main
_start

My setup: Ubuntu 16.04, CMake 3.9.1
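The error names a relative path, res/tree. One thing worth checking (an assumption, not a diagnosis confirmed anywhere in this thread) is whether that path exists relative to the directory the command is run from. A minimal sketch, where `missing_relative_paths` is a hypothetical helper:

```python
from pathlib import Path
from typing import Iterable, List


def missing_relative_paths(paths: Iterable[str], cwd: str = ".") -> List[str]:
    """Return the paths that do not exist when resolved against cwd."""
    base = Path(cwd)
    return [p for p in paths if not (base / p).exists()]
```

Calling `missing_relative_paths(["res/tree"])` from the directory where the aligner is launched returns `["res/tree"]` if the file cannot be found from there.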


windy32 commented 4 years ago

@HW140701 For this problem, I saw a workaround in montreal-forced-aligner: gradually increase the beam value until it is large enough, and then the TextGrid file can be produced.

This works for me. Experiment as follows:

  1. Merge two sample files with ffmpeg:

ffmpeg -i BAC009S0002W0122.wav -i BAC009S0002W0123.wav -filter_complex '[0:0][1:0]concat=n=2:v=0:a=1[out]' -map '[out]' merged.wav

  2. Create a new playlist called merged.lst with the content:

merged merged.wav

  3. Also create a merged transcript called merged.txt.

  4. In run.sh, run the following command:

speech-aligner --config=conf/align.conf merged.lst merged.txt merged.out

(this should fail)

  5. Now edit align.conf and set:

--beam=40 --retry-beam=80

(now it should work)
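The config edit in the last step can be scripted. The sketch below assumes align.conf is a Kaldi-style options file with one `--name=value` option per line; that layout is an assumption, and `set_beam_options` is a hypothetical helper, not part of speech-aligner. It only transforms the config text and leaves reading and writing the file to the caller.

```python
import re


def set_beam_options(conf_text: str, beam: int, retry_beam: int) -> str:
    """Return conf_text with --beam and --retry-beam set to the given values.

    Existing lines for these two options are replaced in place; options that
    are absent are appended. All other lines are kept unchanged.
    """
    wanted = {"beam": beam, "retry-beam": retry_beam}
    seen = set()
    out = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*--([A-Za-z0-9-]+)=", line)
        name = match.group(1) if match else None
        if name in wanted:
            out.append(f"--{name}={wanted[name]}")
            seen.add(name)
        else:
            out.append(line)
    for name, value in wanted.items():
        if name not in seen:
            out.append(f"--{name}={value}")
    return "\n".join(out) + "\n"
```

For example, `set_beam_options(open("conf/align.conf").read(), 40, 80)` would produce the text to write back for the values used above.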

windy32 commented 4 years ago

I also tested another audio file, 49 seconds long. To finish the alignment, the beam parameter had to be increased to 10240, and it ran much slower.

I guess that's why the input audio must be given as a playlist. By design, the aligner is intended to process a list of sentences, each in a separate audio file, in which case a beam of 20 or 40 should be enough.
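If long files are the trigger, it may help to flag them before running the aligner. The following sketch uses Python's standard `wave` module, which only reads PCM WAV files. The helper names and the 20-second default threshold are assumptions for illustration; the thread only establishes that a 49-second file needed a very large beam.

```python
import wave
from typing import Dict, Iterable


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_files(paths: Iterable[str], max_seconds: float = 20.0) -> Dict[str, float]:
    """Map each path whose duration exceeds max_seconds to that duration."""
    result = {}
    for path in paths:
        duration = wav_duration_seconds(path)
        if duration > max_seconds:
            result[str(path)] = duration
    return result
```

Files reported by `long_files` are candidates for splitting into per-sentence audio before alignment.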