idiap / phonvoc

Phonetic and phonological vocoding platform
BSD 3-Clause "New" or "Revised" License

Help running the codec #2

Closed jiangkid closed 7 years ago

jiangkid commented 7 years ago

Hello, I have read your paper "Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding" and would like to build on your work. However, when I run the script "run.sh" in the ./vlbr directory, it reports that the "shrc" and "profile.sh" scripts are missing and that the "SETSHELL" command is not found. I can't find these files in this repository. Could you share them, or give me some instructions?

mcernak commented 7 years ago

Hi, the script vlbr/run.sh contains Idiap-specific environment settings (ll. 12-21). You have to comment them out and point the script to your local Kaldi, HTK, SSP and SPTK installations. Milos
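Once the Idiap-specific lines are commented out, a quick preflight check can confirm that the local tools are reachable on PATH. The sketch below is an assumption and not part of the repository: `missing_tools` is a hypothetical helper, and the tool names are only those that appear in this thread (HCopy from HTK, the others from Kaldi).

```shell
# Hypothetical helper (not part of phonvoc): print every named tool
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in this thread; extend the list for your SSP and SPTK setup.
missing_tools HCopy copy-feats apply-cmvn add-deltas nnet-forward select-feats
```

An empty output means every listed tool was found; any printed name still needs its directory added to PATH.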

jiangkid commented 7 years ago

Thank you very much. I have set up the environment for Kaldi, but I get an error when running the script "./analysis.sh examples/recording.wav 0". It seems that nnet-forward cannot read the trained final.nnet. Could you please specify the OS environment and the Kaldi configure options? I'm using Ubuntu 14.04 64-bit and the specified Kaldi version, 489a1f5.

consonantal
select-feats 1 ark:- ark:recording/consonantal.ark
nnet-forward train/dnns/English-SPE/consonantal-4l-dnn/final.nnet ark:- ark:-
nnet-forward train/dnns/pretrain-dbn-English/final.feature_transform 'ark:copy-feats scp:recording/feats.scp ark:- | apply-cmvn --norm-vars=false --utt2spk=ark:recording/utt2spk scp:recording/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |' ark:-
LOG (nnet-forward:SelectGpuId():cu-device.cc:110) Manually selected to compute on CPU.
LOG (nnet-forward:SelectGpuId():cu-device.cc:110) Manually selected to compute on CPU.
copy-feats scp:recording/feats.scp ark:-
apply-cmvn --norm-vars=false --utt2spk=ark:recording/utt2spk scp:recording/cmvn.scp ark:- ark:-
add-deltas --delta-order=2 ark:- ark:-
LOG (copy-feats:main():copy-feats.cc:101) Copied 1 feature matrices.
LOG (apply-cmvn:main():apply-cmvn.cc:146) Applied cepstral mean normalization to 1 utterances, errors on 0
ERROR (nnet-forward:Read():kaldi-matrix.cc:1432) Failed to read matrix from stream. File position at start is 10158568, currently -1
ERROR (nnet-forward:Read():kaldi-matrix.cc:1432) Failed to read matrix from stream. File position at start is 10158568, currently -1

[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::Matrix::Read(std::istream&, bool, bool)
kaldi::CuMatrix::Read(std::istream&, bool)
kaldi::nnet1::AffineTransform::ReadData(std::istream&, bool)
kaldi::nnet1::Component::Read(std::istream&, bool)
kaldi::nnet1::Nnet::Read(std::istream&, bool)
kaldi::nnet1::Nnet::Read(std::string const&)
nnet-forward(main+0x5a1) [0x4c60fe]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f64d90d6f45]
nnet-forward() [0x4c5a99]

WARNING (select-feats:main():select-feats.cc:59) Empty archive provided.

$ nnet-forward train/dnns/English-SPE/consonantal-4l-dnn/final.nnet ark:- ark:-

nnet-forward train/dnns/English-SPE/consonantal-4l-dnn/final.nnet ark:- ark:-
LOG (nnet-forward:SelectGpuId():cu-device.cc:110) Manually selected to compute on CPU.
ERROR (nnet-forward:Read():kaldi-matrix.cc:1432) Failed to read matrix from stream. File position at start is 10158568, currently -1
ERROR (nnet-forward:Read():kaldi-matrix.cc:1432) Failed to read matrix from stream. File position at start is 10158568, currently -1

[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::Matrix::Read(std::istream&, bool, bool)
kaldi::CuMatrix::Read(std::istream&, bool)
kaldi::nnet1::AffineTransform::ReadData(std::istream&, bool)
kaldi::nnet1::Component::Read(std::istream&, bool)
kaldi::nnet1::Nnet::Read(std::istream&, bool)
kaldi::nnet1::Nnet::Read(std::string const&)
nnet-forward(main+0x5a1) [0x4c60fe]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fc2eb955f45]
nnet-forward() [0x4c5a99]

mcernak commented 7 years ago

Hello, it seems you are calling analysis.sh from the phonvoc root directory. To run the codec, change directory to vlbr/ and call

$ run.sh ../examples/recording.wav recording.nn.wav

Don't forget to compile the additional binaries first, as described in vlbr/README.txt.

jiangkid commented 7 years ago

Hello, I have installed Kaldi, HTK, SSP and SPTK; the versions are SPTK-3.9 and HTK-3.4.1. However, when I run the following command in the vlbr directory:

$ run.sh ../examples/recording.wav recording.nn.wav

the script fails at

HCopy -C ../conf/PLP_0.cfg $inFile $id/$id.htk

because ../conf/PLP_0.cfg is missing. I searched your GitHub repositories and used this file: https://github.com/idiap/iss/blob/b6019214c425a94ab4665e0bf076d678e5ad8c63/lib/config/PLP_0.cfg

But the script then fails at

bin/nsylb -i $id/$id.htk > $id/$id.nsylb

and $id/$id.nsylb contains:

nsylb: error with sample size - expected 13, please check

Could you share the file ../conf/PLP_0.cfg and specify the HTK version? Thank you.

mcernak commented 7 years ago

Hello,

I committed the correct version of the config file. The only difference is that it specifies the byte order as the machine's natural byte order. Without this setting, the HTK files are written in big-endian order, and you were probably reading them on a little-endian machine.
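A byte-order mismatch of this kind can be seen directly in the file header. The sketch below is an illustration under stated assumptions, not code from the repository: `htk_samp_size` is a hypothetical helper that reads the 16-bit sampSize field (bytes 8-9 of the 12-byte HTK header, after the 4-byte nSamples and 4-byte sampPeriod fields) and decodes it in either byte order.

```shell
# Hypothetical helper (not part of phonvoc): print the 16-bit sampSize field
# (bytes 8-9 of the 12-byte HTK header) decoded as "big" or "little" endian.
htk_samp_size() (
    file=$1 order=$2
    # od prints the two raw bytes as unsigned decimals, e.g. "  0 52"
    set -- $(od -An -tu1 -j8 -N2 "$file")
    [ $# -eq 2 ] || { echo "cannot read header: $file" >&2; exit 1; }
    case $order in
        big)    echo $(( $1 * 256 + $2 )) ;;
        little) echo $(( $2 * 256 + $1 )) ;;
        *)      echo "order must be big or little" >&2; exit 1 ;;
    esac
)
```

Assuming a frame of 13 four-byte floats (a guess based on the "expected 13" message), sampSize is 52. Stored big-endian as the bytes 00 34, it decodes to 52 with `htk_samp_size file.htk big` but to 13312 with `little`, which is the kind of value a little-endian reader would see and reject.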

Best, Milos

jiangkid commented 7 years ago

Thank you very much! But it still does not work. nsylb crashes:

*** Error in `./bin/nsylb': double free or corruption (!prev): 0x0000000000c51830 ***
Aborted

It seems to crash at free(plpdata); I'm now trying to debug syllabledecoder_PLP.c.

jiangkid commented 7 years ago

I found a bug in syllabledecoder_PLP.c:

plpdata = (float *) malloc(sizeof(float)*nchan*nSamples); // PLP features
chan = (double *) malloc(sizeof(double)*nchan*nSamples);

The malloc calls should come after nSamples has been set correctly; as written, the buffers are sized from an nSamples value that is not yet valid. Unfortunately, the codec still cannot run. I will try again tomorrow...

mcernak commented 7 years ago

Thank you, that was indeed a bug. I fixed and committed it.

Please let me know if you can run the codec now.

Best, Milos

jiangkid commented 7 years ago

Hello, I can now run the codec with your example file "recording.wav". However, when I encode wav files from TIMIT, it works for some speech files but fails for others. I have attached a wav file and the log file in one package: vlbr.rar.pdf (attachments here cannot be rar files, so I appended .pdf; please rename the attached file and unpack it). I think there must be some bugs, and I'm trying to find them...

Another question: does the codec only support a 16 kHz sampling rate? How about 8 kHz?

Thanks.

jiangkid commented 7 years ago

It seems that too many pitch values are appended to $id/$id.f0 in the LPC re-synthesis procedure. I modified the variable f0Diff, and now it works.

if [[ $f0Diff -le 0 ]]; then
    echo "orig pitch align: removing $f0Diff f0 frames"
    cat $id/$id.qdec.lf0 | tail -n +4 | head -n $hnrNum > $id/$id.f0
else
    echo "orig pitch align: adding $f0Diff f0 frames"
    cp $id/$id.qdec.lf0 $id/$id.f0
    lastPitch=`cat $id/$id.qdec.lf0 | tail -n 1`
    ((f0Diff = f0Diff - 4)) #modified here
    for d ({1..$f0Diff}); do
        echo $lastPitch >> $id/$id.f0
    done 
fi

mcernak commented 7 years ago

Thank you, I pushed your fix. This prototype supports only a 16 kHz sampling rate. The analysis DNNs have to be re-trained for an 8 kHz front-end, but the synthesis DNN can either be re-trained or kept as it is, at 16 kHz. The latter could serve as a bandwidth-extension approach.

One more point: due to license issues, this prototype works with DNNs trained on LibriSpeech data. The journal paper presented results with DNNs trained on WSJ, which cannot be shared publicly.
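Since only 16 kHz input is supported, it can help to check a file before encoding it. The sketch below is an assumption and not part of the repository: `wav_sample_rate` and `is_16khz` are hypothetical helpers, and they only handle a canonical RIFF/WAVE header in which the "fmt " chunk comes first, so that the sample rate sits at byte offset 24.

```shell
# Hypothetical helper (not part of phonvoc): print the sample rate stored at
# byte offset 24 of a canonical RIFF/WAVE header (32-bit little endian).
wav_sample_rate() (
    set -- $(od -An -tu1 -j24 -N4 "$1")
    [ $# -eq 4 ] || { echo "cannot read WAV header" >&2; exit 1; }
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
)

# Succeed only when the header reports 16000 Hz.
is_16khz() {
    [ "$(wav_sample_rate "$1")" = 16000 ]
}
```

A caller could then guard the codec with something like `is_16khz input.wav || echo "resample to 16 kHz first"`. Files with extra chunks before "fmt " would need a real WAV parser.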

Milos

jiangkid commented 7 years ago

Thank you very much for your help. I will try to re-train the DNNs.

jiangkid commented 7 years ago

When I ran the whole TIMIT corpus as a test, there were still bugs in the pitch alignment. I modified the script as follows.

echo "-------- Phonological synthesis (LPC re-synthesis) --"
../train/toHTK.py $id/$id.lsf $id/$id.htk $id/$id.hnr $id/$id.synth.f0
hnrNum=`cat $id/$id.hnr | wc -l`
f0Num=`cat $id/$id.qdec.lf0 | wc -l`
(( f0Diff = f0Num - hnrNum))
echo "lf0:$f0Num; hnr:$hnrNum; Diff:$f0Diff"
if [[ $f0Diff -ge 4 ]]; then
    echo "orig pitch align: removing $f0Diff f0 frames"
    cat $id/$id.qdec.lf0 | tail -n +4 | head -n $hnrNum > $id/$id.f0
elif [[ $f0Diff -ge 0 ]]; then
    echo "else: orig pitch align: removing $f0Diff f0 frames"
    cat $id/$id.qdec.lf0 | head -n $hnrNum > $id/$id.f0
else
    echo "orig pitch align: adding $f0Diff f0 frames"
    cp $id/$id.qdec.lf0 $id/$id.f0
    lastPitch=`cat $id/$id.qdec.lf0 | tail -n 1`
    for ((i=$f0Diff; i<0; i=i+1));do
        echo $lastPitch >> $id/$id.f0
    done 
fi

mcernak commented 7 years ago

Thank you, I pushed your changes.
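The three-way pitch alignment in the last script can also be written as a standalone function, which makes each branch easy to test in isolation. This is a sketch under stated assumptions, not the committed code: `align_f0` is a hypothetical name, it takes the f0 file and the target frame count (the HNR line count in the script) as arguments, and it writes to standard output instead of $id/$id.f0.

```shell
# Hypothetical helper (not part of phonvoc): print exactly $2 pitch values
# taken from file $1, following the three cases of the script in this thread.
align_f0() (
    src=$1 want=$2
    have=$(wc -l < "$src")
    diff=$(( have - want ))
    if [ "$diff" -ge 4 ]; then
        # Large surplus: skip the first 3 values, then keep the next $want.
        tail -n +4 "$src" | head -n "$want"
    elif [ "$diff" -ge 0 ]; then
        # Small surplus: keep the first $want values.
        head -n "$want" "$src"
    else
        # Deficit: copy everything, then repeat the last value to fill the gap.
        cat "$src"
        last=$(tail -n 1 "$src")
        i=$diff
        while [ "$i" -lt 0 ]; do
            echo "$last"
            i=$(( i + 1 ))
        done
    fi
)
```

In the script's terms this would be called as `align_f0 $id/$id.qdec.lf0 $hnrNum > $id/$id.f0`. In every branch the output has exactly as many lines as requested, which is the property the earlier versions of the script did not guarantee.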