batched-wav-nnet3-cuda core dumps when --gpu-feature-extract=true is set

drawfish commented 5 years ago

Below is the exec info:

batched-wav-nnet3-cuda --acoustic-scale=1.0 --add-pitch=true --beam=15.0 --feature-type=mfcc --frame-subsampling-factor=3 --frames-per-chunk=150 --gpu-feature-extract=true --lattice-beam=8.0 --max-active=7000 --mfcc-config=config/mfcc.conf --online-pitch-config=config/pitch.conf '--word-symbol-table=exp/nnet3_tdnn_online_game_transfer_final/graph/words.txt' exp/nnet3_tdnn_online_game_transfer_final/tdnn-lstm/final.mdl 'exp/nnet3_tdnn_online_game_transfer_final/graph/HCLG.fst' scp:data/dahua/mfcc/wav.scp ark:/dev/null
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [6]: GeForce GTX 1080 Ti   free:10945M, used:227M, total:11172M, free/total:0.979682 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 5 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 11 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:Collapse():nnet-utils.cc:1463) Added 6 components, removed 11
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 2 control threads, 20 worker threads and batch size 200
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:780) CudaDecoder batch_size=200 num_channels=260
LOG (batched-wav-nnet3-cuda[5.5.402~1-53346]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:780) CudaDecoder batch_size=200 num_channels=260
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.402~1-53346]:ComputeFeatures():online-cuda-feature-pipeline.cc:66) Assertion failed: (false)

[ Stack-Trace: ]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7f3271b6b31a]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x6c) [0x7f3271b6bd88]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)+0xae) [0x7f326fdc633a]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7f32745b5529]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x4fe) [0x7f32745b60fa]
/storage01/gzchenduisheng/dev/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()+0x2a) [0x7f32745b7f28]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f3270a0fc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f32711566ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f327047e41d]

Aborted (core dumped)

gdb core info:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `batched-wav-nnet3-cuda --acoustic-scale=1.0 --add-pitch=true --beam=15.0 --feat'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f32703ac428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f30e106c700 (LWP 41345))]
(gdb) bt
#0  0x00007f32703ac428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f32703ae02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f3271b6bda1 in kaldi::KaldiAssertFailure_ (
    func=func@entry=0x7f326fdca460 <kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)::__func__> "ComputeFeatures",
    file=file@entry=0x7f326fdc9c00 "online-cuda-feature-pipeline.cc", line=line@entry=66, cond_str=cond_str@entry=0x7f326fdc8d77 "false") at kaldi-error.cc:234
#3  0x00007f326fdc633a in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7f30e106b330, cu_wave=..., sample_freq=8000, input_features=0x7f323fcfcaa0, ivector_features=0x7f323fcfca90)
    at online-cuda-feature-pipeline.cc:66
#4  0x00007f32745b5529 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7ffe36a937b0, first=first@entry=0,
    tasks=std::vector of length 1, capacity 200 = {...}, feature_pipeline=...) at batched-threaded-nnet3-cuda-pipeline.cc:616
#5  0x00007f32745b60fa in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7ffe36a937b0, threadId=<optimized out>) at batched-threaded-nnet3-cuda-pipeline.cc:841
#6  0x00007f32745b7f28 in std::_Mem_fn_base<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), true>::operator()<int, void>(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int&&) const (__object=<optimized out>, this=<optimized out>) at /usr/include/c++/5/functional:600
#7  std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)>::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=<optimized out>) at /usr/include/c++/5/functional:1531
#8  std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)>::operator()() (this=<optimized out>)
    at /usr/include/c++/5/functional:1520
#9  std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run() (
    this=<optimized out>) at /usr/include/c++/5/thread:115
#10 0x00007f3270a0fc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f32711566ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f327047e41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

danpovey commented 5 years ago

It looks to me like it's only set up right now to work when you have ivectors. It may have to be extended to work when there are no ivectors.

cloudhan commented 5 years ago

I encountered another problem:

ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.413~1-c4490]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

And the GDB stacktrace:

(gdb) 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fffd5044231 in __GI_abort () at abort.c:79
#2  0x0000555555a9f2f4 in kaldi::KaldiAssertFailure_ (
    func=func@entry=0x555555ace7a0 <kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)::__func__> "AddMatMat", file=file@entry=0x555555acc143 "cu-matrix.cc", line=line@entry=1305, 
    cond_str=cond_str@entry=0x555555acce1d "k == k1") at kaldi-error.cc:234
#3  0x000055555589ee97 in kaldi::CuMatrixBase<float>::AddMatMat (this=this@entry=0x7fff1dff2930, alpha=alpha@entry=1, A=..., 
    transA=transA@entry=kaldi::kNoTrans, B=..., transB=transB@entry=kaldi::kTrans, beta=beta@entry=0) at cu-matrix.cc:1305
#4  0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
    at online-ivector-feature-cuda.cc:58
#5  0x0000555555739684 in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7fff1dff2fe0, cu_wave=..., sample_freq=16000, 
    input_features=0x555574ff2ef0, ivector_features=0x555574ff2ee0) at online-cuda-feature-pipeline.cc:64
#6  0x0000555555715152 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7fffffffd760, first=first@entry=0, 
    tasks=std::vector of length 1, capacity 50 = {...}, feature_pipeline=...) at batched-threaded-nnet3-cuda-pipeline.cc:615
#7  0x0000555555715e2f in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7fffffffd760, threadId=<optimized out>)
    at batched-threaded-nnet3-cuda-pipeline.cc:841
#8  0x000055555571771b in std::__invoke_impl<void, void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int>(std::__invoke_memfun_deref, void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*&&, int&&) (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/7/bits/invoke.h:73
#9  std::__invoke<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int>(void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*&&, int&&) (__fn=<optimized out>)
    at /usr/include/c++/7/bits/invoke.h:95
#10 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/7/thread:234
#11 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> >::operator() (this=<optimized out>) at /usr/include/c++/7/thread:243
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int> > >::_M_run (this=<optimized out>) at /usr/include/c++/7/thread:186
#13 0x00007fffd569b96f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff0e7e5aa in start_thread (arg=0x7fff1dffb700) at pthread_create.c:463
#15 0x00007fffd5104cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

And additional debug info:

(gdb) frame 4
#4  0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
    at online-ivector-feature-cuda.cc:58
58      lda_feats_normalized.AddMatMat(1.0, spliced_feats_normalized, kNoTrans,
(gdb) print spliced_feats_normalized
$1 = {<kaldi::CuMatrixBase<float>> = {data_ = 0x7ffcaa000000, num_cols_ = 280, num_rows_ = 1024, stride_ = 320}, <No data fields>}
(gdb) print cu_lda_
$2 = {<kaldi::CuMatrixBase<float>> = {data_ = 0x7fff5fbd8e00, num_cols_ = 281, num_rows_ = 40, stride_ = 320}, <No data fields>}
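To spell out what the two prints show: the call on line 58 multiplies `spliced_feats_normalized` (not transposed) by `cu_lda_` (transposed), so the number of columns of the first must equal the number of columns of the second. Below is a minimal Python sketch of that inner-dimension check, using only the shapes printed by gdb; the helper name `inner_dims` is made up for illustration and is not part of Kaldi.

```python
def inner_dims(a_shape, trans_a, b_shape, trans_b):
    """Return (k, k1): the inner dimensions that op(A) * op(B) must agree on."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    k = a_rows if trans_a else a_cols    # number of columns of op(A)
    k1 = b_cols if trans_b else b_rows   # number of rows of op(B)
    return k, k1


# Shapes copied from the gdb prints above, written as (num_rows, num_cols).
spliced_feats_normalized = (1024, 280)
cu_lda = (40, 281)

# Same transpose settings as in frame 3: transA = kNoTrans, transB = kTrans.
k, k1 = inner_dims(spliced_feats_normalized, False, cu_lda, True)
print(k, k1)  # prints "280 281", so the check k == k1 fails
```

The matrix has exactly one more column than the features have dimensions. One plausible reading (my assumption, not something confirmed in this thread) is that the LDA matrix is an affine transform whose last column is an offset, which a plain matrix product does not account for.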

It runs fine when the CUDA feature pipeline is disabled.

@luitjens Would you mind looking at it?

luitjens commented 5 years ago

Please provide your feature extraction configuration. These types of errors are going to be expected until we fill in all the features.

luitjens commented 5 years ago

I think Dan's assertion about ivectors is probably right. We tried putting in the hooks to run without ivectors, but my guess is that we are not resizing the matrix to the right size. I'm heading on vacation in a day and can look at it when I return in 2 weeks. In the meantime I'd suggest turning GPU feature extraction off in the binary for this model, or, if you are really ambitious, trying to fix it while I'm gone. Maybe @LeviBarnes will have time to look into this.
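For anyone who builds the command line in a wrapper script, the workaround can be sketched as below. This is only an illustration: `disable_gpu_features` is a hypothetical helper, the model and graph paths are shortened placeholders, and the only flag used is the `--gpu-feature-extract` option that already appears in the original report. The flag is placed right after the program name on the assumption that options should come before the positional arguments.

```python
def disable_gpu_features(argv, flag="--gpu-feature-extract"):
    """Return a copy of argv with the GPU feature extraction flag forced to false."""
    # Drop any existing occurrence of the flag, with or without a value.
    kept = [a for a in argv if a != flag and not a.startswith(flag + "=")]
    # Re-insert it right after the program name, ahead of the positional arguments.
    return kept[:1] + [flag + "=false"] + kept[1:]


# Shortened, placeholder version of the command from the report.
cmd = [
    "batched-wav-nnet3-cuda",
    "--add-pitch=true",
    "--gpu-feature-extract=true",
    "final.mdl", "HCLG.fst", "scp:wav.scp", "ark:/dev/null",
]
print(" ".join(disable_gpu_features(cmd)))
```

The helper returns a new list and leaves the original arguments untouched, so the same base command can still be used to reproduce the crash.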

dpny518 commented 5 years ago

I had the same error:

batched-wav-nnet3-cuda --cuda-use-tensor-cores=true --iterations=5 --cuda-memory-proportion=.5 --max-batch-size=32 --cuda-control-threads=3 --batch-drain-size=8 --cuda-worker-threads=2 --cuda-use-tensor-cores=false --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 --config=model/conf/online.conf --word-symbol-table=model/graph/words.txt --max-batch-size=1 --cuda-worker-threads=2 model/final.mdl model/graph/HCLG.fst scp:results/temp/wav.scp ark:/dev/null 
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [0]: GeForce GTX 1080 Ti   free:10453M, used:723M, total:11177M, free/total:0.935236 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 25 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 50 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Collapse():nnet-utils.cc:1463) Added 25 components, removed 50
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 3 control threads, 2 worker threads and batch size 1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.434~1-e167b]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

[ Stack-Trace: ]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7fc0c6a1330a]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x6c) [0x7fc0c6a13d78]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudamatrix.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0xf7) [0x7fc0c7c7a62d]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::IvectorExtractorFastCuda::GetIvector(kaldi::CuMatrixBase<float> const&, kaldi::CuVector<float>*)+0x1b2) [0x7fc0c4c6abc4]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase<float> const&, float, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*)+0x8d) [0x7fc0c4c6c339]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7fc0c9465303]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x503) [0x7fc0c94662a1]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()+0x2a) [0x7fc0c9467e7c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc0c58b7c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc0c5ffe6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc0c532641d]
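A side note on the command line echoed at the top of this log: several options are passed twice with different values (for example `--max-batch-size=32` and later `--max-batch-size=1`). The Initialize line reports "batch size 1", which suggests the later value is the one in effect, though that is an inference from this log rather than a documented guarantee. A small Python sketch for spotting such repeats, with `repeated_options` being a hypothetical helper and the argument list an excerpt of the echoed command:

```python
from collections import defaultdict


def repeated_options(argv):
    """Map each --name=value option that occurs more than once to its values, in order."""
    seen = defaultdict(list)
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg.split("=", 1)
            seen[name].append(value)
    return {name: values for name, values in seen.items() if len(values) > 1}


# Excerpt of the options from the echoed command line, in their original order.
argv = [
    "--cuda-use-tensor-cores=true",
    "--max-batch-size=32",
    "--cuda-control-threads=3",
    "--cuda-worker-threads=2",
    "--cuda-use-tensor-cores=false",
    "--word-symbol-table=exp/tdnn/graph/words.txt",
    "--word-symbol-table=model/graph/words.txt",
    "--max-batch-size=1",
    "--cuda-worker-threads=2",
]
for name, values in repeated_options(argv).items():
    print(name, values)
```

Removing the repeats makes it easier to tell which batch size and thread counts a given run actually used.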

Here is the script I use to decode:

cuda_flags="--cuda-use-tensor-cores=true 
--iterations=5 --cuda-memory-proportion=.5 
--max-batch-size=32 --cuda-control-threads=3 
--batch-drain-size=8 --cuda-worker-threads=2"

batched-wav-nnet3-cuda $cuda_flags  \
      --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 \
      --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 \
      --config="$model"/conf/online.conf \
      --word-symbol-table="$model"/graph/words.txt \
      --max-batch-size=1 \
      --cuda-worker-threads=2 \
      "$model"/final.mdl \
      "$model"/graph/HCLG.fst \
      "scp:$results/temp/wav.scp" \
      "ark:/dev/null" 2>&1  | tee -a result.txt

luitjens commented 5 years ago

Please provide the content of all feature extraction config files.

On Fri, Aug 2, 2019 at 2:17 AM Steve Rogers notifications@github.com wrote:

I had the same error

batched-wav-nnet3-cuda --cuda-use-tensor-cores=true --iterations=5 --cuda-memory-proportion=.5 --max-batch-size=32 --cuda-control-threads=3 --batch-drain-size=8 --cuda-worker-threads=2 --cuda-use-tensor-cores=false --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 --config=model/conf/online.conf --word-symbol-table=model/graph/words.txt --max-batch-size=1 --cuda-worker-threads=2 model/final.mdl model/graph/HCLG.fst scp:results/temp/wav.scp ark:/dev/null
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [0]: GeForce GTX 1080 Ti free:10453M, used:723M, total:11177M, free/total:0.935236 version 6.1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 25 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 50 orphan components.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Collapse():nnet-utils.cc:1463) Added 25 components, removed 50
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 3 control threads, 2 worker threads and batch size 1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ExecuteWorker():batched-threaded-nnet3-cuda-pipeline.cc:784) CudaDecoder batch_size=1 num_channels=1
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5.434~1-e167b]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
ASSERTION_FAILED (batched-wav-nnet3-cuda[5.5.434~1-e167b]:AddMatMat():cu-matrix.cc:1305) Assertion failed: (k == k1)

[ Stack-Trace: ]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7fc0c6a1330a]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const, char const, int, char const)+0x6c) [0x7fc0c6a13d78]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudamatrix.so(kaldi::CuMatrixBase::AddMatMat(float, kaldi::CuMatrixBase const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase const&, kaldi::MatrixTransposeType, float)+0xf7) [0x7fc0c7c7a62d]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::IvectorExtractorFastCuda::GetIvector(kaldi::CuMatrixBase const&, kaldi::CuVector)+0x1b2) [0x7fc0c4c6abc4]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineCudaFeaturePipeline::ComputeFeatures(kaldi::CuVectorBase const&, float, kaldi::CuMatrix, kaldi::CuVector)+0x8d) [0x7fc0c4c6c339]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState> >&, kaldi::OnlineCudaFeaturePipeline&)+0xc23) [0x7fc0c9465303]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)+0x503) [0x7fc0c94662a1]
/media/yondutsai/415aae4a-1419-40da-8de5-0add5a619d06/kaldi/src/lib/libkaldi-cudadecoder.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline, int)> >::_M_run()+0x2a) [0x7fc0c9467e7c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc0c58b7c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc0c5ffe6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc0c532641d]

Here is the script I use to decode:

cuda_flags="--cuda-use-tensor-cores=true --iterations=5 --cuda-memory-proportion=.5 --max-batch-size=32 --cuda-control-threads=3 --batch-drain-size=8 --cuda-worker-threads=2"

batched-wav-nnet3-cuda $cuda_flags \
  --word-symbol-table=exp/tdnn/graph/words.txt --frame-subsampling-factor=3 --frames-per-chunk=51 \
  --acoustic-scale=.8 --beam=12.0 --lattice-beam=4 --max-active=10000 \
  --config="$model"/conf/online.conf \
  --word-symbol-table="$model"/graph/words.txt \
  --max-batch-size=1 \
  --cuda-worker-threads=2 \
  "$model"/final.mdl \
  "$model"/graph/HCLG.fst \
  "scp:$results/temp/wav.scp" \
  "ark:/dev/null" 2>&1 | tee -a result.txt
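
Note that this command repeats several options (--word-symbol-table, --max-batch-size, --cuda-worker-threads), and the log above reports a batch size of 1, which suggests that the later value is the one that takes effect. A minimal sketch for spotting such repeats before launching a run follows; dup_options is a hypothetical helper written for illustration, not a Kaldi tool, and it only compares option names, not values.

```shell
# dup_options (hypothetical helper): print every --name that is passed more than once.
dup_options() {
  for arg in "$@"; do
    case "$arg" in
      # Keep only the option name, dropping "=value" if present.
      --*) printf '%s\n' "${arg%%=*}" ;;
    esac
  done | sort | uniq -d
}

# Example using a subset of the options from the command above.
# Prints --cuda-worker-threads and --max-batch-size, one per line.
dup_options --max-batch-size=32 --cuda-worker-threads=2 --beam=12.0 \
  --max-batch-size=1 --cuda-worker-threads=2
```

An empty result means no option name was repeated; positional arguments such as model paths are ignored.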


dpny518 commented 5 years ago

ivector

--splice-config=model/conf/splice.conf
--cmvn-config=model/ivector/online_cmvn.conf
--lda-matrix=model/ivector/final.mat
--global-cmvn-stats=model/ivector/global_cmvn.stats
--diag-ubm=model/ivector/final.dubm
--ivector-extractor=model/ivector/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100

online.conf

--feature-type=mfcc
--mfcc-config=model/conf/mfcc.conf
--ivector-extraction-config=model/conf/ivector_extractor.conf
--frame-subsampling-factor=3
--add-pitch=true
--acoustic-scale=1
--minimize=false
--max-active=10000
--beam=15
--lattice-beam=8

mfcc

# config for high-resolution MFCC features, intended for neural network training.
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false       # use average of log energy, not energy.
--sample-frequency=16000 # AISHELL-2 is sampled at 16kHz
--num-mel-bins=40        # similar to Google's setup.
--num-ceps=40            # there is no dimensionality reduction.
--low-freq=20            # low cutoff frequency for mel bins
--high-freq=-400         # high cutoff frequency, relative to Nyquist of 8000 (=7600)

Also, my WER is much worse with GPU feature extraction than with CPU feature extraction: 88% WER versus 9.95% WER when I set --gpu-feature-extract=false.

cloudhan commented 5 years ago

The problem is caused by --add-pitch=true in online.conf: pitch computation and post-processing have clearly not been implemented in the CUDA code. Given the complexity of the pitch computation, I don't think it can be converted easily. In the short term, the easiest way to work around this issue might be to write a data adapter for OnlineFeatureInterface and use the CPU code for pitch extraction.

Note that the complexity is not due to the algorithm behind it but to the online feature processing. The historic version of pitch-functions.cc (https://github.com/kaldi-asr/kaldi/blob/4d656e1df34579fdf32645da5c0dabaf9d74e2ce/src/feat/pitch-functions.cc), which has only offline processing functionality, is pretty clear. The only problem is that the later online refactoring, from commit 16fb11474e489d04e4f9c9962982c878b59ee9ba to c34e68d26c319d7b46596052d0a9db5057550cf8, caused a difference in the pitch features, and I cannot backport it.

luitjens commented 5 years ago

Thanks for the response, Cloud. Indeed, pitch is not implemented yet. We have not evaluated how hard it would be to implement and simply have not gotten around to it. We have a very small team and a lot of requests. The order in which we implement things is completely customer driven, and we don't currently have any models that use pitch. Once we get a high-priority model in house that uses pitch, we will attempt to implement this.

The workaround is to set --gpu-feature-extract=false. Performance on a single GPU will be about the same; however, scalability on dense GPU systems will suffer.
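
This workaround can be scripted so that GPU feature extraction is only disabled for models that need pitch. The following is a minimal sketch, assuming the config is a plain options file with one --name=value entry per line, like the online.conf posted earlier; pick_gpu_feature_flag is a hypothetical helper, not part of Kaldi.

```shell
# pick_gpu_feature_flag (hypothetical helper): choose the --gpu-feature-extract
# value depending on whether the given config file enables pitch features.
pick_gpu_feature_flag() {
  # Match an uncommented "--add-pitch=true" entry, allowing leading
  # whitespace and a trailing comment.
  if grep -Eq '^[[:space:]]*--add-pitch=true([[:space:]]|$)' "$1"; then
    echo "--gpu-feature-extract=false"   # pitch is CPU-only, per the comment above
  else
    echo "--gpu-feature-extract=true"
  fi
}

# Example with a throwaway config file (contents are illustrative).
conf=$(mktemp)
printf '%s\n' '--feature-type=mfcc' '--add-pitch=true' > "$conf"
pick_gpu_feature_flag "$conf"
rm -f "$conf"
```

The printed flag can then be appended to the batched-wav-nnet3-cuda command line. The sketch does not follow nested config files, so a pitch setting placed elsewhere would not be detected.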


luitjens commented 5 years ago

Can you please try with the latest master? We just pushed in the FBANK code which also includes a fix for no ivectors.

danpovey commented 4 years ago

Closing, as this has likely already been fixed.


#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fffd5044231 in __GI_abort () at abort.c:79
#2 0x0000555555a9f2f4 in kaldi::KaldiAssertFailure_ (
#3 0x000055555589ee97 in kaldi::CuMatrixBase::AddMatMat (this=this@entry=0x7fff1dff2930, alpha=alpha@entry=1, A=...,
#4 0x000055555573f293 in kaldi::IvectorExtractorFastCuda::GetIvector (this=0x7ffd159b0020, feats=..., ivector=ivector@entry=0x555574ff2ee0)
#5 0x0000555555739684 in kaldi::OnlineCudaFeaturePipeline::ComputeFeatures (this=this@entry=0x7fff1dff2fe0, cu_wave=..., sample_freq=16000,
#6 0x0000555555715152 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures (this=this@entry=0x7fffffffd760, first=first@entry=0,
#7 0x0000555555715e2f in kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker (this=0x7fffffffd760, threadId=)
#9 std::invoke<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline, int>(void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::&&)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline&&, int&&) (fn=)
#10 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline, int> >::_M_invoke<0ul, 1ul, 2ul> (this=) at /usr/include/c++/7/thread:234
#11 std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline, int> >::operator() (this=) at /usr/include/c++/7/thread:243
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::)(int), kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline, int> > >::_M_run (this=) at /usr/include/c++/7/thread:186
#13 0x00007fffd569b96f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff0e7e5aa in start_thread (arg=0x7fff1dffb700) at pthread_create.c:463
#15 0x00007fffd5104cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
