jcsilva / docker-kaldi-android

Dockerfile for compiling Kaldi for Android.

Build from repository fails on runtime #4

Closed fwinnen closed 6 years ago

fwinnen commented 6 years ago

Hi,

I am trying to get online decoding to work on Android. To start, I followed http://kaldi-asr.org/doc/online_decoding.html and executed src/online2bin/online2-wav-nnet2-latgen-faster, which was built using the approach from your README with the jcsilva/docker-kaldi-android:latest image pulled from Docker Hub:

src/online2bin/online2-wav-nnet2-latgen-faster --do-endpointing=false \
    --online=false \
    --config=nnet_a_gpu_online/conf/online_nnet2_decoding.conf \
    --max-active=7000 --beam=15.0 --lattice-beam=6.0 \
    --acoustic-scale=0.1 --word-symbol-table=graph/words.txt \
    nnet_a_gpu_online/final.mdl graph/HCLG.fst \
    "ark:echo utterance-id1 utterance-id1|" "scp:echo utterance-id1 ENG_M.wav|" \
    ark:/dev/null

This all works and executes fine on Android (adb shell on a P6). However, when I build your Docker image from the repo and use it to build Kaldi, I get a runtime error when executing the example:

LOG (orig[5.1.74~1391-c68a]:void kaldi::IvectorExtractor::ComputeDerivedVars()():ivector-extractor.cc:183) Computing derived variables for iVector extractor
WARNING (orig[5.1.74~1391-c68a]:void kaldi::TpMatrix<double>::Cholesky(const SpMatrix<Real> &) [Real = double]():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite. Throwing error 
Cholesky decomposition failed
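
For context on the warning above: the standard Cholesky algorithm factors a symmetric matrix A as L·Lᵀ and needs every pivot to be strictly positive, which holds only when A is positive definite. The sketch below illustrates that failure mode on a symmetric 2x2 matrix [[a, b], [b, c]] using awk. It is an illustration only, not Kaldi's TpMatrix code, and the function name cholesky_2x2 is hypothetical.

```shell
# Hypothetical helper: Cholesky factorization of the symmetric matrix
# [[a, b], [b, c]]. Prints "l11 l21 l22" on success, or "failed" (exit
# status 1) when a pivot is not strictly positive.
cholesky_2x2() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
        # First pivot: a must be positive to take its square root.
        if (a <= 0) { print "failed"; exit 1 }
        l11 = sqrt(a)
        l21 = b / l11
        # Second pivot: what is left of c after removing the l21 contribution.
        d = c - l21 * l21
        if (d <= 0) { print "failed"; exit 1 }
        printf "%g %g %g\n", l11, l21, sqrt(d)
    }'
}
```

For example, `cholesky_2x2 4 2 2` succeeds, while `cholesky_2x2 1 2 1` reports a failure because the second pivot is 1 - 4 = -3. A numerically corrupted input matrix can trigger the same check even when the model itself is fine.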

Digging into it, I found that the image on Docker Hub was probably not built from the latest master. Going by its build date (06-28), I tried the commits from 06-27 and 06-28, but both fail the same way. I then analyzed the built executables with readelf -a -W and diffed the output (built with the Docker Hub image vs. built with an image I built from the latest version of this repo). There are only two differences:

These symbols are in the repo-built executable:

_ZNKSt6__ndk16vectorIPN3fst11VectorStateINS1_6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS6_EEEENS7_ISA_EEE17__annotate_deleteEv
_ZNKSt6__ndk16vectorIPN3fst11VectorStateINS1_6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS6_EEEENS7_ISA_EEE14__annotate_newEj
_ZN3fst11VectorStateINS_6ArcTplINS_16LatticeWeightTplIdEEEENSt6__ndk19allocatorIS4_EEEnwEjPNS6_IS8_EE
_ZN3fst11VectorStateINS_6ArcTplINS_16LatticeWeightTplIdEEEENSt6__ndk19allocatorIS4_EEEC2ERKS7_
_ZNSt6__ndk16vectorIPN3fst11VectorStateINS1_6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS6_EEEENS7_ISA_EEE24__RAII_IncreaseAnnotatorC2ERKSC_j
_ZNSt6__ndk16vectorIPN3fst11VectorStateINS1_6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS6_EEEENS7_ISA_EEE21__push_back_slow_pathISA_EEvOT_
_ZNKSt6__ndk16vectorIPN3fst11VectorStateINS1_6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS6_EEEENS7_ISA_EEE8max_sizeEv
_ZN3fst11VectorStateINS_6ArcTplINS_16LatticeWeightTplIdEEEENSt6__ndk19allocatorIS4_EEE11ReserveArcsEj
_ZNSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE7reserveEj
_ZNSt6__ndk114__split_bufferIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEERNS_9allocatorIS5_EEEC2EjjS8_
_ZNSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE26__swap_out_circular_bufferERNS_14__split_bufferIS5_RS7_EE
_ZNSt6__ndk114__split_bufferIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEERNS_9allocatorIS5_EEED2Ev
_ZNKSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE17__annotate_deleteEv
_ZNKSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE14__annotate_newEj
_ZN3fst6ArcTplINS_16LatticeWeightTplIdEEEC2ERKS3_
_ZNSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE24__RAII_IncreaseAnnotatorC2ERKS8_j
_ZNSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE21__push_back_slow_pathIRKS5_EEvOT_
_ZNKSt6__ndk16vectorIN3fst6ArcTplINS1_16LatticeWeightTplIdEEEENS_9allocatorIS5_EEE8max_sizeEv
_ZN3fst16ImplToMutableFstINS_8internal13VectorFstImplINS_11VectorStateINS_6ArcTplINS_16LatticeWeightTplIdEEEENSt6__ndk19allocatorIS7_EEEEEENS_10MutableFstIS7_EEEC2ENS8_10shared_ptrISC_EE
_ZN3fst9VectorFstINS_6ArcTplINS_16LatticeWeightTplIdEEEENS_11VectorStateIS4_NSt6__ndk19allocatorIS4_EEEEED2Ev
_ZNK3fst9ImplToFstINS_8internal13VectorFstImplINS_11VectorStateINS_6ArcTplINS_16LatticeWeightTplIdEEEENSt6__ndk19allocatorIS7_EEEEEENS_10MutableFstIS7_EEE5StartEv
_ZNK3fst9VectorFstINS_6ArcTpl

while the Docker Hub version has only this instead:

_ZNKSt6__ndk16vectorIPN3fst11VectorStateINS1

Also, these relocation entries are missing from the repo-built one:

00768b30  00004a16 R_ARM_JUMP_SLOT        00000000   clock_gettime@LIBC
00768b34  00004716 R_ARM_JUMP_SLOT        00000000   getrlimit@LIBC
00768b38  00004816 R_ARM_JUMP_SLOT        00000000   raise@LIBC
00768b3c  00018516 R_ARM_JUMP_SLOT        00000000   omp_in_parallel

I don't know how this could relate to the error I am getting, but maybe it helps someone to help me :)

Note: everything was executed with Kaldi source commit 99880f7, which is the one suggested in your guide.
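
The readelf comparison described earlier in this comment can be approximated with a short script. This is a minimal sketch, not the exact procedure used here: it assumes the symbol tables were first dumped to text files with `readelf -W -s`, and the file names (hub.txt, repo.txt) and the function name only_in_first are made up for illustration.

```shell
# Hypothetical helper: print symbol names that appear in the first
# `readelf -W -s` dump but not in the second one.
# Usage sketch (paths are assumptions):
#   readelf -W -s hub-build/online2-wav-nnet2-latgen-faster  > hub.txt
#   readelf -W -s repo-build/online2-wav-nnet2-latgen-faster > repo.txt
#   only_in_first repo.txt hub.txt
only_in_first() {
    awk '
        # Keep only symbol rows ("Num:" column is a number) that carry a name.
        !($1 ~ /^[0-9]+:$/ && NF >= 8) { next }
        # The first awk argument is the dump we compare against.
        FILENAME == ARGV[1] { other[$8] = 1; next }
        # Print each name from the remaining dump once if the other lacks it.
        !($8 in other) && !seen[$8]++ { print $8 }
    ' "$2" "$1" | LC_ALL=C sort
}
```

Running it in both directions gives the two halves of the diff. Piping the result through `c++filt` would demangle the C++ names, which makes the long `_ZN...` entries easier to read.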

jcsilva commented 6 years ago

Hi,

It has been a long time since I last touched this, and I'm sure some things have changed since then (e.g. the OpenBLAS version). I can try to help you, but I'm not working with Android anymore and I don't have any devices to test on. So I'll try to update some dependencies, but I'll need your help with testing.

fwinnen commented 6 years ago

Hi, thanks! It would already help me to know the commit from which you built the image on Docker Hub. From there I might be able to figure out what changed and broke things.

(Reply sent by email on 11.12.2017 at 21:17.)

jcsilva commented 6 years ago

Well,

what I've done is explained here: http://jcsilva.github.io/2017/03/18/compile-kaldi-android/

The commits I used are also listed in that blog post. But I think you have already followed those instructions, right?

fwinnen commented 6 years ago

I mean the commit from this repo, https://github.com/jcsilva/docker-kaldi-android, that was used to build jcsilva/docker-kaldi-android on Docker Hub (the one I get with docker pull jcsilva/docker-kaldi-android), because that's the only one that works for me.

(Reply sent by email on 11.12.2017 at 21:25.)

jcsilva commented 6 years ago

Oh, sorry. I misunderstood.

Bad news: I no longer know which commit of this repository I used to build that image. But I'm almost sure it was the current master.

Good news: I checked here and saw that the OpenBLAS version in my post is wrong. The correct version, which is the one used in that Docker image, is sha 482015f8d6840da96. I hope this helps you.

jcsilva commented 6 years ago

@fwinnen, I think the problem was fixed with PR #5.

By the way, I've just updated the Dockerfile and the Docker image with the most recent versions of Kaldi and OpenBLAS.

fwinnen commented 6 years ago

Yeah, it was the OpenBLAS version mismatch (I had to link OpenBLAS again in my CMake project). Your hint with the sha helped me fix it. Sorry for not getting back to you! I'll close this issue, since the update you mentioned should fix the problem for future use.
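
For anyone hitting the same mismatch, one way to guard against it is to check the OpenBLAS checkout against the expected commit before building. The snippet below is a sketch under assumptions: the checkout directory name OpenBLAS and the helper name sha_matches are hypothetical, and only the sha itself comes from this thread.

```shell
# Hypothetical helper: succeed when the full commit id ($1) starts with the
# expected, possibly abbreviated, sha ($2).
sha_matches() {
    # An empty expectation would match everything, so reject it.
    [ -n "$2" ] || return 1
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage sketch (the OpenBLAS directory is an assumption):
#   head=$(git -C OpenBLAS rev-parse HEAD)
#   sha_matches "$head" 482015f8d6840da96 \
#       || echo "unexpected OpenBLAS revision: $head" >&2
```

Running the check at the start of a build script makes a silently changed dependency show up as an explicit message instead of a runtime failure on the device.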