Open wclr opened 5 years ago
I got the same error. @whitecolor Did you find the solution?
I got the same error.
The broken pipe error comes from the Kaldi process dying before returning the result.
In my case, the container was built on a newer machine, so Kaldi and OpenBLAS were built to use newer CPU features (AVX512 specifically). When run on an older machine, k3 would die with SIGILL (illegal instruction). Docker does not address this sort of problem; the application needs to be built for whatever system (or family of systems) it is going to run on. In my case the oldest machine we needed to work was an Ivy Bridge generation processor, so adding TARGET=SANDYBRIDGE to the openblas_compile target in ext/kaldi/tools/Makefile solved the problem (see OpenBLAS/TargetList.txt for accepted platforms). That's not to say an illegal instruction is always the problem, just my particular case.
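To check up front whether a given host lacks the instruction sets a build might rely on, one option on x86 Linux is to read the "flags" line of /proc/cpuinfo. The following is only a rough sketch: the helper names cpu_flags and missing_features are made up for illustration, and the list of flags checked is an assumption, not something taken from Gentle or Kaldi.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features on a line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted=("avx", "avx2", "avx512f")):
    """List the wanted flags (an assumed selection) that the CPU does not report."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        # Not Linux, or /proc is unavailable: nothing to report.
        text = ""
    print("missing:", missing_features(text))
```

If avx512f shows up as missing on the machine that crashes but not on the machine that built the image, that points in the same direction as the SIGILL above.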
I found it very helpful to run a script, e.g. align.py, inside the container under strace to see what killed k3:
$ apt update
$ apt install -y strace
$ strace -f -o trace.log python3 ...
Open up trace.log and search for "get-final" and usually you'll see something like this:
360 +++ killed by SIGILL (core dumped) +++
361 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=360, si_uid=0, si_status=SIGILL, si_utime=0, si_stime=2} ---
325 <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
325 futex(0x120a1c0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
361 write(5, "get-final\n", 10) = -1 EPIPE (Broken pipe)
361 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=325, si_uid=0} ---
In this case process 361 is Python talking to k3: its write of "get-final" fails with EPIPE and it receives SIGPIPE because the pipe is broken. A little higher up we see Python also got SIGCHLD, indicating that a child process (k3 in this case, pid 360) died from SIGILL, and just before that we see process 360 being killed by SIGILL. With a little bit of investigation you might be able to pin down what's causing k3 to crash.
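If the trace is long, the two kinds of lines discussed above can be pulled out automatically. This is a minimal sketch that assumes the log was written with strace -f -o, so each line starts with a pid; the function name summarize_trace and the regular expressions are my own and only cover the line shapes shown in the excerpt.

```python
import re

# Matches lines such as: 360 +++ killed by SIGILL (core dumped) +++
KILLED = re.compile(r"^(\d+)\s+\+\+\+ killed by (SIG[A-Z0-9]+)")
# Matches failed writes such as: 361 write(5, ..., 10) = -1 EPIPE (Broken pipe)
EPIPE = re.compile(r"^(\d+)\s+write\(.*=\s*-1 EPIPE")


def summarize_trace(lines):
    """Return ({pid: signal} for killed processes, [pids whose write hit EPIPE])."""
    killed = {}
    broken = []
    for line in lines:
        m = KILLED.match(line)
        if m:
            killed[int(m.group(1))] = m.group(2)
            continue
        m = EPIPE.match(line)
        if m:
            broken.append(int(m.group(1)))
    return killed, broken
```

Passing an open trace.log file object to summarize_trace works, since it only iterates over lines. For the excerpt above it would report pid 360 as killed by SIGILL and pid 361 as the process that hit the broken pipe.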
Nice investigation! How do I fix this issue?
In my case the solution was to make sure OpenBLAS is compiled for the oldest processor the container will run on:
... the oldest machine we needed to work was an Ivy Bridge generation processor, so adding TARGET=SANDYBRIDGE to the openblas_compile target in ext/kaldi/tools/Makefile solved the problem
But that might not be the solution for everyone. k3 could crash for different reasons, so I outlined the longer debugging process to help people figure out what was going wrong on their machines.
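A lighter-weight first check than strace, assuming you can run the failing binary directly, is to start it from Python and look at its return code: on POSIX systems, subprocess reports a negative return code -N when the child was terminated by signal N. The sketch below is illustrative only; describe_exit and run_and_describe are hypothetical helper names, and the command to run is whatever you pass in (I am not assuming how Gentle itself launches k3).

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        try:
            # A negative code -N means the child was terminated by signal N (POSIX).
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a named constant on this platform.
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_and_describe(cmd):
    """Run cmd (a list of arguments) and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)
```

A result of "killed by SIGILL" would match the case described above; something like "killed by SIGSEGV" or a non-zero exit status would suggest a different cause worth tracing.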
I've managed to build the latest source Docker image (omitting the j 8 param for the kaldi build). But when I try to parse a very small mp3 (10 sec) I get the error:
When I try to upload a bigger audio file (about 5 MB/10 min), there is no error in the console; it just does nothing.