lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License

Gentle built from latest source in docker error: Broken pipe #226

Status: Open (opened by wclr 5 years ago)

wclr commented 5 years ago

I've managed to build the Docker image from the latest source (omitting the -j 8 parameter for the Kaldi build).

But when I try to process a very short MP3 (10 seconds), I get the following error:

INFO:root:gentle 0.10.1
INFO:root:listening at 0.0.0.0:8765

INFO:root:SERVE 8765, 0.0.0.0, 1
INFO:root:about to listen
INFO:root:listening
Exception ignored in: <bound method Kaldi.__del__ of <gentle.standard_kaldi.Kaldi object at 0x7f1ee8472dd8>>
Traceback (most recent call last):
  File "/gentle/gentle/standard_kaldi.py", line 77, in __del__
    self.stop()
  File "/gentle/gentle/standard_kaldi.py", line 71, in stop
    self._cmd("stop")
  File "/gentle/gentle/standard_kaldi.py", line 28, in _cmd
    self._p.stdin.write(("%s\n" % (c)).encode())
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <bound method Kaldi.__del__ of <gentle.standard_kaldi.Kaldi object at 0x7f1ee8472be0>>
Traceback (most recent call last):
  File "/gentle/gentle/standard_kaldi.py", line 77, in __del__
    self.stop()
  File "/gentle/gentle/standard_kaldi.py", line 71, in stop
    self._cmd("stop")
  File "/gentle/gentle/standard_kaldi.py", line 28, in _cmd
    self._p.stdin.write(("%s\n" % (c)).encode())
BrokenPipeError: [Errno 32] Broken pipe
Unhandled error in Deferred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/Twisted-19.2.1-py3.6-linux-x86_64.egg/twisted/_threads/_threadworker.py", line 46, in work
    task()
  File "/usr/local/lib/python3.6/dist-packages/Twisted-19.2.1-py3.6-linux-x86_64.egg/twisted/_threads/_team.py", line 190, in doWork
    task()
--- <exception caught here> ---
  File "/usr/local/lib/python3.6/dist-packages/Twisted-19.2.1-py3.6-linux-x86_64.egg/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/local/lib/python3.6/dist-packages/Twisted-19.2.1-py3.6-linux-x86_64.egg/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/Twisted-19.2.1-py3.6-linux-x86_64.egg/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/Twisted-19.2.1-py3.6-linux-x86_64.egg/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "serve.py", line 102, in transcribe
    output = trans.transcribe(wavfile, progress_cb=on_progress, logging=logging)
  File "/gentle/gentle/forced_aligner.py", line 23, in transcribe
    words, duration = self.mtt.transcribe(wavfile, progress_cb=progress_cb)
  File "/gentle/gentle/transcriber.py", line 51, in transcribe
    pool.map(transcribe_chunk, range(n_chunks))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/gentle/gentle/transcriber.py", line 39, in transcribe_chunk
    ret = k.get_final()
  File "/gentle/gentle/standard_kaldi.py", line 42, in get_final
    self._cmd("get-final")
  File "/gentle/gentle/standard_kaldi.py", line 28, in _cmd
    self._p.stdin.write(("%s\n" % (c)).encode())
builtins.BrokenPipeError: [Errno 32] Broken pipe

When I try to upload a bigger file (about 5 MB / 10 minutes of audio), there is no error in the console; it just does nothing.

linjian commented 4 years ago

I got the same error. @whitecolor, did you find a solution?

someonefighting commented 4 years ago

I got the same error.

scbash commented 4 years ago

The broken pipe error comes from the Kaldi process dying before returning the result.
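To see that failure mode in isolation, here is a minimal sketch (POSIX only). The send_command helper and the stand-in child process are hypothetical, not gentle's code. The helper mimics the stdin write in _cmd from standard_kaldi.py, but when the write hits a broken pipe it waits on the child and reports how it died. With subprocess.Popen, a negative returncode means the child was terminated by that signal number, which is the information the bare BrokenPipeError hides.

```python
import signal
import subprocess
import sys


def send_command(proc: subprocess.Popen, cmd: str) -> str:
    """Write one newline-terminated command to a child's stdin.

    If the child has already exited, the write fails with EPIPE, which Python
    raises as BrokenPipeError. In that case, wait for the child and report how
    it died instead of surfacing only the broken pipe. Note that a successful
    write does not prove the child is still alive: the pipe buffer may accept
    the data even if the child dies right afterwards.
    """
    try:
        proc.stdin.write(("%s\n" % cmd).encode())
        proc.stdin.flush()
        return "sent"
    except BrokenPipeError:
        rc = proc.wait()
        if rc < 0:
            # Negative return code: the child was killed by signal number -rc.
            return "child killed by %s" % signal.Signals(-rc).name
        return "child exited with status %d" % rc


if __name__ == "__main__":
    # Hypothetical stand-in for k3: a child that kills itself with SIGTERM.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so write() itself reports EPIPE
    )
    child.wait()  # make sure the child is gone before writing to it
    print(send_command(child, "get-final"))
```

Running it prints the signal that killed the stand-in child rather than just "Broken pipe"; on a machine hitting the problem described below, the same check would report SIGILL for k3.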

In my case, the container was built on a newer machine and Kaldi/OpenBLAS were thus built to use newer CPU features (AVX512 specifically). When run on an older machine k3 would die due to SIGILL (illegal instruction). Docker does not address this sort of problem, the application needs to be built for whatever system (or family of systems) it is going to run on. In my case the oldest machine we needed to work was an Ivy Bridge generation processor, so adding TARGET=SANDYBRIDGE to the openblas_compile target in ext/kaldi/tools/Makefile solved the problem (see OpenBLAS/Targets.txt for accepted platforms). That's not to say an illegal instruction is always the problem, just my particular case.
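One way to check for this kind of CPU mismatch before debugging further is to compare the feature flags of the build host and the target host. The sketch below is a hypothetical helper, not part of gentle or Kaldi; it assumes Linux, where /proc/cpuinfo lists features on lines starting with "flags" (AVX-512 Foundation appears as avx512f). Save /proc/cpuinfo from each machine and feed both to missing_on_target to list vector extensions the build host has but the target lacks.

```python
import os
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the feature flags from the 'flags' lines of /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the plain 'flags' key (not e.g. 'vmx flags').
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_on_target(build_flags: Set[str], target_flags: Set[str], prefix: str = "avx") -> Set[str]:
    """Return flags starting with `prefix` that the build host has but the target lacks."""
    return {f for f in build_flags - target_flags if f.startswith(prefix)}


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as fh:
            local = parse_cpu_flags(fh.read())
        print("AVX-512 available on this machine:", "avx512f" in local)
```

If the result is non-empty, binaries built on the build host (OpenBLAS in particular, which picks its kernels from the host CPU unless TARGET is set) may die with SIGILL on the target.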

I found it very helpful to run a script, e.g. align.py, inside the container under strace to see what killed k3:

$ apt update
$ apt install -y strace
$ strace -f -o trace.log python3 ...

Open up trace.log and search for "get-final" and usually you'll see something like this:

360   +++ killed by SIGILL (core dumped) +++
361   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=360, si_uid=0, si_status=SIGILL, si_utime=0, si_stime=2} ---
325   <... futex resumed> )             = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
325   futex(0x120a1c0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
361   write(5, "get-final\n", 10)       = -1 EPIPE (Broken pipe)
361   --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=325, si_uid=0} ---

In this case process 361 is Python waiting for data from k3, and it receives SIGPIPE due to the broken pipe. A little higher up we see Python also got SIGCHLD indicating a child process (k3 in this case, pid 360) died due to SIGILL, and just before that we see process 360 was killed by SIGILL. With a little bit of investigation you might be able to pin down what's causing k3 to crash.
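For a long trace.log, a small script can pull out the two facts that matter here: which PIDs were killed by a signal, and which PIDs got EPIPE when writing "get-final". This is a hypothetical helper; it assumes the line format produced by strace -f shown above (PID first, then "+++ killed by SIG... +++" for a fatal signal, and the raw write() call with "get-final\n" escaped as backslash-n).

```python
import os
import re
import sys
from typing import List, Tuple

# strace -f prefixes each line with the PID; a fatal signal is logged as
# "<pid>   +++ killed by SIGxxx ... +++".
KILLED_RE = re.compile(r"^(\d+)\s+\+\+\+ killed by (SIG[A-Z0-9]+)")
# A failed "get-final" write to the Kaldi pipe, e.g.
# '361   write(5, "get-final\n", 10) = -1 EPIPE (Broken pipe)'.
EPIPE_RE = re.compile(r'^(\d+)\s+write\(\d+, "get-final\\n".*EPIPE')


def find_kills(trace_text: str) -> List[Tuple[int, str]]:
    """Return (pid, signal name) for every process strace saw killed by a signal."""
    kills = []
    for line in trace_text.splitlines():
        m = KILLED_RE.match(line)
        if m:
            kills.append((int(m.group(1)), m.group(2)))
    return kills


def find_failed_get_final(trace_text: str) -> List[int]:
    """Return the PIDs whose "get-final" write failed with EPIPE."""
    pids = []
    for line in trace_text.splitlines():
        m = EPIPE_RE.match(line)
        if m:
            pids.append(int(m.group(1)))
    return pids


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "trace.log"
    if os.path.exists(path):
        with open(path) as fh:
            text = fh.read()
        print("killed:", find_kills(text))
        print("EPIPE on get-final:", find_failed_get_final(text))
```

On the excerpt above, this would report PID 360 killed by SIGILL and PID 361 failing its "get-final" write, matching the manual reading.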

yya518 commented 4 years ago

> Python waiting for data from k3, and it receives SIGPIPE due to the broken pipe. A little higher up we see Python also got SIGCHLD indicating a child process (k3 in this case, pid 360) died due to SIGILL, and just before that we see process 360 was killed by SIGILL. With a little bit of investigation you might be able to pin down what's causing k3 to crash.

Nice investigation! How to fix this issue?

scbash commented 4 years ago

> Nice investigation! How to fix this issue?

In my case the solution was to make sure OpenBLAS is compiled for the oldest processor the container will run on:

> ... the oldest machine we needed to work was an Ivy Bridge generation processor, so adding TARGET=SANDYBRIDGE to the openblas_compile target in ext/kaldi/tools/Makefile solved the problem

But that might not be the solution for everyone. k3 could crash for different reasons, so I outlined the longer debugging process to help people figure out what was going wrong on their machines.