wotschofsky / discord-live-translator

Voice Translation Bot for Discord
https://livetranslator.xyz
GNU Affero General Public License v3.0
63 stars 13 forks source link

Self-hosted bot crashes when analyzing recording (fetch to stt failed) #42

Open kslen opened 11 months ago

kslen commented 11 months ago

Hello there!

I'm attempting to get your creation up and running on a Debian 12.1 install as per instructions in README.md. The install is up to date and the box has plenty of free storage and memory as the problem I'm about to delve into occur. vm.overcommit_memory = 1 has also been set and verified as part of my blind stab at resolving this issue.

I've successfully managed to get the bot onto our server, commands are being recognized and all seems well up to this point. The bot acknowledges our desired language, joins the voice channel when requested and plays its little fanfare. :)

However, a few errors are thrown during startup.

ghcr.io/wotschofsky/discord-live-translator log:

[ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.3:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[ioredis] Unhandled error event: Error: connect ETIMEDOUT
    at Socket.<anonymous> (/app/node_modules/ioredis/built/Redis.js:170:41)
    at Object.onceWrapper (node:events:628:28)
    at Socket.emit (node:events:514:28)
    at Socket._onTimeout (node:net:571:8)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)
[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.3:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
yarn run v1.22.19
$ node -r source-map-support/register dist/app.js
[2023-08-30T00:42:41.681Z] Logged in as Super Secret Bot#1337!

Otherwise the logs look alright besides of redis lusting for a config.

Redis log:

1:C 30 Aug 2023 00:42:38.733 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 30 Aug 2023 00:42:38.733 * Redis version=7.2.0, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 30 Aug 2023 00:42:38.733 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 30 Aug 2023 00:42:38.733 * monotonic clock: POSIX clock_gettime
1:M 30 Aug 2023 00:42:38.733 * Running mode=standalone, port=6379.
1:M 30 Aug 2023 00:42:38.734 * Server initialized
1:M 30 Aug 2023 00:42:38.734 * Loading RDB produced by version 7.2.0
1:M 30 Aug 2023 00:42:38.734 * RDB age 28 seconds
1:M 30 Aug 2023 00:42:38.734 * RDB memory usage when created 0.85 Mb
1:M 30 Aug 2023 00:42:38.734 * Done loading RDB, keys loaded: 1, keys expired: 0.
1:M 30 Aug 2023 00:42:38.734 * DB loaded from disk: 0.000 seconds
1:M 30 Aug 2023 00:42:38.734 * Ready to accept connections tcp
1:M 30 Aug 2023 01:42:39.053 * 1 changes in 3600 seconds. Saving...
1:M 30 Aug 2023 01:42:39.053 * Background saving started by pid 14
14:C 30 Aug 2023 01:42:39.167 * DB saved on disk
14:C 30 Aug 2023 01:42:39.167 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 30 Aug 2023 01:42:39.254 * Background saving terminated with success

Libretranslate log:

Updating language models
Found 60 models
Keep 8 models
Downloading English → French (1.0) ...
Downloading English → German (1.0) ...
Downloading English → Italian (1.0) ...
Downloading English → Spanish (1.0) ...
Downloading French → English (1.0) ...
Downloading German → English (1.0) ...
Downloading Italian → English (1.0) ...
Downloading Spanish → English (1.0) ...
Loaded support for 5 languages (8 models total)!
Running on http://0.0.0.0:5000

ghcr.io/wotschofsky/discord-live-translator/stt log:

2023-08-30T00:42:38.659Z` INFO  [stt] Starting server at http://127.0.0.1:8000
2023-08-30T00:42:38.660Z INFO  [actix_server::builder] starting 1 workers
2023-08-30T00:42:38.660Z INFO  [actix_server::server] Actix runtime found; starting in Actix runtime
whisper_init_from_file_no_state: loading model from 'models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3
whisper_model_load: mem required  =  743.00 MB (+   16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
2023-08-30T00:44:35.718Z INFO  [stt] Starting server at http://127.0.0.1:8000
2023-08-30T00:44:35.719Z INFO  [actix_server::builder] starting 1 workers
2023-08-30T00:44:35.719Z INFO  [actix_server::server] Actix runtime found; starting in Actix runtime

The problems begin as someone starts speaking. One can see in the logs that voice is being recorded, but the following error pops up in the ghcr.io/wotschofsky/discord-live-translator log:

[2023-08-30T00:44:31.969Z] Started recording 301695752688893952 to /app/cache/rec/301695752688893952_1693356271955.wav
[2023-08-30T00:44:32.332Z] Successfully finished recording 301695752688893952 to /app/cache/rec/301695752688893952_1693356271955.wav
[2023-08-30T00:44:32.333Z] Analyzing "/app/cache/rec/301695752688893952_1693356271955.wav"...
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at recognizeRecording (/app/src/processors/recognizeRecording.ts:11:20)
    at _SpeakingMap.<anonymous> (/app/src/commands/join.ts:99:30)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Any thoughts on what might be going on and how to resolve it? I'm a complete newbie to docker, so I figured before I poke at things further; I'd consult the oracles. :D

Best regards, yours truely.

wotschofsky commented 11 months ago

Hi, thanks for raising this!

Do you see log entries like the following from the STT container after speaking?

whisper_init_state: kv self size  =   15.75 MB
whisper_init_state: kv cross size =   52.73 MB

Do you see a spike in CPU usage after speaking (for example through htop)?

What are your hardware specs? Is it virtualized or bare metal?

Redis not finding a config is to be expected, but I am surprised by the connection, given that the bot container is instructed to wait for Redis to start first. Can you try pinging Redis by running the following command docker exec discord-live-translator-redis-1 redis-cli PING? It should return "PONG".

kslen commented 11 months ago

Hello again and thanks for a quick response! :D

I'll start off with your initial question, and no, no such entries can be seen in the log.

Regarding specs; it's a KVM with 1 core and initially 2 GB of memory, but increased to 8 GB see if it was a resource issue, but unfortunately with no change in outcome.

The host hardware is a Ryzen 7 3900x with 128 GB 3200 MHz DDR4 ECC. Memory is well managed and nowhere near capacity. Storage consists of 8x1 TB 7200 RPM 64MB Cache drives using raidz2 with lz4 compression enabled. The guest storage is currently qcow2, so all in all I would like to think that it is close to optimal in its configuration when aiming for performance and based on what it is. :D

I've reviewed the host performance in the period leading up to and including the crash which I just provoked. There's no iodelay of note and as it's my home server, there's not much load due to me and my machines being its only users. During the test i just performed, the hosts CPU utilization peaked at 4%.

Here's a screenshot of the KVMs performance graphs where the tiny spike at the end is when I invited the bot into a voice channel and started talking: Screenshot from 2023-08-30 19-25-38

The spike and over all increase in resource utilization in the screenshot above is due to me installing and then starting glances on the KVM to see if it would produce any warnings, which unfortunately it did not as seen here after the crash: Screenshot from 2023-08-30 19-24-30

I can't see anything else in journalctl on the host nor guest that strikes me as relevant or would fall into the error category, but will provide output from the same time period if requested. :)

docker container ls after crash:

CONTAINER ID   IMAGE                                             COMMAND                  CREATED        STATUS         PORTS      NAMES
b3587503758b   ghcr.io/wotschofsky/discord-live-translator       "docker-entrypoint.s…"   18 hours ago   Up 3 minutes              discord-live-translator-bot-1
7e414ea3e796   libretranslate/libretranslate:v1.3.11             "./venv/bin/libretra…"   18 hours ago   Up 17 hours    5000/tcp   discord-live-translator-libretranslate-1
3bc2770124ae   redis:alpine                                      "docker-entrypoint.s…"   18 hours ago   Up 17 hours    6379/tcp   discord-live-translator-redis-1
34fdc03d4511   ghcr.io/wotschofsky/discord-live-translator/stt   "./stt"                  18 hours ago   Up 3 minutes   8000/tcp   discord-live-translator-stt-1

Unfortunately, I forgot to ping redis before provoking the crash, but it replied after so I would think it would have before as well.

dlt@bot:~$ docker exec discord-live-translator-redis-1 redis-cli PING
PONG

ghcr.io/wotschofsky/discord-live-translator log:

[2023-08-30T00:44:37.225Z] Logged in as Super Secret Bot#1337!
[2023-08-30T17:22:11.020Z] Started recording 301695752688893952 to /app/cache/rec/301695752688893952_1693416131018.wav
[2023-08-30T17:22:12.078Z] Successfully finished recording 301695752688893952 to /app/cache/rec/301695752688893952_1693416131018.wav
[2023-08-30T17:22:12.079Z] Analyzing "/app/cache/rec/301695752688893952_1693416131018.wav"...
[2023-08-30T17:22:13.429Z] Started recording 301695752688893952 to /app/cache/rec/301695752688893952_1693416133428.wav
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at recognizeRecording (/app/src/processors/recognizeRecording.ts:11:20)
    at _SpeakingMap.<anonymous> (/app/src/commands/join.ts:99:30)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
yarn run v1.22.19
$ node -r source-map-support/register dist/app.js
[2023-08-30T17:22:17.150Z] Logged in as Super Secret Bot#1337!

libretranslate/libretranslate:v1.3.11 log:

Running on http://0.0.0.0:5000

No other information outside of this line is produced when restarting the container, and no errors related to the crash present.

redis:alpine log:

1:M 30 Aug 2023 17:21:15.309 * 1 changes in 3600 seconds. Saving...
1:M 30 Aug 2023 17:21:15.310 * Background saving started by pid 15
15:C 30 Aug 2023 17:21:15.377 * DB saved on disk
15:C 30 Aug 2023 17:21:15.377 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 30 Aug 2023 17:21:15.410 * Background saving terminated with success

ghcr.io/wotschofsky/discord-live-translator/stt log:

whisper_init_from_file_no_state: loading model from 'models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3
whisper_model_load: mem required  =  743.00 MB (+   16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
2023-08-30T17:22:15.506Z INFO  [stt] Starting server at http://127.0.0.1:8000
2023-08-30T17:22:15.506Z INFO  [actix_server::builder] starting 1 workers
2023-08-30T17:22:15.506Z INFO  [actix_server::server] Actix runtime found; starting in Actix runtime

Hope this is of use. :)

wotschofsky commented 11 months ago

Can you please try resolving the internal ip for the stt container like follows:

  1. Start shell session inside the bot container docker exec -it discord-live-translator-bot-1 bash
  2. Install some utilities apt install curl dnsutils iputils-ping
  3. Run dig stt, ping stt and curl --head stt:8000 and post the results
kslen commented 10 months ago

Hey!

Pardon the late response. Been busy days.

Edit of shame: Brain started ticking over and here's the output as requested. :D

root@b3587503758b:/app# dig stt

; <<>> DiG 9.18.16-1~deb12u1-Debian <<>> stt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30371
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;stt.               IN  A

;; ANSWER SECTION:
stt.            600 IN  A   172.18.0.3

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11) (UDP)
;; WHEN: Mon Sep 04 23:40:35 UTC 2023
;; MSG SIZE  rcvd: 40
root@b3587503758b:/app# ping stt
PING stt (172.18.0.3) 56(84) bytes of data.
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=1 ttl=64 time=0.099 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=2 ttl=64 time=0.048 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=4 ttl=64 time=0.052 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=5 ttl=64 time=0.042 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=6 ttl=64 time=0.034 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=7 ttl=64 time=0.050 ms
64 bytes from discord-live-translator-stt-1.discord-live-translator_default (172.18.0.3): icmp_seq=8 ttl=64 time=0.036 ms
^C
--- stt ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7155ms
rtt min/avg/max/mdev = 0.033/0.049/0.099/0.020 ms
root@b3587503758b:/app# curl --head stt:8000
HTTP/1.1 405 Method Not Allowed
content-length: 0
allow: POST
date: Mon, 04 Sep 2023 23:42:20 GMT
wotschofsky commented 10 months ago

Hi, thanks for your patience! From the command output, it doesn't seem like Docker is at fault here. I've heard from friends that Node.js 18 fetch is still a bit weird, so I have upgraded to Node.js 20.

Can you try updating the images and see if that fixes it? docker compose -f docker-compose.prod.yml pull docker compose -f docker-compose.prod.yml up -d

kslen commented 10 months ago

Hello again, and no problem. Take the time you need. :)

Unfortunately, no dice.

[2023-09-11T01:02:04.093Z] Logged in as Super Secret Bot#1337!
Successfully updated guild (/) commands.
Successfully initialized guild commands!
[2023-09-11T01:02:59.305Z] Started recording 301695752688893952 to /app/cache/rec/301695752688893952_1694394179304.wav
[2023-09-11T01:03:00.357Z] Successfully finished recording 301695752688893952 to /app/cache/rec/301695752688893952_1694394179304.wav
[2023-09-11T01:03:00.357Z] Analyzing "/app/cache/rec/301695752688893952_1694394179304.wav"...
[2023-09-11T01:03:00.575Z] Started recording 301695752688893952 to /app/cache/rec/301695752688893952_1694394180575.wav
[2023-09-11T01:03:00.893Z] Successfully finished recording 301695752688893952 to /app/cache/rec/301695752688893952_1694394180575.wav
[2023-09-11T01:03:00.894Z] Analyzing "/app/cache/rec/301695752688893952_1694394180575.wav"...
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11600:11)
    at recognizeRecording (/app/src/processors/recognizeRecording.ts:11:20)
    at _SpeakingMap.<anonymous> (/app/src/commands/join.ts:99:30)
 ELIFECYCLE  Command failed with exit code 1.

> discord-live-translator@1.0.0 start /app
> node -r source-map-support/register dist/app.js

Started updating guild (/) commands.
Successfully updated guild (/) commands.
Successfully initialized guild commands!
[2023-09-11T01:03:04.669Z] Logged in as Super Secret Bot#1337!
1:C 11 Sep 2023 01:01:56.243 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 11 Sep 2023 01:01:56.243 * Redis version=7.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 11 Sep 2023 01:01:56.243 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 11 Sep 2023 01:01:56.244 * monotonic clock: POSIX clock_gettime
1:M 11 Sep 2023 01:01:56.244 * Running mode=standalone, port=6379.
1:M 11 Sep 2023 01:01:56.245 * Server initialized
1:M 11 Sep 2023 01:01:56.245 * Loading RDB produced by version 7.2.0
1:M 11 Sep 2023 01:01:56.245 * RDB age 15 seconds
1:M 11 Sep 2023 01:01:56.245 * RDB memory usage when created 0.85 Mb
1:M 11 Sep 2023 01:01:56.245 * Done loading RDB, keys loaded: 1, keys expired: 0.
1:M 11 Sep 2023 01:01:56.245 * DB loaded from disk: 0.000 seconds
1:M 11 Sep 2023 01:01:56.245 * Ready to accept connections tcp
2023-09-11T01:01:56.137Z INFO  [stt] Starting server at http://127.0.0.1:8000
2023-09-11T01:01:56.217Z INFO  [actix_server::builder] starting 1 workers
2023-09-11T01:01:56.217Z INFO  [actix_server::server] Actix runtime found; starting in Actix runtime
whisper_init_from_file_no_state: loading model from 'models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3
whisper_model_load: mem required  =  743.00 MB (+   16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
2023-09-11T01:03:02.779Z INFO  [stt] Starting server at http://127.0.0.1:8000
2023-09-11T01:03:02.779Z INFO  [actix_server::builder] starting 1 workers
2023-09-11T01:03:02.779Z INFO  [actix_server::server] Actix runtime found; starting in Actix runtime
Updating language models
Found 60 models
Keep 8 models
Downloading English → French (1.0) ...
Downloading English → German (1.0) ...
Downloading English → Italian (1.0) ...
Downloading English → Spanish (1.0) ...
Downloading French → English (1.0) ...
Downloading German → English (1.0) ...
Downloading Italian → English (1.0) ...
Downloading Spanish → English (1.0) ...
Loaded support for 5 languages (8 models total)!
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000
Running on http://0.0.0.0:5000

Could this be some missing dependency thing? Again, my ignorance may shine through, but the Debian install is just base + ssh server, then I've installed as per procedure in README.md, or at least to the best of my recollection. #weeksof4hournights :D

Maybe it would be of use if you could provide me with a complete list of packages installed in your containers where you have it running successfully? At least then we are closer to being sure that we're not chasing ghosts of my making here.

Again, thanks for your ongoing support. Our Spanish team mate will absolutely love this thing if we can get it to start talking. :D

kslen commented 9 months ago

Hello again!

Sorry, I really don't want to nag, but would it be possible to get some more information about the OS and versions of packages you're able to get this running on?

I'm not married to running any configuration, I just really, really want to have this bot working as it would make our teams current issues regarding communication much less of a problem.

Thanks again!

wotschofsky commented 9 months ago

Hi, sure:

Here are the two configurations that work for me:

  1. Docker Desktop on macOS 13.5.1
  2. Debian 11 inside KVM (I believe)

In both cases running the latest versions of Docker. I hope this helps!

I've been considering moving this app to the Bun runtime. Maybe that would solve the issue or at least give some better errors.

kslen commented 9 months ago

Alrighty, put that plan on hold and I'll try to replicate your environment hopefully by the end of the weekend. Enjoy yours! :)

kslen commented 8 months ago

Ok, so we've lost the spanish guy, so our need for the bot seems to have evaporated.

If I find time (as I've surprisingly not been able to to date), I'll keep my promise to see if this is related to Debian 12, but yeah. May take a while.

This would be my suggestion if anyone else is struggling, and if you find that it does work - please relay. :)