ubergeek77 / Lemmy-Easy-Deploy

Deploy Lemmy the easy way!
MIT License

Can’t deploy 0.18.4, connection errors #63

Closed Christoph-Wagner closed 1 year ago

Christoph-Wagner commented 1 year ago

Did you check the FAQ & Troubleshooting section for answers to common questions and issues?

Yes

Describe the issue

Deployment fails, I extracted what seemed relevant from the failure log:

First, the UI errors:

lemmy-easy-deploy-lemmy-ui-1  | http://0.0.0.0:1234
lemmy-easy-deploy-lemmy-ui-1  | API error: FetchError: request to http://lemmy:8536/api/v3/site? failed, reason: getaddrinfo ENOTFOUND lemmy

Then what probably blocks deployment, from lemmy-easy-deploy-lemmy-1 (this error repeats 7 times):

thread 'main' panicked at 'Error connecting to postgres://lemmy:password@postgres:5432/lemmy: could not connect to server: Connection refused Is the server running on host "postgres" (172.21.0.3) and accepting TCP/IP connections on port 5432?

Caddy / lemmy-easy-deploy-proxy-1 also throws some connection errors:

dial tcp 172.21.0.4:8536: connect: connection refused
dial tcp: lookup lemmy on 127.0.0.11:53: no such host
dial tcp 172.21.0.4:8536: i/o timeout

Postgres seems to run fine, despite not binding to IPv6:

lemmy-easy-deploy-postgres-1  | 2023-08-09 03:46:54.922 GMT [1] LOG:  listening on IPv4 address "127.0.0.1", port 5432
lemmy-easy-deploy-postgres-1  | 2023-08-09 03:46:54.922 GMT [1] LOG:  could not bind IPv6 address "::1": Address not available
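One detail in the two log lines above is worth checking: the only address Postgres reports listening on is 127.0.0.1, which other containers on the Docker network cannot reach. A minimal Python sketch of that check, assuming log lines in the format shown above; `listen_addresses` and `loopback_only` are hypothetical helper names, not part of Lemmy-Easy-Deploy:

```python
import ipaddress
import re

# Matches the address in lines such as:
#   LOG:  listening on IPv4 address "127.0.0.1", port 5432
LISTEN_RE = re.compile(r'listening on IPv[46] address "([^"]+)"')

def listen_addresses(log_text):
    """Return the addresses Postgres reports listening on, in log order."""
    return LISTEN_RE.findall(log_text)

def loopback_only(log_text):
    """True if at least one listen address was logged and all of them are loopback."""
    addrs = listen_addresses(log_text)
    return bool(addrs) and all(ipaddress.ip_address(a).is_loopback for a in addrs)

LOG = '''\
lemmy-easy-deploy-postgres-1  | 2023-08-09 03:46:54.922 GMT [1] LOG:  listening on IPv4 address "127.0.0.1", port 5432
lemmy-easy-deploy-postgres-1  | 2023-08-09 03:46:54.922 GMT [1] LOG:  could not bind IPv6 address "::1": Address not available
'''

print(listen_addresses(LOG))  # addresses found in the log excerpt
print(loopback_only(LOG))     # whether Postgres is reachable only from inside its own container
```

If `loopback_only` returns True for a container's log, connections from other containers (such as lemmy connecting to postgres:5432) would be expected to fail with "Connection refused".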

For the sake of completeness, the rather boring ./custom/customPostgresql.conf:

# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 4 GB
# CPUs num: 2
# Data Storage: ssd

max_connections = 200
shared_buffers = 1GB
effective_cache_size = 3GB
maintenance_work_mem = 256MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 2621kB
min_wal_size = 1GB
max_wal_size = 4GB

Diagnostic Information

Run ./deploy.sh -d and paste the output below:


==== Docker Information ====
Detected runtime: docker (Docker version 24.0.5, build ced0996)
Detected compose: docker compose (Docker Compose version v2.20.2)
Runtime state: OK

==== System Information ====
OS: Linux
KERNEL: 6.1.0-9-arm64 (aarch64)
HOSTNAME: OK
SHELL: bash
MEMORY:
        total   used    free    shared  buff/cache  available
Mem:    3.7Gi   382Mi   2.3Gi   3.8Mi   1.3Gi       3.4Gi
Swap:   0B      0B      0B

DISTRO:

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_CODENAME=bookworm

==== Lemmy-Easy-Deploy Information ====
Version: 1.3.0

IMAGE CREATED STATUS

Integrity:
0d3e213450ba646ab61881103a7ffcb2283b8152f36fff97ab735a704f069aa7 ./deploy.sh
587ca168ac5a0d1644df650d100711197c66fb6bf854f7cce0e29df35369e9c1 ./templates/Caddy-Dockerfile.template
c1202e70662dd2228da36a35a0f38ec8fc81bec8964d7315d02e8671a58dd7d7 ./templates/Caddyfile.template
2537678c7971df36c1ed95f4228d3cfcb15bb4a28a60d939eaf8dd75b5d64a36 ./templates/cloudflare.snip
c9cb4c5fee12930e17798a02ae1bd12e2dc69e149a394c24511bc9d4e6b776d4 ./templates/compose-email.snip
c494a610bcb4cd1cfc0a4fe4fb0f6d437b2a84a0ad1625daee240e6dd6f1c910 ./templates/compose-email-volumes.snip
f5325a9e26b29da51c6d3295aa278ff08ce71ffd2cd63dc4bebf00e54c468899 ./templates/docker-compose.yml.template
1c202b1b6e87c65b2fcda6035c9fe3f8631d76662907ffd38f24b14686e30647 ./templates/lemmy-email.snip
c834cdce9eaf77f38155b404724fdfe66845575386ee516987452aa715642a6f ./templates/lemmy.hjson.template

Custom Files:
total 4.0K
-rw-r--r-- 1 0 0 406 Jul 4 16:11 customPostgresql.conf

==== Settings ====
CLOUDFLARE: No
CADDY_DISABLE_TLS: false
CADDY_HTTP_PORT: 80
CADDY_HTTPS_PORT: 443
LEMMY_TLS_ENABLED: true
ENABLE_EMAIL: true
SMTP_PORT: 465
ENABLE_POSTFIX: false
POSTGRES_POOL_SIZE: 100

==== Generated Files ====
Deploy Version: 0.18.3;0.18.3

total 19M
drwxr-xr-x 2 0 0 4.0K Jul 6 16:46 caddy
-rw-r--r-- 1 0 0 32 Aug 9 03:46 caddy.env
-rw-r--r-- 1 70 0 406 Aug 9 03:46 customPostgresql.conf
-rw-r--r-- 1 0 0 1.7K Aug 9 03:46 docker-compose.yml
-rw-r--r-- 1 0 0 50 Jul 4 16:13 lemmy.env
-rw-r--r-- 1 0 0 695 Aug 9 03:46 lemmy.hjson
-rw-r--r-- 1 0 0 19M Jul 29 06:18 lemmy_log.out
-rw-r--r-- 1 0 0 49 Jul 4 16:13 pictrs.env
-rw-r--r-- 1 0 0 36 Aug 9 03:46 postfix.env
-rw-r--r-- 1 0 0 51 Jul 4 16:13 postgres.env
-rw-r--r-- 1 0 0 14 Jul 28 14:35 version

ubergeek77 commented 1 year ago

Something is very wrong with your internal Docker networking. None of your services can talk to each other.

Docker isn't responding properly to some DNS requests:

dial tcp: lookup lemmy on 127.0.0.11:53: no such host

And in some cases DNS requests resolve properly but no connection is allowed:

dial tcp 172.21.0.4:8536: connect: connection refused

These issues would impact other Docker Compose services on your machine too, not just Lemmy Easy Deploy.
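The two failure modes quoted above (a name that does not resolve versus a name that resolves but refuses the connection) can be told apart with a short probe. This is a minimal Python sketch, not a Lemmy-Easy-Deploy tool; `classify_connect` is a hypothetical helper, and it assumes Python is available in whatever container or host you run it from:

```python
import socket

def classify_connect(host, port, timeout=3.0):
    """Attempt one TCP connection and name the failure mode, if any."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # Name lookup failed, comparable to "lookup lemmy ...: no such host"
        return "dns_failure"
    except ConnectionRefusedError:
        # Name resolved, but nothing accepts connections on that address and port
        return "refused"
    except socket.timeout:
        # No answer within the timeout, comparable to "i/o timeout"
        return "timeout"
    except OSError as exc:
        return f"other: {exc}"

if __name__ == "__main__":
    # Service names and ports taken from the logs in this thread.
    for host, port in [("lemmy", 8536), ("postgres", 5432)]:
        print(host, port, classify_connect(host, port))
```

A "refused" result for postgres:5432 points at the service itself (not listening on a reachable address, or not started yet) rather than at Docker's DNS.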

I do not know what to recommend to assist you with this. Things you can try:

If that doesn't work, I'm not sure, sorry :(

Hopefully it's one of those quick fixes!

Christoph-Wagner commented 1 year ago

The interesting part is that the old (0.18.3) deployment was running happily, with no connection errors or anything of the sort.

The iptables rules should be the defaults (and, again, they also work for the old version):

root@lemmy-main: iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

Ensure your system hostname does not contain lemmy.

It does: lemmy-main, but that was never an issue before. I restarted & changed it to a different one (with hostname newname after the restart), same issues.

But I also found more issues: docker (often even after restarting) would not list anything with docker ps, the old version would no longer run reliably, and logs would not show up.

I restored the server from backup, and found out that iptables used to have a few more lines:

-A FORWARD -o br-3f409d2b7bca -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-3f409d2b7bca -j DOCKER
-A FORWARD -i br-3f409d2b7bca ! -o br-3f409d2b7bca -j ACCEPT
-A FORWARD -i br-3f409d2b7bca -o br-3f409d2b7bca -j ACCEPT
-A DOCKER -d 172.19.0.3/32 ! -i br-3f409d2b7bca -o br-3f409d2b7bca -p tcp -m tcp --dport 443 -j ACCEPT
-A DOCKER -d 172.19.0.3/32 ! -i br-3f409d2b7bca -o br-3f409d2b7bca -p tcp -m tcp --dport 80 -j ACCEPT

-A DOCKER-ISOLATION-STAGE-1 -i br-3f409d2b7bca ! -o br-3f409d2b7bca -j DOCKER-ISOLATION-STAGE-2

-A DOCKER-ISOLATION-STAGE-2 -o br-3f409d2b7bca -j DROP
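Comparing two rule dumps by eye is error-prone, so the comparison above can be automated. A minimal Python sketch, assuming both inputs are saved `iptables -S` output; `missing_rules` is a hypothetical helper, and the sample data is a small excerpt of the rules quoted in this thread:

```python
def missing_rules(before, after):
    """Return rules present in `before` but absent from `after`, in original order."""
    after_set = {line.strip() for line in after.splitlines() if line.strip()}
    return [
        line.strip()
        for line in before.splitlines()
        if line.strip() and line.strip() not in after_set
    ]

# Excerpt from the restored (working) server
WORKING = """\
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o br-3f409d2b7bca -j DOCKER
-A DOCKER-ISOLATION-STAGE-2 -o br-3f409d2b7bca -j DROP
"""

# Excerpt from the server after the failed deployment
BROKEN = """\
-A FORWARD -o docker0 -j DOCKER
"""

for rule in missing_rules(WORKING, BROKEN):
    print(rule)
```

Rules that mention a `br-...` interface belong to a user-defined Docker network, so their absence suggests that the Compose network was not up when the dump was taken.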

I also found out that attempting deployment of 0.18.4 again would break things again, in the same way.

I’m now planning to leave 0.18.3 running for a while, to see whether those issues appear for anyone else and whether a fix turns up.

pallebone commented 1 year ago

I know the issue. It is because the documentation is incorrect, and I had the same error. This file is invalid:

# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 4 GB
# CPUs num: 2
# Data Storage: ssd

max_connections = 200
shared_buffers = 1GB
effective_cache_size = 3GB
maintenance_work_mem = 256MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 2621kB
min_wal_size = 1GB
max_wal_size = 4GB

Please rename this file and redeploy. If it works, then use a valid config (I will go get one for you).

pallebone commented 1 year ago

Correct file based off your settings above:

listen_addresses = '*'
dynamic_shared_memory_type = posix
log_timezone = 'UTC'
datestyle = 'iso, mdy'
timezone = 'UTC'
lc_messages = 'en_US.utf8'                      # locale for system error message
lc_monetary = 'en_US.utf8'                      # locale for monetary formatting
lc_numeric = 'en_US.utf8'                       # locale for number formatting
lc_time = 'en_US.utf8'                          # locale for time formatting
default_text_search_config = 'pg_catalog.english'
max_connections = 200
shared_buffers = 1GB
effective_cache_size = 3GB
maintenance_work_mem = 256MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 2621kB
min_wal_size = 1GB
max_wal_size = 4GB

Christoph-Wagner commented 1 year ago

Thanks, won’t have time to look into this until tomorrow morning, but will report back then.

Christoph-Wagner commented 1 year ago

Thanks, it worked perfectly! So the way I see it, these parts are mandatory?

listen_addresses = '*'
dynamic_shared_memory_type = posix
log_timezone = 'UTC'
datestyle = 'iso, mdy'
timezone = 'UTC'
lc_messages = 'en_US.utf8'                      # locale for system error message
lc_monetary = 'en_US.utf8'                      # locale for monetary formatting
lc_numeric = 'en_US.utf8'                       # locale for number formatting
lc_time = 'en_US.utf8'                          # locale for time formatting
default_text_search_config = 'pg_catalog.english'

Is this an issue with the lemmy docs or something LED specific?

pallebone commented 1 year ago

That's what was in the Docker image's conf file by default. When you replace that file, those settings go missing, so I just put back the default values I copied from it. I imagine the values can be changed if you have a need, but those were the defaults from before you added your custom file, which wipes all the values that were already in there.

I am guessing the Lemmy docs are wrong, but I can't be 100% sure. This is my first time using Postgres.
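Since a custom file replaces the image's default config wholesale, one way to avoid losing settings such as listen_addresses is to layer the tuning values on top of the defaults. A minimal Python sketch of that idea, under stated assumptions: `parse_conf` and `merge_conf` are hypothetical helpers, the parser only handles `name = value` lines, and it assumes `#` never appears inside a quoted value. The sample defaults are an illustrative excerpt, not the full file from the image:

```python
def parse_conf(text):
    """Parse `name = value` lines into a dict, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings

def merge_conf(defaults, custom):
    """Start from the default settings and let the custom values override them."""
    merged = parse_conf(defaults)
    merged.update(parse_conf(custom))
    return "\n".join(f"{name} = {value}" for name, value in merged.items()) + "\n"

# Illustrative excerpt of default settings (assumption: not the complete file)
DEFAULTS = """\
listen_addresses = '*'
max_connections = 100
"""

# Tuning values in the style of the custom file from this thread
CUSTOM = """\
# DB Version: 15
max_connections = 200
shared_buffers = 1GB
"""

print(merge_conf(DEFAULTS, CUSTOM), end="")
```

The merged result keeps listen_addresses from the defaults while taking max_connections and shared_buffers from the custom file.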