ubergeek77 / Lemmy-Easy-Deploy

Deploy Lemmy the easy way!

Lemmy-ui error after upgrade #98

Closed: DaDosDude closed this issue 2 months ago

DaDosDude commented 2 months ago

Did you check the FAQ & Troubleshooting section for answers to common questions and issues?


Yes

Describe the issue

What happened? Post any relevant log snippets.

During the update, lemmy-ui kept logging this error in a loop:

lemmy-ui-1  | TypeError: fetch failed
lemmy-ui-1  |     at node:internal/deps/undici/undici:13193:13
lemmy-ui-1  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
lemmy-ui-1  |   [cause]: Error: connect ECONNREFUSED EXTERNALIPADDRESS:443
lemmy-ui-1  |       at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1607:16) {
lemmy-ui-1  |     errno: -111,
lemmy-ui-1  |     code: 'ECONNREFUSED',
lemmy-ui-1  |     syscall: 'connect',
lemmy-ui-1  |     address: 'EXTERNALIPADDRESS',
lemmy-ui-1  |     port: 443
lemmy-ui-1  |   }
lemmy-ui-1  | }

Diagnostic Information

Run ./deploy.sh -d and paste the output below:


==== Docker Information ====
Detected runtime: docker (Docker version 26.1.4, build 5650f9b)
Detected compose: docker compose (Docker Compose version v2.27.1)
Runtime state: OK

==== System Information ====
OS: Linux
KERNEL: 6.1.0-21-cloud-amd64 (x86_64)
HOSTNAME: OK
SHELL: bash
MEMORY:
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       1.1Gi       171Mi       143Mi       6.9Gi       6.6Gi
Swap:             0B          0B          0B

DISTRO:

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_CODENAME=bookworm

==== Lemmy-Easy-Deploy Information ====
Version: 1.4.2

IMAGE                                   CREATED          STATUS
ghcr.io/ubergeek77/lemmy-ui:0.19.5      43 minutes ago   Up 43 minutes (unhealthy)
ghcr.io/ubergeek77/lemmy:0.19.5         43 minutes ago   Up 43 minutes
pgautoupgrade/pgautoupgrade:16-alpine   43 minutes ago   Up 43 minutes
asonix/pictrs:0.5                       43 minutes ago   Up 43 minutes

Integrity:
12324004fe5f23821783c7e5928082708a0d61f37a5e7358d8fbbe7f6312573e  ./deploy.sh
92c95dfc886792b8df2e9fffb540fc71a35c3bc6fd6c7662134da1545a79457a  ./templates/Caddy-Dockerfile.template
c1202e70662dd2228da36a35a0f38ec8fc81bec8964d7315d02e8671a58dd7d7  ./templates/Caddyfile.template
2537678c7971df36c1ed95f4228d3cfcb15bb4a28a60d939eaf8dd75b5d64a36  ./templates/cloudflare.snip
c494a610bcb4cd1cfc0a4fe4fb0f6d437b2a84a0ad1625daee240e6dd6f1c910  ./templates/compose-email-volumes.snip
c9cb4c5fee12930e17798a02ae1bd12e2dc69e149a394c24511bc9d4e6b776d4  ./templates/compose-email.snip
e62a363b5fb7f94aac8274ddfe22d22c6d07a331064b0c078845613ac6ac91c5  ./templates/customPostgresql.conf
0ec8ceb82e1f9f6517a229d964c4949c3e0501019ab94cc62e6f8c29c2e56203  ./templates/docker-compose.yml.template
1c202b1b6e87c65b2fcda6035c9fe3f8631d76662907ffd38f24b14686e30647  ./templates/lemmy-email.snip
6f937ab0cbe01b0cc6f9428b18dde987529e690d4ca24a22027c002617539fda  ./templates/lemmy.hjson.template

Custom Files: No custom files

==== Settings ====
CLOUDFLARE: No
CADDY_DISABLE_TLS: false
CADDY_HTTP_PORT: 80
CADDY_HTTPS_PORT: 443
LEMMY_TLS_ENABLED: true
ENABLE_EMAIL: true
SMTP_PORT: 25
ENABLE_POSTFIX: true
POSTGRES_POOL_SIZE: 5
POSTGRES_SHM_SIZE: 64m

==== Generated Files ====
Deploy Version: 0.19.3;0.19.3

total 56K
drwxr-xr-x  2 0  0 4.0K Jun  9 20:32 caddy
-rw-r--r--  1 0  0   30 Jun 22 09:42 caddy.env
-rw-r--r--  1 70 0  853 Jun 10 09:34 customPostgresql.conf
-rw-r--r--  1 0  0 2.2K Jun 22 09:42 docker-compose.yml
-rw-r--r--  1 0  0 1.8K Jun 10 10:34 docker-compose.ymlbck
drwxr-xr-x 14 0  0 4.0K Jul 22  2023 lemmy
drwxr-xr-x  8 0  0 4.0K Jun  9 20:26 lemmy-ui
drwxr-xr-x  3 0  0 4.0K May 24 13:01 lemmy-ui-themes
-rw-r--r--  1 0  0   50 Jul 20  2023 lemmy.env
-rw-r--r--  1 0  0  601 Jun 22 09:42 lemmy.hjson
-rw-r--r--  1 0  0  106 Jun 22 02:14 pictrs.env
-rw-r--r--  1 0  0   34 Jun 22 09:42 postfix.env
-rw-r--r--  1 0  0   51 Jul 20  2023 postgres.env
-rw-r--r--  1 0  0   14 Jun 10 09:34 version

ubergeek77 commented 2 months ago

Is EXTERNALIPADDRESS literally in the logs, or did you redact your own IP?

If you have a site icon, can you try following these instructions to clear your site icon and redeploy?

https://github.com/ubergeek77/Lemmy-Easy-Deploy/issues/97#issuecomment-2183470437

DaDosDude commented 2 months ago

I redacted the IP.

Removing the icon seems to have worked. I saw that issue, but since my error seemed different, I thought it was not related to my problem.

Thank you.

ubergeek77 commented 2 months ago

Thank you!

I've been getting a lot of reports about this particular issue. I thought they fixed this a year ago, but I guess it's back.

Lemmy-UI seems to want to fetch your site icon via the public domain name, rather than using the internal Docker IP (which I suggested they fix a year ago 🙃).

It also seems that if it can't retrieve the site icon, it just fails instead of proceeding without the image. I do wish they would add some error handling there.

Anyway, please try re-running your deployment with Lemmy-Easy-Deploy v1.4.3. I've changed it to deploy Caddy first, which might mitigate this issue a little bit. Let me know if that works!
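
(Roughly, assuming you set this up by cloning the repo with git, updating and redeploying should be something like the following; adjust if you installed it another way.)

# pull the latest Lemmy-Easy-Deploy release, then redeploy the stack
git pull
./deploy.sh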

DaDosDude commented 2 months ago

Thanks. That works.

Everything seems to be working now. However, CPU usage is very high with the new update. I looked at docker stats, and it looks like Postgres is now using between 60 and 100% of the CPU, making the instance very slow.

CONTAINER ID   NAME                           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
31b232dc9a19   lemmy-easy-deploy-proxy-1      0.52%     46.16MiB / 7.738GiB   0.58%     3.06GB / 1.06GB   191MB / 4.1kB     11
c0ad6a427ff8   lemmy-easy-deploy-lemmy-ui-1   27.82%    270.3MiB / 7.738GiB   3.41%     406MB / 2.78GB    353MB / 0B        55
93ef2df43739   lemmy-easy-deploy-lemmy-1      3.62%     334.7MiB / 7.738GiB   4.22%     3.5GB / 1.87GB    180MB / 0B        14
0011d890d083   lemmy-easy-deploy-postgres-1   534.45%   3.252GiB / 7.738GiB   42.03%    1.26GB / 2.86GB   386GB / 89.3GB    11
41e525a7e5fc   lemmy-easy-deploy-pictrs-1     2.81%     220.5MiB / 7.738GiB   2.78%     294MB / 187MB     1.52GB / 25.9GB   27

I have an uptime monitor running, and there's a huge difference between before and after the update (screenshot attached: Screenshot_20240623-080830_Firefox).

I'm not sure how to go about this.

ubergeek77 commented 2 months ago

Does this instance serve a large number of users? I've never seen Postgres go past 10%, let alone 530%.

The only thing I can suggest is to try dropping this config file from the Lemmy team into the custom folder:

https://github.com/LemmyNet/lemmy-ansible/blob/main/examples/customPostgresql.conf

Don't forget to set the new POSTGRES_SHM_SIZE variable to match the shared_buffers Postgres setting.

For example, you'd set POSTGRES_SHM_SIZE to 2g to use this config as-is.
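
That setup might look roughly like this (a sketch only; it assumes the custom folder is ./custom and that POSTGRES_SHM_SIZE lives in config.env, so double-check the paths against the README):

# download the Lemmy team's example Postgres config into the custom folder
curl -L -o ./custom/customPostgresql.conf \
  https://raw.githubusercontent.com/LemmyNet/lemmy-ansible/main/examples/customPostgresql.conf

# then in config.env, match POSTGRES_SHM_SIZE to shared_buffers (2GB in that example):
# POSTGRES_SHM_SIZE=2g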

You can either redeploy the whole stack with ./deploy.sh -f so it picks up this config, or, if you want to avoid downtime, manually edit the files in ./live just like the script would, then only restart the Postgres container (although this might not be the best idea, not sure how the Lemmy backend is going to act if the Postgres container is suddenly stopped).

If you're messing with things in the ./live folder, remember to supply the stack name via -p lemmy-easy-deploy when using Docker Compose, or you will create conflicting containers.
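
For reference, the manual route might look something like this (just a sketch; it assumes you've already mirrored the script's edits inside ./live, and uses the postgres service name from the container list above):

cd ./live
# recreate only the Postgres container so the rest of the stack keeps running
docker compose -p lemmy-easy-deploy up -d --force-recreate postgres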

I use this config, and my single user instance is basically idling at 5% CPU usage right now. Maybe this drastically increased shm size will make things less CPU intensive?

DaDosDude commented 2 months ago

I have about 50 users, and I try to federate with most communities. At first I thought maybe it had to catch up from the downtime, but I don't think that's it.

I've tried with and without a custom Postgres config, and it seems to make little to no difference.

I'm afraid there might be some errors in the database. I'll try to figure out whether that's the case.

Thanks for the help though.

ubergeek77 commented 2 months ago

You're welcome! I really do hope you find some solution.

DaDosDude commented 2 months ago

After trying to figure it out myself with loads of DDG searches, I decided to ask on the Lemmy support community. After trying different things, it looks like running analyze verbose; in Postgres did the trick.

Sometimes after upgrades, even minor ones, I find it useful to run analyze on all of the tables. I usually do analyze verbose; so I can see which tables are getting analyzed. This will assess every table so the query planner can make better decisions about how to resolve queries. If the query planner is making bad decisions I/O and CPU will be high and query performance will be poor.

For people trying to fix this in the future:

This means running docker compose -p lemmy-easy-deploy exec postgres psql -U lemmy, entering analyze verbose; at the psql prompt, and then exiting back out with Ctrl+D.
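
Roughly, the full sequence (the project name and database user are taken from the command above; the single-command variant with psql -c is just a convenience):

docker compose -p lemmy-easy-deploy exec postgres psql -U lemmy
lemmy=# ANALYZE VERBOSE;   -- prints each table as it is analyzed
lemmy=# \q                 -- or press Ctrl+D to exit

# or as one command:
docker compose -p lemmy-easy-deploy exec postgres psql -U lemmy -c 'ANALYZE VERBOSE;'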

ubergeek77 commented 2 months ago

Awesome, thank you so much for that! I had no idea about this, I'll definitely add that to the troubleshooting page later.