ubergeek77 / Lemmy-Easy-Deploy

Deploy Lemmy the easy way!
MIT License

Web interface loads but can't login anymore and shows no feeds #54

Closed squishyoctopus closed 1 year ago

squishyoctopus commented 1 year ago

Did you check the FAQ & Troubleshooting section for answers to common questions and issues?

Yes

Describe the issue

For a week or so I haven't been able to login to the web interface for my instance. When I enter my login information and click login it just reloads the login page with blank fields.

I am able to login to third party apps with no issues.

It also doesn't display any communities even clicking all.

I've tried redeploying and today I upgraded to 0.18.3 but still no luck. I have 2FA on my account but it doesn't prompt for it. It had been working fine and I hadn't changed anything when it stopped working.

Diagnostic Information

Run ./deploy.sh -d and paste the output below:


==== Docker Information ====
Detected runtime: docker (Docker version 24.0.5, build ced0996)
Detected compose: docker compose (Docker Compose version v2.20.2)
Runtime state: OK

==== System Information ====
OS: Linux
KERNEL: 5.15.0-78-generic (x86_64)
HOSTNAME: OK
SHELL: bash
MEMORY:
        total   used    free    shared  buff/cache  available
Mem:    1.9Gi   522Mi   90Mi    138Mi   1.3Gi       1.1Gi
Swap:   2.3Gi   140Mi   2.2Gi

DISTRO:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy

==== Lemmy-Easy-Deploy Information ====
Version: 1.2.8

IMAGE                                CREATED       STATUS
lemmy-easy-deploy-proxy              3 hours ago   Up 3 hours
ghcr.io/ubergeek77/lemmy-ui:0.18.3   3 hours ago   Up 3 hours
ghcr.io/ubergeek77/lemmy:0.18.3      3 hours ago   Up 3 hours
postgres:15-alpine                   3 hours ago   Up 3 hours
asonix/pictrs:0.4.0                  3 hours ago   Up 3 hours

Integrity:
074ae41957936daac883c66bcac0ca12093343ef5e923752689b936b3c6b1b25  ./deploy.sh
1e9b0c0988998dcc33cb0fbfdb0e1679229424e724f898b797380adc7d102446  ./templates/Caddy-Dockerfile.template
c1202e70662dd2228da36a35a0f38ec8fc81bec8964d7315d02e8671a58dd7d7  ./templates/Caddyfile.template
2537678c7971df36c1ed95f4228d3cfcb15bb4a28a60d939eaf8dd75b5d64a36  ./templates/cloudflare.snip
c9cb4c5fee12930e17798a02ae1bd12e2dc69e149a394c24511bc9d4e6b776d4  ./templates/compose-email.snip
c494a610bcb4cd1cfc0a4fe4fb0f6d437b2a84a0ad1625daee240e6dd6f1c910  ./templates/compose-email-volumes.snip
f5325a9e26b29da51c6d3295aa278ff08ce71ffd2cd63dc4bebf00e54c468899  ./templates/docker-compose.yml.template
1c202b1b6e87c65b2fcda6035c9fe3f8631d76662907ffd38f24b14686e30647  ./templates/lemmy-email.snip
c834cdce9eaf77f38155b404724fdfe66845575386ee516987452aa715642a6f  ./templates/lemmy.hjson.template

Custom Files: No custom files

==== Settings ====
CLOUDFLARE: Yes
CADDY_DISABLE_TLS: false
CADDY_HTTP_PORT: 80
CADDY_HTTPS_PORT: 443
LEMMY_TLS_ENABLED: true
ENABLE_EMAIL: true
SMTP_PORT: 465
ENABLE_POSTFIX: false
POSTGRES_POOL_SIZE: 5

==== Generated Files ====
Deploy Version: 0.18.3;0.18.3

total 36K
drwxr-xr-x 2 0 0 4.0K Jul  1 22:10 caddy
-rw-r--r-- 1 0 0   83 Jul 28 22:51 caddy.env
-rw-r--r-- 1 0 0 1.7K Jul 28 22:51 docker-compose.yml
-rw-r--r-- 1 0 0   50 Jul  1 22:10 lemmy.env
-rw-r--r-- 1 0 0  772 Jul 28 22:51 lemmy.hjson
-rw-r--r-- 1 0 0   49 Jul  1 22:10 pictrs.env
-rw-r--r-- 1 0 0   33 Jul 28 22:51 postfix.env
-rw-r--r-- 1 0 0   51 Jul  1 22:10 postgres.env
-rw-r--r-- 1 0 0   14 Jul 28 22:52 version

ubergeek77 commented 1 year ago

This has been happening for a week? Was your instance successfully deployed, and all of a sudden the web UI stopped working? Or was this script never able to deploy successfully?

If it suddenly stopped working, then this may be a Lemmy bug. If you see any errors in the logs, they may help you find the cause so you can report it:

docker compose -p lemmy-easy-deploy logs -f

Keep that running while you try to load the web interface; it might spit out a bunch of errors.
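If the log stream is too noisy to read, a small filter can help. This is only a sketch: the helper name `only_problems` and its keyword list are made up for illustration and are not an official Lemmy log format, so adjust the pattern to what your logs actually print.

```shell
# Hypothetical helper: keep only log lines that look like problems.
# The keyword list is a guess, not something Lemmy defines.
only_problems() {
    grep -iE 'error|warn|panic|fatal'
}

# Usage (needs a running deployment, so it is left commented out):
# docker compose -p lemmy-easy-deploy logs -f --tail 100 2>&1 | only_problems
```

An empty result only means nothing matched the pattern; it does not prove the services are healthy.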

lordkitsuna commented 1 year ago

i also just hit this. i was on a manually pulled rc6 tag and it was working fine, updated to the latest 0.18.3 and now i can't login or see any community posts. i have been watching the logs but do not see any errors or warnings, just actix_web activity

ubergeek77 commented 1 year ago

If you are seeing an "Error!" page, you will have to wait for the database migration to complete. It could take up to 5 minutes if you have a lot of post data.

What exactly is happening when you load the page?

lordkitsuna commented 1 year ago

the pages load as normal, just with no content. communities look empty with no posts, and if you try to login it just refreshes the page having done nothing. no errors, warnings, or anything, both on the page itself and in the docker logs. this is just a personal server, so only about 6 posts on it locally

ubergeek77 commented 1 year ago

Unfortunately I cannot reproduce this, and a few other users I know to use this script are also running 0.18.3 with no issues.

Is it possible you killed the deployment during the upgrade to 0.18.3, potentially causing database corruption? Because otherwise I don't know what could have caused this.

Based on the behavior you're describing, the Lemmy backend isn't responding properly or at all, which means something is wrong with the Lemmy backend, the Postgres container, or both.

@squishyoctopus Do you have any additional information since originally filing this issue?

lordkitsuna commented 1 year ago

as far as i know it was not killed on the initial upgrade. if you have discord or irc or something i can give you some temporary access to the machine to poke at it, lemmy is the only thing it's hosting these days

ubergeek77 commented 1 year ago

That's alright. Since this is now the second report, I'm wondering if there is an edge case that happens on migration. That's a Lemmy issue, not a Lemmy-Easy-Deploy issue, but I can at least try to help you narrow it down:

You can check the logs for a specific service like:

docker compose -p lemmy-easy-deploy logs -f SERVICE_NAME

I recommend replacing SERVICE_NAME with each service in turn (for example lemmy, then lemmy-ui) so you can narrow down the issue one service at a time.

I specifically recommend opening two terminals, running that command in one terminal to start monitoring the lemmy service, then using your second terminal to restart the Lemmy backend service:

docker compose -p lemmy-easy-deploy restart lemmy

That way you can see any error logs in real time as Lemmy starts up.

If you start getting spammed with /inbox logs, then it suggests that the backend is working fine, as federated servers are able to send you data. So you could do the same procedure with lemmy-ui to see any errors there.

Beyond this, I'm not sure what else to suggest, sadly. But if you do see any unexpected errors, it would definitely be worth reporting on the respective issue trackers.
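The one-by-one check described above could be scripted roughly like this. It is a sketch under assumptions: `count_problems` is a hypothetical helper, the keyword pattern is a guess, and only lemmy and lemmy-ui are service names taken from this thread; pass whatever service names your deployment actually has.

```shell
# Hypothetical helper: count suspicious lines in each service's recent logs.
# Service names are passed as arguments.
count_problems() {
    project="lemmy-easy-deploy"
    for service in "$@"; do
        # grep -c prints the number of matching lines (0 if none);
        # "|| true" keeps a zero-match result from counting as a failure.
        n=$(docker compose -p "$project" logs --no-color --tail 500 "$service" 2>&1 \
            | grep -ciE 'error|warn|panic' || true)
        printf '%s: %s\n' "$service" "$n"
    done
}

# Usage:
# count_problems lemmy lemmy-ui
```

A count of 0 for every service matches what was reported later in this thread, which points away from the containers and toward something in front of them.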

lordkitsuna commented 1 year ago

sadly i find absolutely no signs of any errors, warnings, or much of anything really, on any service. what's the best way to just nuke and start fresh? shut it down and use the standard docker purge, or are there other files i should make sure get nuked?

squishyoctopus commented 1 year ago

I couldn't find any errors either and it works fine with third party clients like memmy and voyager. It's only the Lemmy UI which doesn't work.

ubergeek77 commented 1 year ago

@lordkitsuna

You can do:

./deploy.sh -s
rm -rf live
docker volume prune -a

Be very careful with that last command: it will delete ALL unused volumes, including non-Lemmy volumes you might want to keep. Clear volumes manually if you prefer.
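For the manual route, one possible sketch is to list only the volumes that belong to this Compose project and remove just those. It assumes the volumes carry Compose's `com.docker.compose.project` label, which I have not verified for this particular deployment; `project_volumes` is a hypothetical helper name.

```shell
# Sketch: list only the volumes Compose created for one project,
# instead of pruning every unused volume on the host.
# Assumes the volumes carry the com.docker.compose.project label.
project_volumes() {
    docker volume ls -q --filter "label=com.docker.compose.project=$1"
}

# Usage: review the list first, then remove only those volumes.
# project_volumes lemmy-easy-deploy
# project_volumes lemmy-easy-deploy | xargs -r docker volume rm   # -r is a GNU xargs option
```

If the list comes back empty, the label assumption does not hold for your setup and you would need to pick the volumes out of `docker volume ls` by hand.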

ubergeek77 commented 1 year ago

The only other suggestion I have for both of you is that you may have set your system hostname to just lemmy, breaking internal DNS resolution to the backend.

But that wouldn't have changed during an upgrade to 0.18.3, so it's probably not that.

To verify this, you can try running:

docker compose -p lemmy-easy-deploy exec lemmy-ui /bin/sh -c 'wget -O - lemmy:8536/api/v3/site'

If you see no errors, and get a JSON response containing info about your Lemmy instance, then there is nothing wrong with the configuration on my end, and you might want to file a Lemmy UI bug.

But if it does throw an error, we can dig further.
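Both checks can be wrapped in small helpers if you want a yes/no answer. This is a sketch: the function names are made up, and the JSON check is a plain text match that assumes the site response contains a "version" field.

```shell
# Sketch: flag a host name that would shadow the "lemmy" service name.
# Takes the name as an argument so it can be tested without touching the host.
hostname_shadows_backend() {
    [ "$1" = "lemmy" ]
}

# Sketch: succeed if stdin looks like JSON with a "version" field.
# This is a text match, not real JSON parsing.
looks_like_site_json() {
    grep -q '"version"'
}

# Usage (-T disables TTY allocation so the output can be piped):
# hostname_shadows_backend "$(hostname)" && echo "hostname clashes with the lemmy service"
# docker compose -p lemmy-easy-deploy exec -T lemmy-ui /bin/sh -c 'wget -q -O - lemmy:8536/api/v3/site' \
#   | looks_like_site_json && echo "backend reachable from lemmy-ui"
```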

lordkitsuna commented 1 year ago

i am certainly at a loss now. did the full wipe, verified all images gone, redeployed, exact same issue. can't login, ui seems broken. i suppose it's a lemmy UI problem at this point, but i have no idea why it's triggering on a clean install and am not even sure how to report it since no logs show any problems

ubergeek77 commented 1 year ago

Can the UI contact the backend? You can check this with the command I posted above.

There have been similar issues on the Lemmy UI tracker before (and I was impacted by one so I was following it closely).

Most of the time when this happens it's due to local DNS nonsense. The command I posted above can help you confirm if this is the case.

lordkitsuna commented 1 year ago

the command returns properly, spits out a bunch of information about the site without errors. seems all buttons in general are broken on the UI for some reason, even just the show/hide password on the login page for example does nothing.

however, looking at the chrome dev console, i think this may just be a cloudflare issue. [image]

i do have my cloudflare api key set and have verified its correct so it shouldn't be related to that

ubergeek77 commented 1 year ago

If you're accessing the site at all and you have the Cloudflare proxy, then it's probably fine. Try visiting your_url/api/v3/site in a browser manually and see if Cloudflare blocks you there as well.

If that fails, try temporarily disabling your Cloudflare proxy and set your DNS record to your real server IP (and configure all relevant firewalls). You don't have to remove your Cloudflare API key, that's just for requesting certificates.

If it still doesn't work, and the API is returning in both your browser, and that example command I gave you, then I'm comfortable in saying this is a Lemmy UI bug.
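The same comparison can be done from a terminal without touching DNS. A rough sketch, assuming your origin accepts direct connections on port 443 (a firewall that only allows Cloudflare's addresses would block it); `api_status`, example.com and 203.0.113.10 are placeholders, not values from this thread.

```shell
# Sketch: print the HTTP status code returned for the site API endpoint.
# $1 is the instance's host name; $2 (optional) is the origin server IP.
# When $2 is given, curl's --resolve pins the host name to that IP,
# so the request skips the Cloudflare proxy without changing DNS.
api_status() {
    host="$1"
    origin_ip="${2:-}"
    if [ -n "$origin_ip" ]; then
        curl -sS -o /dev/null -w '%{http_code}\n' \
            --resolve "$host:443:$origin_ip" "https://$host/api/v3/site"
    else
        curl -sS -o /dev/null -w '%{http_code}\n' "https://$host/api/v3/site"
    fi
}

# Usage:
# api_status example.com               # through whatever DNS points at
# api_status example.com 203.0.113.10  # straight to the origin
```

If the two status codes differ, the proxy is the likely culprit. If both succeed, the API itself is fine and the problem is more likely in how the page's JavaScript is delivered.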

ubergeek77 commented 1 year ago

Also, as a final precaution, scroll down to the bottom of your page and see if your page footer says "FE: 0.18.3."

If it doesn't, then it means your UI from 0.18.2 or lower is possibly being cached, and it may not be compatible with the 0.18.3 backend.

In that case, you could try force clearing your browser cache or trying a different browser.

I can tell you my UI doesn't throw those errors, so I suspect this is the case for you.

lordkitsuna commented 1 year ago

it's a cloudflare issue. disabled the proxy, purged dns cache and reloaded the page. works fine now. so something is broken with the rocket loader js minifier for lemmy atm. at which point i have something to sink a report into

ubergeek77 commented 1 year ago

Interesting...

Do you have any idea how I might intentionally "break" my own instance to reproduce this? I also use the Cloudflare proxy, but I've never had anything like this happen before.

Do you have some custom page rules maybe?

lordkitsuna commented 1 year ago

no custom rules. i do however have "rocket loader" enabled under the speed/optimization tab area, and it was the rocket-loader that was throwing the error, so i assume it's related to that. or it might be auto minify, as it does warn that some newer js features might break. i pretty much have any and all optimizations they give free accounts enabled. i will have to start turning them off and see which one is doing it

ubergeek77 commented 1 year ago

I see

@squishyoctopus Your debug output shows you using Cloudflare, are you using those features as well?

squishyoctopus commented 1 year ago

Turning off the Cloudflare proxy fixed it for me too. First I tried turning Development Mode on, which is supposed to bypass the Cloudflare cache, but it didn't fix it until I turned off proxying and cleared my DNS cache to load from the server directly.
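To confirm whether a request is still going through Cloudflare after a change like this, one option is to look for the CF-RAY response header, which Cloudflare adds to responses it proxies. A minimal sketch; `via_cloudflare` is a made-up helper name and example.com is a placeholder.

```shell
# Sketch: succeed if the response headers on stdin contain a CF-RAY header.
via_cloudflare() {
    grep -qi '^cf-ray:'
}

# Usage:
# curl -sSI https://example.com/ | via_cloudflare \
#   && echo "still proxied by Cloudflare" \
#   || echo "served directly"
```

If this still reports the proxy after you switched it off, a stale DNS cache is a likely reason, which fits the need to clear it described above.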

ubergeek77 commented 1 year ago

Well, glad to hear it's not a Lemmy issue!

I do use Cloudflare myself, so I can tell you the Cloudflare Proxy, by default, doesn't typically mess with me, and 0.18.3 is working fine.

Definitely check to see if you have rocket-loader or any JS minification features enabled by default. If you use your root domain for other things, you might have them enabled for your entire Zone without even realizing it.

Since mine is working, I'm willing to bet there's something you can change on the Cloudflare side to get it working again, without needing Development Mode and without disabling the cache (I don't do either of those things).
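One way to check from the command line whether Rocket Loader is touching your pages is to fetch the HTML and look for a reference to it. This is a sketch that assumes the injected script URL contains "rocket-loader", matching the loader name reported in this thread; `has_rocket_loader` is a hypothetical helper and example.com is a placeholder.

```shell
# Sketch: succeed if the HTML on stdin references Rocket Loader.
# This is a plain text match on the script name, nothing more.
has_rocket_loader() {
    grep -qi 'rocket-loader'
}

# Usage:
# curl -sS https://example.com/ | has_rocket_loader \
#   && echo "Rocket Loader is rewriting this page"
```

Running it before and after toggling the setting in the Cloudflare dashboard should show whether the change has taken effect for the zone.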