owncloud / ocis

ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

Logins (web/apps) stopped working, webdav still works #6612

Closed flamingm0e closed 1 year ago

flamingm0e commented 1 year ago

Describe the bug

I don't really know what happened. Everything was working fine after my previous issue with upgrading from rc3 to 3.0.0 got resolved.

One day the desktop app stopped connecting, and I can no longer log in from the browser or from the Android app.

What shows up in the logs when I try to login via web browser:

{"level":"error","service":"proxy","error":"failed to verify access token: Get \"https://myurl.com/.well-known/openid-configuration\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","authenticator":"oidc","path":"/ocs/v1.php/cloud/user","time":"2023-06-23T20:04:48.421013683Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/oidc_auth.go:162","message":"failed to authenticate the request"}

When I go to that .well-known/openid-configuration URL myself, I get a 404.
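A minimal sketch for reading a log entry like the one above, assuming the proxy writes one JSON object per line as shown. The `summarize` helper and the `TIMEOUT_MARKERS` tuple are hypothetical names introduced here, not part of ocis. The point it illustrates: "context deadline exceeded (Client.Timeout exceeded while awaiting headers)" is what Go's HTTP client reports when a request times out before any response headers arrive, so the proxy never received an HTTP status (such as 404) from the discovery URL at all.

```python
import json

# Log line copied verbatim from the report above.
# The raw string keeps the \" escapes intact so json.loads can decode them.
SAMPLE = r'''{"level":"error","service":"proxy","error":"failed to verify access token: Get \"https://myurl.com/.well-known/openid-configuration\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","authenticator":"oidc","path":"/ocs/v1.php/cloud/user","time":"2023-06-23T20:04:48.421013683Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/oidc_auth.go:162","message":"failed to authenticate the request"}'''

# Substrings Go's HTTP client puts into the error text when a request times out.
TIMEOUT_MARKERS = ("context deadline exceeded", "Client.Timeout exceeded")


def summarize(line: str) -> dict:
    """Pull out the fields that matter for triage (hypothetical helper)."""
    entry = json.loads(line)
    error = entry.get("error", "")
    return {
        "service": entry.get("service"),
        "authenticator": entry.get("authenticator"),
        "path": entry.get("path"),
        # True means the outbound request timed out; no HTTP status was received.
        "timed_out": any(marker in error for marker in TIMEOUT_MARKERS),
    }
```

Calling `summarize(SAMPLE)` flags the entry as a timeout raised by the `oidc` authenticator in the `proxy` service, which points at connectivity or name resolution from the ocis process rather than at a missing endpoint.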

Setup

V3.0.0 running in Docker, with Caddy2 as reverse proxy.

Extremely minimal configuration, some very basic ENVVARS:

- OCIS_INSECURE=true
- OCIS_LOG_LEVEL=info
- OCIS_URL=https://myactualdomainurl.com
- PROXY_ENABLE_BASIC_AUTH=true <-- This is what I wanted to get rid of this weekend, by configuring some rclone oidc agents
- PROXY_TLS=false

Additional context

I'm not sure what else I can add to this. Willing to dig more if someone can point me on the path. I've been through so many open and closed issues trying to figure it out, I'm really struggling here to make sense of it all. I think what confuses me the most is I am used to separate services and being able to troubleshoot them individually, but when everything is in one binary I feel lost. Not to mention the multitude of configuration options and settings.

But to add to the confusion I have, RCLONE/webDAV mounts are working fine.

micbar commented 1 year ago

Seems like something has changed in your caddy setup.

Please make sure that the ocis service can reach itself via the public domain. You can try a curl to this URL https://myurl.com/.well-known/openid-configuration from the server where ocis is running. This is needed for the OpenID Connect flow.

This also explains why WebDAV via basic auth is still working.
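As a rough scripted equivalent of that curl check, the sketch below fetches the discovery document with a timeout and reports what went wrong. It assumes Python 3.9+ is available wherever you run it. `discovery_url`, `http_get`, and `check_discovery` are hypothetical helper names, and the 5-second timeout is an arbitrary choice, not an ocis setting. The only protocol fact relied on is that an OpenID Connect discovery document is JSON containing an `issuer` field.

```python
import json
import urllib.error
import urllib.request


def discovery_url(base_url: str) -> str:
    """Build the OpenID Connect discovery URL for a public base URL."""
    return base_url.rstrip("/") + "/.well-known/openid-configuration"


def http_get(url: str, timeout: float) -> tuple[int, bytes]:
    """Return (status, body); HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        return exc.code, exc.read()


def check_discovery(base_url: str, fetch=http_get, timeout: float = 5.0) -> str:
    """Classify the outcome of fetching the discovery document."""
    url = discovery_url(base_url)
    try:
        status, body = fetch(url, timeout)
    except OSError as exc:
        # URLError, socket timeouts and DNS failures are all OSError subclasses.
        return f"unreachable: {exc}"
    if status != 200:
        return f"HTTP {status}"
    try:
        doc = json.loads(body)
    except ValueError:
        return "not JSON"
    if isinstance(doc, dict) and "issuer" in doc:
        return "ok"
    return "missing issuer"
```

Running `print(check_discovery("https://myurl.com"))` both inside the ocis container and on the host shows whether each of them gets "ok", an HTTP status, or "unreachable". An intermittent "unreachable" result points at the network or DNS rather than at ocis itself.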

flamingm0e commented 1 year ago

Nothing in Caddy changed. I feel like it had something to do with a cookie/token expiring.

I have tested the ability to resolve the URL, but didn't think to curl.

Inside the container, it can reach itself.

The host itself can reach it as well.

Tested from my browser just now as well, and it can connect and JSON is presented now, not sure why it was 404 earlier.

micbar commented 1 year ago

Maybe a duplicate of https://github.com/owncloud/ocis/issues/6557 ?

flamingm0e commented 1 year ago

I was reading that one earlier, but even deleting cookies in my browser doesn't fix it. I can get to the login page, but after I submit my credentials it eventually times out and I get "login failed".

flamingm0e commented 1 year ago

And of course after a week of futzing with it, the Android app logs in, the web browser lets me login, and everything seems almost normal.

HOWEVER, the Android app keeps telling me that my credentials have expired and I need to login again.

The Windows desktop app tries to use the browser to log in (the same browser I'm already logged in with). After selecting my username and hitting allow, I get:

Login Error: Failed to retrieve user info

Hitting the retry button lets it login successfully.

Something is REALLY buggy in my server, and I don't know where to begin to troubleshoot it. Should I just pave it and start over? Restore my 400 GB of data across all users?

flamingm0e commented 1 year ago

And with zero changes made, it's all broken again.

micbar commented 1 year ago

@flamingm0e

It seems your setup is very flaky. I would suggest not deleting the setup. The data should be consistent and intact. To me it looks more like a serious infrastructure problem. Without knowing your setup, I can only guess. It could be a second process running on the same port, a broken proxy, or a hardware resource problem (disk full, network outage, CPU limits). BTW, how did you mount your primary storage? Make sure that you have no NFS with caching issues or something like that.

Logs would help to understand your problem.

flamingm0e commented 1 year ago

I actually came to the same conclusion last night after it broke the last time.

It was frustrating, so I spent a stupid amount of time going through my network.

I found one of my DNS servers wasn't replying to anything, and restarted it.

It's been running rock solid for the last 8 hours so far.

I'm going to be mad if it was DNS, because I know "it's always DNS"!

I apologize for wasting everyone's time with this. I will hold off on closing my issue for a few more hours until I can safely say the problem is gone.

flamingm0e commented 1 year ago

Closing this.

It was DNS. I'll add that to my troubleshooting steps.
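For anyone adding the same step to their own checklist, here is a minimal sketch of a repeated-resolution probe, assuming Python 3.9+ is available on the host. `system_resolve` and `probe` are hypothetical helper names, and the attempt count and the 1-second "slow" threshold are arbitrary. It goes through the system resolver, so it cannot target one specific DNS server, and local caching may hide failures. It is only a quick way to spot lookups that fail or stall intermittently, which is roughly what one dead server out of several can look like.

```python
import socket
import time


def system_resolve(host: str) -> list[str]:
    """Resolve a host through the operating system's resolver."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def probe(host: str, attempts: int = 10, resolve=system_resolve,
          slow_after: float = 1.0, pause: float = 0.0) -> dict:
    """Resolve `host` repeatedly and count failed and slow lookups."""
    failures = 0
    slow = 0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolve(host)
        except OSError:
            # socket.gaierror (resolution failure) is an OSError subclass.
            failures += 1
        else:
            if time.monotonic() - start > slow_after:
                slow += 1
        time.sleep(pause)
    return {"attempts": attempts, "failures": failures, "slow": slow}
```

Something like `print(probe("myurl.com", attempts=20, pause=0.5))`, run on the host and inside the container, gives a first hint. A non-zero `failures` or `slow` count is a reason to test each configured DNS server individually.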