owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0
1.37k stars 180 forks source link

Services listening on IPv4 lo interfaces only (was: IDP logon network error on fresh install) #7025

Closed allan2 closed 1 year ago

allan2 commented 1 year ago

Describe the bug

I have a fresh bare metal deployment on 3.1.0-rc.1 arm64. I'm unable to login using the default admin account.

ocis

Web portal response:

500 Internal Server Error - failed to logon

Server log:

ocis[86700]: {"level":"error","service":"idp","error":"ldap identifier backend logon connect error: LDAP Result Code 200 \"Network Error\": dial tcp [::1]:9235: connect: connection refused","time":"2023-08-13T12:34:56.123456789Z","message":"identifier failed to logon with backend"}

Steps to reproduce

My setup is identical to the bare metal deployment (systemd + Nginx without the SSL hardening definitions). During init, I configured oCIS with certificate checking disabled (answered yes).

/etc/ocis/ocis.env

OCIS_URL=https://ocis.mydomain.com
PROXY_HTTP_ADDR=0.0.0.0:9200
PROXY_TLS=false
OCIS_INSECURE=false

OCIS_LOG_LEVEL=warn

OCIS_CONFIG_DIR=/etc/ocis
OCIS_BASE_DATA_PATH=/var/lib/ocis

More info

Ports should be open.

ufw status

Status: active

To                         Action      From
--                         ------      ----
...others...
9235/tcp                   ALLOW       Anywhere
9235/tcp (v6)              ALLOW       Anywhere (v6)

I'm using the admin password generated during ocis init. I've verified that it's the same as admin_password under idm in /etc/ocis/ocis.yaml.

\\

Any help would be greatly appreciated!

rhafer commented 1 year ago

Server log:

ocis[86700]: {"level":"error","service":"idp","error":"ldap identifier backend logon connect error: LDAP Result Code 200 \"Network Error\": dial tcp [::1]:9235: connect: connection refused","time":"2023-08-13T12:34:56.123456789Z","message":"identifier failed to logon with backend"}

Hm, connection refuse usually means that nothing is listening on that port or that there is some reject rule in place on the firewall.

I am not familiar with ufw, but might there be other rules rejecting traffic to port 9235?

Can you check whether the ocis process is actually listenting on port 9235 using (ss or netstat) ? To rule out that the idm service not starting properly. Startup logs with OCIS_LOG_LEVEL set to debug might also help here.

allan2 commented 1 year ago

netstat -tuln | grep 9235

tcp        0      0 127.0.0.1:9235          0.0.0.0:*               LISTEN

I didn't see anything further related to this issue when the log level is set to debug.

ldapsearch -x -H ldaps://127.0.0.1:9235 -x \ -D uid=libregraph,ou=sysusers,o=libregraph-idm \ -W -b o=libregraph-idm objectclass=inetorgperson

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

LDAPTLS_REQCERT=never \ ldapsearch -x -H ldaps://127.0.0.1:9235 -x \ -D uid=libregraph,ou=sysusers,o=libregraph-idm \ -W -b o=libregraph-idm objectclass=inetorgperson

After inputting idm_password from /etc/ocis/ocis.yaml:

ldap_bind: Invalid credentials (49)

Is it possible that the included IDM service isn't starting properly?

rhafer commented 1 year ago

LDAPTLS_REQCERT=never \ ldapsearch -x -H ldaps://127.0.0.1:9235 -x \ -D uid=libregraph,ou=sysusers,o=libregraph-idm \ -W -b o=libregraph-idm objectclass=inetorgperson

After inputting idm_password from /etc/ocis/ocis.yaml:

ldap_bind: Invalid credentials (49)

That means that the IDM service is at least running. ldap_bind: Invalid credentials (49) means that the authentication failed. Either the password is wrong or something went wrong when initializing the IDM database.

As this is still a fresh install you could try to re-initialize the IDM database. You can do this by stopping ocis, removing the idm/ocis.boltdb file from the ocis data directory (/var/lb/ocis/ or ~/.ocis) and than starting ocis again.

If it still does not work after that. Please attach the full log with OCIS_LOG_LEVEL=debug here.

allan2 commented 1 year ago

Log attached.

This is what I did:

Clean:

systemctl stop ocis
sudo rm -rf /var/lib/ocis/*
sudo rm /etc/ocis/ocis.yaml

Initialize (answer yes for config with certificate checking disabled):

sudo -u ocis ocis init --config-path /etc/ocis

Write log. This includes a single login request through the web portal.

start_time=$(date --date="-5 seconds" '+%Y-%m-%d %H:%M:%S') && systemctl start ocis && sleep 60 && sudo journalctl -u ocis --since="$start_time" > /home/al/ocis_log.txt

ocis_fresh_log.txt

rhafer commented 1 year ago
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"info","service":"idm","time":"2023-08-16T04:51:36.734675714Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/crypto/gencert.go:90","message":"written certificate to /var/lib/ocis/idm/ldap.crt"}
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"info","service":"idm","time":"2023-08-16T04:51:36.735468718Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/crypto/gencert.go:127","message":"written key to /var/lib/ocis/idm/ldap.key"}
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"debug","service":"idm","time":"2023-08-16T04:51:36.736016473Z","line":"github.com/owncloud/ocis/v2/services/idm/pkg/command/server.go:73","message":"Bootstrapping IDM database"}
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"debug","service":"idm","db":"/var/lib/ocis/idm/ocis.boltdb","time":"2023-08-16T04:51:36.736708809Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/log/logrus_wrapper.go:50","message":"Open boltdb"}
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"debug","service":"idm","db":"/var/lib/ocis/idm/ocis.boltdb","time":"2023-08-16T04:51:36.740057565Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/log/logrus_wrapper.go:50","message":"Adding default buckets"}
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"debug","service":"nats","time":"2023-08-16T04:51:36.882131416Z","line":"github.com/owncloud/ocis/v2/services/nats/pkg/logging/nats.go:45","message":"127.0.0.1:58868 - cid:14 - Client connection created"}
Aug 16 04:51:36 myhostname ocis[113236]: {"level":"debug","service":"nats","time":"2023-08-16T04:51:36.919894265Z","line":"github.com/owncloud/ocis/v2/services/nats/pkg/logging/nats.go:45","message":"127.0.0.1:58878 - cid:15 - Client connection created"}
Aug 16 04:51:37 myhostname systemd[1]: ocis.service: A process of this unit has been killed by the OOM killer.
Aug 16 04:51:37 myhostname systemd[1]: ocis.service: Main process exited, code=killed, status=9/KILL
Aug 16 04:51:37 myhostname systemd[1]: ocis.service: Failed with result 'oom-kill'.
Aug 16 04:51:37 myhostname systemd[1]: ocis.service: Consumed 1.175s CPU time.
Aug 16 04:51:37 myhostname systemd[1]: ocis.service: Scheduled restart job, restart counter is at 1.
Aug 16 04:51:37 myhostname systemd[1]: Stopped OCIS server.

Hm, ocis was killed by the OOM killer upon first startup of the idm service. This might have caused followup issues. (I.e. the initialization didn't complete).

I am not 100% sure that his really is causing your problem, but we should get that out of the way first. How much RAM does that system have? It might make sense to play with the GOGC or GOMEMLIMIT setting to limit the golang runtime's memory usage.

allan2 commented 1 year ago

I had 512 MB RAM previously. I bumped to 4 GB and did a new fresh install. The log is attached.

ocis_log_0817.txt

The OOM killer message is now gone. However, the error still persists.

Lines 534–535:

Aug 17 06:33:04 myhostname ocis[2099]: {"level":"debug","service":"proxy","policy":"ocis","method":"POST","prefix":"/signin/","path":"/signin/v1/identifier/_/logon","routeType":"prefix","time":"2023-08-17T06:33:04.206185735Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/router/router.go:222","message":"director found"}
Aug 17 06:33:04 myhostname ocis[2099]: {"level":"error","service":"idp","error":"ldap identifier backend logon connect error: LDAP Result Code 200 \"Network Error\": dial tcp [::1]:9235: connect: connection refused","time":"2023-08-17T06:33:04.206740666Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/log/logrus_wrapper.go:50","message":"identifier failed to logon with backend"}
Aug 17 06:33:04 myhostname ocis[2099]: {"level":"debug","service":"idp","request-id":"","proto":"HTTP/1.1","method":"POST","status":500,"path":"/signin/v1/identifier/_/logon","duration":0.359714,"bytes":44,"time":"2023-08-17T06:33:04.206800416Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/middleware/logger.go:27"}
rhafer commented 1 year ago

Aug 17 06:33:04 myhostname ocis[2099]: {"level":"error","service":"idp","error":"ldap identifier backend logon connect error: LDAP Result Code 200 \"Network Error\": dial tcp [::1]:9235: connect: connection refused","time":"2023-08-17T06:33:04.206740666Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/log/logrus_wrapper.go:50","message":"identifier failed to logon with backend"}

Argh, now I understand what's going wrong here. The idp service is trying to contact idm on the IPv6 address, but idm is only listening on the IPv4 address. :astonished:

As a workaround, could you please try to set OCIS_LDAP_URI=ldaps://127.0.0.1:9235 to force it to use the IPv4 address for now.

It would still be interesting to know why your system behaves that way (while other's don't). Could you share a bit more infor about the system? What distribution are you running?

allan2 commented 1 year ago

Thanks for the help! Login and sync is working now. I had to set two additional env vars.

Added to ocis.env:

# necessary for login
OCIS_LDAP_URI=ldaps://127.0.0.1:9235
# necessary for login (error initializing metadata client)
STORAGE_SYSTEM_DATA_SERVER_URL=http://127.0.0.1:9216/data
# necessary for sync (error doing GET request to data service, error doing PATCH request to data service)
STORAGE_USERS_DATA_SERVER_URL=http://127.0.0.1:9158/data

This instance is running Ubuntu 22.04.2 LTS. /etc/gai.conf is unmodified (fully commented, default values give IPv6 higher precedence than IPv4).

There could be other services still erroring because they are trying to connect to [::1]:PORT. Any idea why this is happening?

rhafer commented 1 year ago

Thanks for the help! Login and sync is working now. I had to set two additional env vars.

Glad that we found the root cause.

Added to ocis.env:

# necessary for login
OCIS_LDAP_URI=ldaps://127.0.0.1:9235
# necessary for login (error initializing metadata client)
STORAGE_SYSTEM_DATA_SERVER_URL=http://127.0.0.1:9216/data
# necessary for sync (error doing GET request to data service, error doing PATCH request to data service)
STORAGE_USERS_DATA_SERVER_URL=http://127.0.0.1:9158/data

This instance is running Ubuntu 22.04.2 LTS. /etc/gai.conf is unmodified (fully commented, default values give IPv6 higher precedence than IPv4).

There could be other services still erroring because they are trying to connect to [::1]:PORT. Any idea why this is happening?

No, not yet. I was hoping you had some local modifications to you gai.conf :smile: . But I guess it would already help if we'd teach ocis to list on all (v4/v6) loopback interfaces. Still need to dig a bit deeper into this.

rhafer commented 1 year ago

@allan2 Could you please attach the /etc/host file of your system? AFAICS Ubuntu does not map the localhost name to the ::1 IPv6 address by default.

allan2 commented 1 year ago

/etc/hosts

#127.0.0.1 localhost
::1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

so my system was mapping IPv6 loopack to localhost but not IPv4. I apologize for the trouble; I must have changed it when troubleshooting a separate issue a while back and forgotten about it.

After toggling the comments:

127.0.0.1 localhost
#::1 localhost

oCIS now working as expected without explicitly setting OCIS_LDAP_URI and the other vars. Thanks again!

rhafer commented 1 year ago

so my system was mapping IPv6 loopack to localhost but not IPv4.

Ok, that explains a lot. I was already questioning my sanity :laughing:

Meanwhile, after reading through some golang documentation and issues, I believe what we're doing (explicitly listening on 127.0.0.1 instead of localhost) is mostly. Golang's net.Listen() will anyway only listen on the first IPv4 address returned after resolving the hostname.

What we don't support yet is configuring a list of IP addresses (IPv4 and IPv6) to listen on. (i.e. for a dual-stack setup). Though I am not sure if that is really needed. The proxy service, which is the only one that actually needs to be acccessible from external already listens on 0.0.0.0 (INADDR_ANY) so that should already work in a dual-stack scenario IIUC.