mother-of-all-self-hosting / mash-playbook

🐋 Ansible playbook which helps you host various FOSS services as Docker containers on your own server
GNU Affero General Public License v3.0

mash-authentik-server.service was not detected to be running #117

Open ghost opened 11 months ago

ghost commented 11 months ago
TASK [galaxy/com.devture.ansible.role.systemd_service_manager : Fail if service isn't detected to be running] **************
failed: [workspace.di.xyz] (item=mash-authentik-server.service) => {"ansible_loop_var": "item", "changed": false, "item": "mash-authentik-server.service", "msg": "mash-authentik-server.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status mash-authentik-server.service` and `journalctl -fu mash-authentik-server.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `devture_systemd_service_manager_up_verification_delay_seconds` variable. See `/home/ubuntu/mash-playbook/roles/galaxy/com.devture.ansible.role.systemd_service_manager/defaults/main.yml` for more details about that."}
failed: [workspace.di.xyz] (item=mash-authentik-worker.service) => {"ansible_loop_var": "item", "changed": false, "item": "mash-authentik-worker.service", "msg": "mash-authentik-worker.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status mash-authentik-worker.service` and `journalctl -fu mash-authentik-worker.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `devture_systemd_service_manager_up_verification_delay_seconds` variable. See `/home/ubuntu/mash-playbook/roles/galaxy/com.devture.ansible.role.systemd_service_manager/defaults/main.yml` for more details about that."}

systemctl status mash-authentik-server.service

● mash-authentik-server.service - Authentik Server (mash-authentik-server)
     Loaded: loaded (/etc/systemd/system/mash-authentik-server.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2023-11-10 19:04:50 EET; 27s ago
    Process: 4060307 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker kill mash-authentik-server 2>/dev/null || true (code=exited, status=0/SUCCESS)
    Process: 4060315 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker rm mash-authentik-server 2>/dev/null || true (code=exited, status=0/SUCCESS)
    Process: 4060323 ExecStartPre=/usr/bin/env docker create --rm --name=mash-authentik-server --log-driver=none --user=997:1003 --cap-drop=ALL --read-only --network=mash-authentik --env-file=>
    Process: 4060329 ExecStartPre=/usr/bin/env docker network connect traefik mash-authentik-server (code=exited, status=0/SUCCESS)
    Process: 4060336 ExecStartPre=/usr/bin/env docker network connect mash-postgres mash-authentik-server (code=exited, status=0/SUCCESS)
    Process: 4060343 ExecStartPre=/usr/bin/env docker network connect mash-authentik-redis mash-authentik-server (code=exited, status=0/SUCCESS)
    Process: 4060349 ExecStart=/usr/bin/env docker start --attach mash-authentik-server (code=exited, status=1/FAILURE)
   Main PID: 4060349 (code=exited, status=1/FAILURE)
        CPU: 145ms

moan0s commented 11 months ago

Can you add the output of journalctl -fu mash-authentik-server.service?

daniel-rikowski commented 7 months ago

Not OP, but I believe I have the same problem. This is what journalctl -fu mash-authentik-server.service says (repeatedly; I trimmed the leading timestamps and hostname):

2024-03-14 22:17:13 [info     ] waiting to acquire database lock
2024-03-14 22:17:13 [info     ] Migration needs to be applied  migration=tenant_to_brand.py
Traceback (most recent call last):
  File "/lifecycle/migrate.py", line 98, in <module>
    migration.run()
  File "/lifecycle/system_migrations/tenant_to_brand.py", line 25, in run
    self.cur.execute(SQL_STATEMENT)
  File "/ak-root/venv/lib/python3.12/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.errors.UndefinedTable: relation "authentik_tenants_tenant" does not exist
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/lifecycle/migrate.py", line 118, in <module>
    release_lock(curr)
  File "/lifecycle/migrate.py", line 67, in release_lock
    cursor.execute("SELECT pg_advisory_unlock(%s)", (ADV_LOCK_UID,))
  File "/ak-root/venv/lib/python3.12/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction

Directly followed by:

systemd[1]: mash-authentik-server.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: mash-authentik-server.service: Failed with result 'exit-code'.
systemd[1]: mash-authentik-server.service: Scheduled restart job, restart counter is at 21.
systemd[1]: Stopped Authentik Server (mash-authentik-server).
systemd[1]: Starting Authentik Server (mash-authentik-server)...

And this from PostgreSQL:

2024-03-14 22:16:38.149 UTC [88] ERROR:  relation "authentik_tenants_tenant" does not exist
2024-03-14 22:16:38.149 UTC [88] STATEMENT:
        BEGIN TRANSACTION;
        ALTER TABLE authentik_tenants_tenant RENAME TO authentik_brands_brand;
        UPDATE django_migrations SET app = replace(app, 'authentik_tenants', 'authentik_brands');
        UPDATE django_content_type SET app_label = replace(app_label, 'authentik_tenants', 'authentik_brands');
        COMMIT;

2024-03-14 22:16:38.150 UTC [88] ERROR:  current transaction is aborted, commands ignored until end of transaction block

It looks like it is this problem: https://github.com/goauthentik/authentik/issues/8863

But contrary to what https://github.com/goauthentik/authentik/issues/8863#issuecomment-1996715847 suggests, when installing with mash-playbook, authentik never seems to recover from that error. The process just terminates and never makes any progress, not even after repeated restarts.
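
The second traceback in the log above (InFailedSqlTransaction) is a follow-on error: once a statement fails inside a transaction, PostgreSQL rejects every further statement until a rollback, so the attempt to release the advisory lock fails too. The sketch below reproduces that behaviour with a toy connection and shows one way to release the lock without a second error. It is a simulation only: FakeConnection, AbortedTransaction, run_with_lock and the lock id 1 are hypothetical names invented for illustration, not authentik code and not a proposed patch.

```python
class AbortedTransaction(Exception):
    """Stand-in for psycopg.errors.InFailedSqlTransaction."""


class FakeConnection:
    """Toy model of a PostgreSQL session: after any failed statement,
    every further statement is rejected until rollback() is called."""

    def __init__(self):
        self.aborted = False
        self.log = []  # statements that were accepted, in order

    def execute(self, sql, fail=False):
        if self.aborted:
            raise AbortedTransaction("current transaction is aborted")
        if fail:
            self.aborted = True
            raise RuntimeError(f"statement failed: {sql}")
        self.log.append(sql)

    def rollback(self):
        self.aborted = False
        self.log.append("ROLLBACK")


def run_with_lock(conn, migration_sql, migration_fails):
    """Run one migration under an advisory lock and always release the lock."""
    conn.execute("SELECT pg_advisory_lock(1)")
    try:
        conn.execute(migration_sql, fail=migration_fails)
    except RuntimeError:
        # Clear the aborted transaction first; otherwise the unlock in the
        # finally block would raise a second error on top of the original one.
        conn.rollback()
        raise
    finally:
        conn.execute("SELECT pg_advisory_unlock(1)")
```

With the rollback in place, the caller sees only the original failure (here the RuntimeError standing in for UndefinedTable), and the unlock statement is accepted.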

daniel-rikowski commented 7 months ago

I did some digging in the Authentik source code and it looks like there's a rather brittle database migration in https://github.com/goauthentik/authentik/blob/main/lifecycle/system_migrations/tenant_to_brand.py

It assumes: "If there's any previous migration run, but no migration for authentik_brands, then there must exist a table named authentik_tenants_tenant"

It seems that the playbook somehow created a partial database in which this assumption does not hold.

To fix it, I entered the mash-postgres container and just dropped the Authentik database.

After that, running just setup-all resulted in a working Authentik instance.
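
The assumption quoted above can be modelled as a small decision function. This is a minimal sketch that encodes only the paraphrase given in this comment; it has not been checked against the actual code in tenant_to_brand.py. The function name, its parameters and the defensive flag are hypothetical and exist only for illustration.

```python
def should_rename_tenant_table(applied_apps, existing_tables, defensive=False):
    """Decide whether the tenant-to-brand rename would be attempted.

    applied_apps:    app labels that have rows in django_migrations
    existing_tables: table names currently present in the database
    defensive:       if True, also require the source table to exist
                     (an extra guard assumed by this sketch)
    """
    has_history = len(applied_apps) > 0
    brands_migrated = "authentik_brands" in applied_apps
    decision = has_history and not brands_migrated
    if defensive:
        # Only rename a table that is actually there.
        decision = decision and "authentik_tenants_tenant" in existing_tables
    return decision


# A partial database: some migrations recorded, but no tenants table yet.
partial_apps = {"contenttypes"}
partial_tables = {"django_migrations", "django_content_type"}

naive = should_rename_tenant_table(partial_apps, partial_tables)
guarded = should_rename_tenant_table(partial_apps, partial_tables, defensive=True)
```

In the partial-database case, naive is True (the rename is attempted and fails with UndefinedTable, as in the log), while guarded is False. Dropping the database, as described above, avoids the problem from the other side: with no migration history at all, the rename is not attempted.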