sissbruecker / linkding

Self-hosted bookmark manager that is designed to be minimal, fast, and easy to set up using Docker.
MIT License
5.33k stars, 261 forks

Container `/health` check is failing, connection to sqlite file on NFS mount to blame #679

Closed dalanmiller closed 2 months ago

dalanmiller commented 3 months ago

I'm running linkding in a Docker container, and the health check is failing even though I expected it to pass. The migrations run without errors, which suggests the connection to the SQLite database is working.

Yet when I looked at the logic for /health, the database connection seems to be the only thing it checks:

https://github.com/sissbruecker/linkding/blob/bb6c5ca29e3b66a70c2ff1751ea6183c7011d4ae/bookmarks/views/health.py#L12
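For readers without the linked file at hand, a connection-only health check of this kind can be sketched roughly as follows. This is not linkding's actual code: `check_health` is a hypothetical, framework-free helper that uses Python's built-in `sqlite3` module, and the default version string is simply copied from the output shown in this report. It returns the status code and JSON body that such an endpoint might send.

```python
import json
import sqlite3


def check_health(db_path: str, version: str = "1.27.0") -> tuple[int, str]:
    """Return (HTTP status code, JSON body) based only on whether the DB opens.

    Hypothetical helper for illustration; the version default is a placeholder.
    """
    body = {"version": version, "status": "healthy"}
    code = 200
    try:
        # mode=rw makes the open fail if the file is missing instead of creating it.
        conn = sqlite3.connect(f"file:{db_path}?mode=rw", uri=True)
        try:
            # Reading the schema table forces SQLite to actually touch the file.
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        body["status"] = "unhealthy"
        code = 500
    return code, json.dumps(body)
```

The point of the sketch is that a check like this can only fail if opening or reading the database fails, so an "unhealthy" result points at the database file or the filesystem underneath it.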

On the Docker host I can also see that the SQLite file exists and was last modified just a moment ago, when I spun up the container.

drwxrwxrwx  2 www-data www-data 4.0K Apr  3 09:48 assets
-rwxrwxrwx  1 www-data www-data 308K Apr  3 09:48 db.sqlite3
drwxrwxrwx  2 www-data www-data 4.0K Apr  3 09:48 favicons
-rwxrwxrwx  1 www-data www-data   50 Apr  3 09:48 secretkey.txt

I confirmed the unhealthy status from a shell inside the container (exec into linkding with /bin/sh):

# curl localhost:9090/health
{"version": "1.27.0", "status": "unhealthy"}#

Logs

linkding  | 2024-04-03 09:48:17,608 INFO Generated secret key file
linkding  | Operations to perform:
linkding  |   Apply all migrations: admin, auth, authtoken, background_task, bookmarks, contenttypes, sessions
linkding  | Running migrations:
linkding  |   Applying contenttypes.0001_initial... OK
linkding  |   Applying auth.0001_initial... OK
linkding  |   Applying admin.0001_initial... OK
linkding  |   Applying admin.0002_logentry_remove_auto_add... OK
linkding  |   Applying admin.0003_logentry_add_action_flag_choices... OK
linkding  |   Applying contenttypes.0002_remove_content_type_name... OK
linkding  |   Applying auth.0002_alter_permission_name_max_length... OK
linkding  |   Applying auth.0003_alter_user_email_max_length... OK
linkding  |   Applying auth.0004_alter_user_username_opts... OK
linkding  |   Applying auth.0005_alter_user_last_login_null... OK
linkding  |   Applying auth.0006_require_contenttypes_0002... OK
linkding  |   Applying auth.0007_alter_validators_add_error_messages... OK
linkding  |   Applying auth.0008_alter_user_username_max_length... OK
linkding  |   Applying auth.0009_alter_user_last_name_max_length... OK
linkding  |   Applying auth.0010_alter_group_name_max_length... OK
linkding  |   Applying auth.0011_update_proxy_permissions... OK
linkding  |   Applying auth.0012_alter_user_first_name_max_length... OK
linkding  |   Applying authtoken.0001_initial... OK
linkding  |   Applying authtoken.0002_auto_20160226_1747... OK
linkding  |   Applying authtoken.0003_tokenproxy... OK
linkding  |   Applying background_task.0001_initial... OK
linkding  |   Applying background_task.0002_auto_20170927_1109... OK
linkding  |   Applying background_task.0003_auto_20210410_1529... OK
linkding  |   Applying background_task.0004_auto_20220202_1721... OK
linkding  |   Applying bookmarks.0001_initial... OK
linkding  |   Applying bookmarks.0002_auto_20190629_2303... OK
linkding  |   Applying bookmarks.0003_auto_20200913_0656... OK
linkding  |   Applying bookmarks.0004_auto_20200926_1028... OK
linkding  |   Applying bookmarks.0005_auto_20210103_1212... OK
linkding  |   Applying bookmarks.0006_bookmark_is_archived... OK
linkding  |   Applying bookmarks.0007_userprofile... OK
linkding  |   Applying bookmarks.0008_userprofile_bookmark_date_display... OK
linkding  |   Applying bookmarks.0009_bookmark_web_archive_snapshot_url... OK
linkding  |   Applying bookmarks.0010_userprofile_bookmark_link_target... OK
linkding  |   Applying bookmarks.0011_userprofile_web_archive_integration... OK
linkding  |   Applying bookmarks.0012_toast... OK
linkding  |   Applying bookmarks.0013_web_archive_optin_toast... OK
linkding  |   Applying bookmarks.0014_alter_bookmark_unread... OK
linkding  |   Applying bookmarks.0015_feedtoken... OK
linkding  |   Applying bookmarks.0016_bookmark_shared... OK
linkding  |   Applying bookmarks.0017_userprofile_enable_sharing... OK
linkding  |   Applying bookmarks.0018_bookmark_favicon_file... OK
linkding  |   Applying bookmarks.0019_userprofile_enable_favicons... OK
linkding  |   Applying bookmarks.0020_userprofile_tag_search... OK
linkding  |   Applying bookmarks.0021_userprofile_display_url... OK
linkding  |   Applying bookmarks.0022_bookmark_notes... OK
linkding  |   Applying bookmarks.0023_userprofile_permanent_notes... OK
linkding  |   Applying bookmarks.0024_userprofile_enable_public_sharing... OK
linkding  |   Applying bookmarks.0025_userprofile_search_preferences... OK
linkding  |   Applying bookmarks.0026_userprofile_custom_css... OK
linkding  |   Applying bookmarks.0027_userprofile_bookmark_description_display_and_more... OK
linkding  |   Applying bookmarks.0028_userprofile_display_archive_bookmark_action_and_more... OK
linkding  |   Applying bookmarks.0029_bookmark_list_actions_toast... OK
linkding  |   Applying bookmarks.0030_bookmarkasset... OK
linkding  |   Applying bookmarks.0031_userprofile_enable_automatic_html_snapshots... OK
linkding  |   Applying bookmarks.0032_html_snapshots_hint_toast... OK
linkding  |   Applying sessions.0001_initial... OK
linkding  | 2024-04-03 09:48:20,488 INFO Current journal mode: delete
linkding  | 2024-04-03 09:48:20,491 INFO Switched to WAL journal mode
linkding  | 2024-04-03 09:48:21,212 INFO Created initial superuser
linkding  | [uWSGI] getting INI configuration from uwsgi.ini
linkding  | [uwsgi-static] added mapping for /static => static
linkding  | [uwsgi-static] added mapping for /static => data/favicons
linkding  | *** Starting uWSGI 2.0.23 (64bit) on [Wed Apr  3 09:48:21 2024] ***
linkding  | compiled with version: 12.2.0 on 18 March 2024 22:01:04
linkding  | os: Linux-5.15.0-100-generic #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024
linkding  | nodename: ddca672ad099
linkding  | machine: x86_64
linkding  | clock source: unix
linkding  | detected number of CPU cores: 4
linkding  | current working directory: /etc/linkding
linkding  | writing pidfile to /tmp/linkding.pid
linkding  | detected binary path: /opt/venv/bin/uwsgi
linkding  | !!! no internal routing support, rebuild with pcre support !!!
linkding  | setgid() to 33
linkding  | setuid() to 33
linkding  | your memory page size is 4096 bytes
linkding  | detected max file descriptor number: 1048576
linkding  | building mime-types dictionary from file /etc/mime.types...1545 entry found
linkding  | lock engine: pthread robust mutexes
linkding  | thunder lock: disabled (you can enable it with --thunder-lock)
linkding  | uWSGI http bound on :9090 fd 4
linkding  | uwsgi socket 0 bound to TCP address 127.0.0.1:44243 (port auto-assigned) fd 3
linkding  | Python version: 3.11.8 (main, Mar 12 2024, 11:52:02) [GCC 12.2.0]
linkding  | Python main interpreter initialized at 0x7fc279331218
linkding  | python threads support enabled
linkding  | your server socket listen backlog is limited to 100 connections
linkding  | your mercy for graceful operations on workers is 60 seconds
linkding  | mapped 274704 bytes (268 KB) for 4 cores
linkding  | *** Operational MODE: preforking+threaded ***
linkding  | WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x7fc279331218 pid: 1 (default app)
linkding  | *** uWSGI is running in multiple interpreter mode ***
linkding  | spawned uWSGI master process (pid: 1)
linkding  | spawned uWSGI worker 1 (pid: 17, cores: 2)
linkding  | spawned uWSGI worker 2 (pid: 19, cores: 2)
linkding  | *** Stats server enabled on 127.0.0.1:9191 fd: 14 ***
linkding  | spawned uWSGI http 1 (pid: 21)
linkding  | 2024-04-03 09:48:46,515 ERROR Internal Server Error: /health
linkding  | [pid: 17|app: 0|req: 1/1] 127.0.0.1 () {28 vars in 305 bytes} [Wed Apr  3 09:48:46 2024] GET /health => generated 44 bytes in 129 msecs (HTTP/1.1 500) 8 headers in 270 bytes (1 switches on core 0)
linkding  | 2024-04-03 09:49:16,612 ERROR Internal Server Error: /health
linkding  | [pid: 17|app: 0|req: 2/2] 127.0.0.1 () {28 vars in 305 bytes} [Wed Apr  3 09:49:16 2024] GET /health => generated 44 bytes in 14 msecs (HTTP/1.1 500) 8 headers in 270 bytes (1 switches on core 1)
linkding  | 2024-04-03 09:49:46,807 ERROR Internal Server Error: /health
linkding  | [pid: 19|app: 0|req: 1/3] 127.0.0.1 () {28 vars in 305 bytes} [Wed Apr  3 09:49:46 2024] GET /health => generated 44 bytes in 117 msecs (HTTP/1.1 500) 8 headers in 270 bytes (1 switches on core 0)

dalanmiller commented 3 months ago

https://github.com/sissbruecker/linkding/issues/230#issuecomment-1085182826

I think this is likely the reason. I am mounting an NFS share into the container, and although the SQLite file is accessible and writable, something seems to go wrong in Django's ensure_connection method.

There should be a way to disable the /health endpoint in the docker container.

dalanmiller commented 3 months ago

Okay, I can get the healthcheck to pass by adding this to my docker-compose.yml file:

linkding:
  ...
  healthcheck:
    test: ['CMD', "curl", "-f", "http://localhost:9090"]

However (and maybe as expected), Traefik then picks up the routing because the container is now in a 'healthy' state, but 500 errors abound:

linkding  | 2024-04-03 10:35:26,422 ERROR Internal Server Error: /login/
linkding  | Traceback (most recent call last):
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 275, in ensure_connection
linkding  |     self.connect()
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
linkding  |     return func(*args, **kwargs)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 256, in connect
linkding  |     self.connection = self.get_new_connection(conn_params)
linkding  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
linkding  |     return func(*args, **kwargs)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 181, in get_new_connection
linkding  |     conn = Database.connect(**conn_params)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  | sqlite3.OperationalError: unable to open database file
linkding  |
linkding  | The above exception was the direct cause of the following exception:
linkding  |
linkding  | Traceback (most recent call last):
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
linkding  |     response = get_response(request)
linkding  |                ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 220, in _get_response
linkding  |     response = response.render()
linkding  |                ^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/template/response.py", line 114, in render
linkding  |     self.content = self.rendered_content
linkding  |                    ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/template/response.py", line 92, in rendered_content
linkding  |     return template.render(context, self._request)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/template/backends/django.py", line 61, in render
linkding  |     return self.template.render(context)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/template/base.py", line 169, in render
linkding  |     with context.bind_template(self):
linkding  |   File "/usr/local/lib/python3.11/contextlib.py", line 137, in __enter__
linkding  |     return next(self.gen)
linkding  |            ^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/template/context.py", line 254, in bind_template
linkding  |     context = processor(self.request)
linkding  |               ^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/etc/linkding/bookmarks/context_processors.py", line 27, in public_shares
linkding  |     has_public_shares = query_set.count() > 0
linkding  |                         ^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/models/query.py", line 620, in count
linkding  |     return self.query.get_count(using=self.db)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/models/sql/query.py", line 629, in get_count
linkding  |     return obj.get_aggregation(using, {"__count": Count("*")})["__count"]
linkding  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/models/sql/query.py", line 615, in get_aggregation
linkding  |     result = compiler.execute_sql(SINGLE)
linkding  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
linkding  |     cursor = self.connection.cursor()
linkding  |              ^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
linkding  |     return func(*args, **kwargs)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 316, in cursor
linkding  |     return self._cursor()
linkding  |            ^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 292, in _cursor
linkding  |     self.ensure_connection()
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
linkding  |     return func(*args, **kwargs)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 274, in ensure_connection
linkding  |     with self.wrap_database_errors:
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
linkding  |     raise dj_exc_value.with_traceback(traceback) from exc_value
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 275, in ensure_connection
linkding  |     self.connect()
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
linkding  |     return func(*args, **kwargs)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 256, in connect
linkding  |     self.connection = self.get_new_connection(conn_params)
linkding  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
linkding  |     return func(*args, **kwargs)
linkding  |            ^^^^^^^^^^^^^^^^^^^^^
linkding  |   File "/opt/venv/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 181, in get_new_connection
linkding  |     conn = Database.connect(**conn_params)

dalanmiller commented 3 months ago

Ahhh, this is probably why:

Sonarr currently uses WAL mode for journals with SQLite. WAL mode has some advantages, but one major disadvantage is that it can not safely be used over non-local filesystems (https://sqlite.org/wal.html); docker for windows and other virtualization systems using CIFS mounted host paths often fail with sqlite locking or corruption errors when using WAL with sqlite file on host shared paths.

And the logs show:

linkding | 2024-04-03 09:48:20,491 INFO Switched to WAL journal mode

There needs to be an option to change this journal mode if need be.
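To illustrate what such an option would have to do under the hood, here is a minimal sketch using Python's built-in `sqlite3` module. `set_journal_mode` is a hypothetical helper, not something linkding provides, and the allow-list covers only a few of SQLite's journal modes. The sketch relies on the fact that `PRAGMA journal_mode=...` returns the mode that is actually in effect afterwards.

```python
import sqlite3

# Subset of SQLite journal modes accepted by this sketch.
ALLOWED_MODES = {"delete", "truncate", "persist", "wal"}


def set_journal_mode(db_path: str, mode: str) -> str:
    """Ask SQLite to switch journal mode and return the mode now in effect."""
    if mode.lower() not in ALLOWED_MODES:
        raise ValueError(f"unsupported journal mode: {mode!r}")
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA arguments cannot be bound as parameters, hence the allow-list.
        (current,) = conn.execute(f"PRAGMA journal_mode={mode}").fetchone()
    finally:
        conn.close()
    return current
```

Calling `set_journal_mode(path, "delete")` on a database that was previously switched to WAL should report `delete` as long as no other connection holds the file open. Note that the rollback-journal modes still depend on file locking, so this alone may not make a network mount reliable.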

sissbruecker commented 3 months ago

It seems using SQLite over a network is discouraged in general.

https://www.sqlite.org/whentouse.html:

If there are many client programs sending SQL to the same database over a network, then use a client/server database engine instead of SQLite. SQLite will work over a network filesystem, but because of the latency associated with most network filesystems, performance will not be great. Also, file locking logic is buggy in many network filesystem implementations (on both Unix and Windows). If file locking does not work correctly, two or more clients might try to modify the same part of the same database at the same time, resulting in corruption. Because this problem results from bugs in the underlying filesystem implementation, there is nothing SQLite can do to prevent it.

https://sqlite.org/lockingv3.html:

SQLite uses POSIX advisory locks to implement locking on Unix. On Windows it uses the LockFile(), LockFileEx(), and UnlockFile() system calls. SQLite assumes that these system calls all work as advertised. If that is not the case, then database corruption can result. One should note that POSIX advisory locking is known to be buggy or even unimplemented on many NFS implementations (including recent versions of Mac OS X) and that there are reports of locking problems for network filesystems under Windows. Your best defense is to not use SQLite for files on a network filesystem.
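As a rough way to see whether a given mount accepts POSIX advisory locks at all, something like the following probe can be run against the data directory. It is a sketch under assumptions: `advisory_lock_works` and the `.lock-probe` file name are made up for illustration, it only works on Unix (it uses Python's `fcntl` module), and a successful result only shows that the lock call did not fail. It does not prove that locks are honored correctly between different hosts.

```python
import fcntl
import os


def advisory_lock_works(directory: str) -> bool:
    """Try to take and release a POSIX advisory lock on a scratch file."""
    probe = os.path.join(directory, ".lock-probe")  # hypothetical file name
    fd = os.open(probe, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # lockf() uses fcntl()-style POSIX locks, the kind SQLite relies on.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True
        except OSError:
            # For example ENOLCK when the mount has no working lock manager.
            return False
    finally:
        os.close(fd)
        os.unlink(probe)
```

On a local filesystem this is expected to return `True`; a `False` on the NFS-backed directory would be a strong hint that SQLite cannot lock the database there.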

I also remember people having issues with NFS (https://github.com/sissbruecker/linkding/issues/355) even before enabling WAL journal mode.

As an alternative you could take a look at https://litestream.io/, which supports streaming replicas to NFS. https://github.com/fspoettel/linkding-on-fly contains a setup for litestream that may help as guidance.

sissbruecker commented 2 months ago

Closing this for now, as I think this is more or less expected with SQLite. A workaround should at least be possible using Litestream.