zadam / trilium

Build your personal knowledge base with Trilium Notes
GNU Affero General Public License v3.0

(Bug report) docker container die once an hour #2758

Closed supermayx closed 2 years ago

supermayx commented 2 years ago

Trilium Version

0.50.2

What operating system are you using?

Other (specify below)

What is your setup?

Local + server sync

Operating System Version

OpenWrt docker

Description

I am using an Amlogic S905D box running OpenWrt (4-core Cortex-A53 CPU, 2 GB RAM, 8 GB ROM), and it has been running the Trilium Docker container for about half a year with no issues.

AArch64 Processor: Cortex-A53 * 4
OpenWrt R22.1.1 / LuCI Master (git-21.335.48743-5f363d9)
Kernel 5.4.170-flippy-68+o

I noticed the server has been crashing now and then recently. In Portainer the crash log says "Exit code 137", which points to a memory issue.
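For reference, exit code 137 is 128 + 9, i.e. the process was terminated with SIGKILL, which is the signal the kernel OOM killer uses. A minimal shell sketch showing where the number comes from (the `docker inspect` line at the end is the usual way to confirm an OOM kill; the container name is taken from this report):

```shell
# Exit code 137 = 128 + 9: the process was killed with SIGKILL,
# the signal the kernel OOM killer sends.
sleep 30 &
pid=$!
kill -9 "$pid"        # simulate the OOM killer's SIGKILL
wait "$pid"
status=$?
echo "exit status: $status"   # prints: exit status: 137

# To confirm an actual OOM kill on a running/stopped container:
#   docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' trilium
```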

I also watched the container's memory and CPU consumption: at the beginning everything is fine, but after about an hour it climbs and climbs until the whole system crashes. (Screenshots attached; the last time tag on the CPU chart is wrong, I don't know why.)

I set a RAM limit (768 MB) on the container; now the system no longer crashes, but the Trilium container still does. After setting its restart policy to "always", it remains usable as a sync server. My database is quite small, only 187 MB.
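For anyone reproducing this workaround, a docker-compose sketch of the memory cap plus restart policy described above (the image tag, port, and volume path are illustrative assumptions, not taken from the report):

```yaml
# docker-compose.yml sketch: cap the container at 768 MB and
# bring it back automatically after each OOM kill.
version: "3"
services:
  trilium:
    image: zadam/trilium:0.50.2   # version from this report; adjust as needed
    restart: always               # restart after every crash/OOM kill
    mem_limit: 768m               # same cap described above
    ports:
      - "8080:8080"
    volumes:
      - ~/trilium-data:/home/node/trilium-data
```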

In the OpenWrt Docker log, we can see the container dying once an hour.

Every event below refers to Container ID c924744b0a1d8a4e7fd1debd830e62375c10220b47c9e0e6838568533187b800 (Container Name: trilium, Network Name: bridge, Network type: bridge):

[2022-03-29 00:16:54] network connect
[2022-03-29 00:16:55] container start
[2022-03-29 01:17:56] container oom
[2022-03-29 01:17:57] container die
[2022-03-29 01:17:57] network disconnect
[2022-03-29 01:17:57] network connect
[2022-03-29 01:17:58] container start
[2022-03-29 02:18:59] container oom
[2022-03-29 02:19:00] container die
[2022-03-29 02:19:00] network disconnect
[2022-03-29 02:19:00] network connect
[2022-03-29 02:19:01] container start
[2022-03-29 03:20:01] container oom
[2022-03-29 03:20:02] container die
[2022-03-29 03:20:02] network disconnect
[2022-03-29 03:20:02] network connect
[2022-03-29 03:20:03] container start
[2022-03-29 04:21:03] container oom
[2022-03-29 04:21:04] container die
[2022-03-29 04:21:04] network disconnect
[2022-03-29 04:21:04] network connect
[2022-03-29 04:21:06] container start
[2022-03-29 05:22:06] container oom
[2022-03-29 05:22:07] container die
[2022-03-29 05:22:07] network disconnect
[2022-03-29 05:22:07] network connect
[2022-03-29 05:22:08] container start
[2022-03-29 06:23:09] container oom
[2022-03-29 06:23:09] container die
[2022-03-29 06:23:10] network disconnect
[2022-03-29 06:23:10] network connect
[2022-03-29 06:23:11] container start
[2022-03-29 07:24:11] container oom
[2022-03-29 07:24:12] container die
[2022-03-29 07:24:12] network disconnect
[2022-03-29 07:24:12] network connect
[2022-03-29 07:24:13] container start
[2022-03-29 08:25:10] container oom
[2022-03-29 08:25:11] container die
[2022-03-29 08:25:11] network disconnect
[2022-03-29 08:25:11] network connect
[2022-03-29 08:25:12] container start
.......
[2022-03-29 09:25:55] container top
[2022-03-29 09:26:00] container top
[2022-03-29 09:26:05] container top
[2022-03-29 09:26:09] container oom
[2022-03-29 09:26:10] container die
[2022-03-29 09:26:10] network disconnect
[2022-03-29 09:26:10] network connect
[2022-03-29 09:26:11] container start
[2022-03-29 09:26:11] container top

sigaloid commented 2 years ago

Can you add the Trilium log? It should be in the directory bind-mounted into the Docker container.

supermayx commented 2 years ago

Thanks. Log file is here: trilium-2022-03-29.log

I checked the log and found that Trilium runs a database optimization operation every hour, and all of the die events happened right after it.
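Since the OOM kills line up with the hourly optimization, one common mitigation for Node.js services in memory-capped containers (an assumption on my part, not something Trilium prescribes) is to cap the V8 heap below the container limit, so the process garbage-collects instead of growing until the kernel kills it. Node reads the standard NODE_OPTIONS environment variable, so it can be set on the container; the 512 MB value here is an illustrative guess:

```yaml
# Hypothetical compose fragment: limit the V8 old-space heap (in MB)
# to stay comfortably under the 768 MB container cap.
services:
  trilium:
    environment:
      - NODE_OPTIONS=--max-old-space-size=512
```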

Log of the death event at 9:25:

09:24:59.655 Content hash computation took 263ms
09:24:59.663 Slow 200 GET /api/sync/check with 13440 bytes took 272ms
09:25:22.232 Optimizing database
09:25:22.233 Optimization finished.
09:25:24.518 Table counts: notes: 1672, note_revisions: 178, branches: 1691, attributes: 4423, etapi_tokens: 3
09:25:24.586 Slow query took 35ms: SELECT id, entityId FROM entity_changes JOIN note_contents ON entityId = noteId WHERE entity_changes.isErased = 1 AND entity_changes.entityName = 'note_contents'
09:25:24.694 Slow query took 70ms: SELECT id, entityId FROM entity_changes LEFT JOIN attributes ON entityId = attributeId WHERE entity_changes.isErased = 0 AND entity_changes.entityName = 'attributes' AND attributeId IS NULL
09:25:24.784 Slow query took 56ms: SELECT attributeId, attributes.value AS noteId FROM attributes LEFT JOIN notes ON notes.noteId = attributes.value WHERE attributes.isDeleted = 0 AND attributes.type = 'relation' AND notes.noteId IS NULL
09:25:25.600 Slow query took 782ms: SELECT notes.noteId, notes.type, notes.mime FROM notes JOIN note_contents USING (noteId) WHERE isDeleted = 0 AND isProtected = 0 AND content IS NULL
09:25:25.795 All consistency checks passed with no errors detected (took 1281ms)
09:25:58.927 304 GET /api/setup/status with 59 bytes took 4ms
09:25:58.944 200 POST /api/login/sync with 55 bytes took 5ms
09:25:59.009 200 GET /api/sync/changed?instanceId=teLWKHGjGWcR&lastEntityChangeId=12795&logMarkerId=B0jGw9FZlN with 72 bytes took 4ms
09:25:59.090 204 POST /api/sync/finished with 0 bytes took 3ms
09:25:59.505 Slow query took 388ms: SELECT entityName, entityId, hash FROM entity_changes WHERE isSynced = 1 AND entityName != 'note_reordering'
09:25:59.818 Content hash computation took 701ms
09:25:59.884 Slow 304 GET /api/sync/check with 13440 bytes took 768ms
09:26:13.994 Slow query took 41ms: SELECT value FROM options WHERE name = 'initialized'
09:26:14.007 DB size: 192204 KB
09:26:22.306 Slow query took 821ms: SELECT noteId, title, type, mime, isProtected, dateCreated, dateModified, utcDateCreated, utcDateModified FROM notes WHERE isDeleted = 0
09:26:22.680 Slow query took 309ms: SELECT branchId, noteId, parentNoteId, prefix, notePosition, isExpanded, utcDateModified FROM branches WHERE isDeleted = 0
09:26:23.426 Slow query took 624ms: SELECT attributeId, noteId, type, name, value, isInheritable, position, utcDateModified FROM attributes WHERE isDeleted = 0
09:26:23.716 Becca (note cache) load took 2274ms
09:26:23.735 App HTTP server starting up at port 8080
09:26:23.736 { "appVersion": "0.50.2", "dbVersion": 194, "syncVersion": 25, "buildDate": "2022-02-09T22:52:36+01:00", "buildRevision": "23daaa2387a0655685377f0a541d154aeec2aae8", "dataDirectory": "/home/node/trilium-data", "clipperProtocolVersion": "1.0", "utcDateTime": "2022-03-29 01:26:13.946Z" }
09:26:23.739 CPU model: unknown, logical cores: 4 freq: 1512 Mhz
09:26:23.767 Listening on port 8080
09:26:27.784 Slow query took 42ms: SELECT COUNT(1) FROM notes
09:26:27.800 Table counts: notes: 1672, note_revisions: 178, branches: 1691, attributes: 4423, etapi_tokens: 3
09:26:27.879 Slow query took 46ms: SELECT noteId as entityId FROM note_contents LEFT JOIN entity_changes ON entity_changes.entityName = 'note_contents' AND entity_changes.entityId = noteId WHERE entity_changes.id IS NULL
09:26:28.565 Slow query took 528ms: SELECT notes.noteId, notes.type, notes.mime FROM notes JOIN note_contents USING (noteId) WHERE isDeleted = 0 AND isProtected = 0 AND content IS NULL
09:26:28.684 All consistency checks passed with no errors detected (took 929ms)
09:26:59.169 Slow 304 GET /api/setup/status with 59 bytes took 32ms
09:26:59.315 Slow 200 POST /api/login/sync with 55 bytes took 27ms
09:26:59.441 200 GET /api/sync/changed?instanceId=teLWKHGjGWcR&lastEntityChangeId=12795&logMarkerId=taGqfyAdYV with 72 bytes took 9ms
09:26:59.526 204 POST /api/sync/finished with 0 bytes took 6ms
09:26:59.728 Slow query took 165ms: SELECT entityName, entityId, hash FROM entity_changes WHERE isSynced = 1 AND entityName != 'note_reordering'
09:26:59.950 Content hash computation took 388ms
09:26:59.954 Slow 304 GET /api/sync/check with 13440 bytes took 395ms

And more interestingly, since I reported this bug it has been alive for 12 straight hours without any issue. LOL. I didn't do anything other than add a TZ environment variable, trying to change the log timestamps inside the container (and found out it doesn't work). Maybe that's the cure? (-_-!)


sigaloid commented 2 years ago

It looks like each database optimization completed successfully and the consistency checks passed as well. If it hasn't happened again since you restarted it, I'd bet it was a sporadic problem with the database optimization, but your database still seems okay.

Did you try explicitly stopping the container and restarting it (before it fixed itself)? The container's automatic restart might go through a different code path than a manual restart, so restarting it by hand while you changed the environment variable may have triggered an actual reboot and fixed it.

supermayx commented 2 years ago

Actually, I tried every kind of restart at the beginning: unplugging and replugging the power cord, a manual restart in Portainer, redeploying the container. Only the last attempt stopped the dying. It has now been working for a whole day; I hope it will hang on. If it dies again I will tell you. Thanks!