I'm also seeing this, also on a 4GB server. It seems to happen periodically (no pattern that I can detect), making the server essentially unresponsive for 8-14 minutes at a time.
I mean... we can try to give it a memory limit. Or remove signatures. Limiting resources via compose may introduce new fancy problems on some systems.
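For illustration, a minimal sketch of what such a compose-level limit could look like, assuming the service keeps its default clamd-mailcow name and the compose file format in use still honours mem_limit (a hypothetical override, not the shipped configuration):

# docker-compose.override.yml (hypothetical)
version: '2.1'
services:
  clamd-mailcow:
    mem_limit: 1536m      # hard cap; OOM kills then happen inside this container instead of taking out other host processes
    memswap_limit: 2048m  # cap memory + swap as well

Whether those numbers are sensible depends entirely on how many signatures are loaded, hence the hesitation above.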
Any ideas anybody?
@mkuron ?
If it helps, this is new - it only seems to be an issue with the latest version of the clamav container.
We just updated to 0.103.0, it is possible this version has a higher need for memory.
Ah! This from the release notes for 0.103:
" clamd can now reload the signature database without blocking scanning. This multi-threaded database reload improvement was made possible thanks to a community effort. Non-blocking database reloads are now the default behavior. Some systems that are more constrained on RAM may need to disable non-blocking reloads, as it will temporarily consume double the amount of memory. We added a new clamd config option ConcurrentDatabaseReload, which may be set to no."
Good catch. :)
I wonder if there is a way to expose that ConcurrentDatabaseReload setting as an option in mailcow.conf?
I set it to no by default for now and will add it to the docs. All options can be set via data/conf/clamav/clamd.conf.
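For anyone checking their own install, the relevant line in data/conf/clamav/clamd.conf is just the option named in the 0.103 release notes, e.g.:

# data/conf/clamav/clamd.conf
ConcurrentDatabaseReload no

A restart of the container (docker-compose restart clamd-mailcow) should be enough for clamd to pick it up.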
Perfect! Thank you very much indeed. Will drop another donation in the morning :)
I thank you. :)
I set it to no by default for now and will add it to the docs. All options can be set via data/conf/clamav/clamd.conf.
Just seen this by coincidence. I would prefer managing any kind of config via mailcow.conf if possible. It's a good approach to bundle config settings in one file, instead of having to edit several files lying in "unknown" locations. It would make managing things easier.
We cannot handle every single config there.
We use git for this reason. It will not kill your changes as long as we didn't change it either. If we did, we need to overwrite it for compatibility.
All config files share the same location by the way: data/conf
My ClamAV is also running OOM when updating, even with ConcurrentDatabaseReload set to NO; this has been happening for about a week now, and it just happened 10 minutes ago.
How much RAM?
4GB. I know it's not a lot but it has been working fine for 2 years, until now =)
Silly question but I noticed you say it's set to NO. Does your clamd.conf say "ConcurrentDatabaseReload no" or "ConcurrentDatabaseReload NO" ? IIRC clamd.conf directives are case sensitive.
Yes it's in lower case. It was added by a recent commit
ConcurrentDatabaseReload no
"no" is fine.
You can try to decrease SOGo worker count to 10.
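A hedged sketch of where that is usually changed, assuming the worker count lives in data/conf/sogo/sogo.conf under the WOWorkersCount directive (the location and directive name are an assumption here; verify against your own file):

/* data/conf/sogo/sogo.conf (assumed location and directive) */
WOWorkersCount = 10;

followed by restarting the SOGo container, e.g. docker-compose restart sogo-mailcow.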
This has been fine since the ConcurrentDatabaseReload change but just bombed again this evening
clamd-mailcow_1 | receiving incremental file list
clamd-mailcow_1 | ./
clamd-mailcow_1 | blurl.ndb
clamd-mailcow_1 | jurlbl.ndb
clamd-mailcow_1 | phishtank.ndb
clamd-mailcow_1 | rogue.hdb
clamd-mailcow_1 |
clamd-mailcow_1 | sent 23,606 bytes received 314,924 bytes 225,686.67 bytes/sec
clamd-mailcow_1 | total size is 18,879,877 speedup is 55.77
clamd-mailcow_1 | RELOADING
clamd-mailcow_1 | Wed Oct 7 17:16:27 2020 -> Reading databases from /var/lib/clamav
clamd-mailcow_1 | Wed Oct 7 17:17:28 2020 -> Database correctly reloaded (9073104 signatures)
clamd-mailcow_1 | Wed Oct 7 17:17:28 2020 -> Database reload completed.
clamd-mailcow_1 | Wed Oct 7 17:17:28 2020 -> Activating the newly loaded database...
clamd-mailcow_1 | Wed Oct 7 17:17:28 2020 -> instream(local): OK
clamd-mailcow_1 | Wed Oct 7 17:17:28 2020 -> instream(172.22.1.13@44646): OK
clamd-mailcow_1 | Wed Oct 7 17:17:28 2020 -> instream(local): OK
clamd-mailcow_1 | Wed Oct 7 17:19:09 2020 -> instream(172.22.1.13@45048): OK
clamd-mailcow_1 | Wed Oct 7 17:24:06 2020 -> instream(local): OK
clamd-mailcow_1 | Wed Oct 7 17:27:21 2020 -> ClamAV update process started at Wed Oct 7 17:27:21 2020
clamd-mailcow_1 | Wed Oct 7 17:27:26 2020 -> daily database available for update (local version: 25949, remote version: 25950)
clamd-mailcow_1 | Wed Oct 7 17:27:59 2020 -> Testing database: '/var/lib/clamav/tmp.c550237f77/clamav-fa1482832880b3b414a882962cbfb28f.tmp-daily.cld' ...
clamd-mailcow_1 | /clamd.sh: line 97: 23 Killed nice -n10 clamd
clamd-mailcow_1 | /clamd.sh: line 98: kill: (23) - No such process
clamd-mailcow_1 | Worker 23 died, stopping container waiting for respawn...
clamd-mailcow_1 | Cleaning up tmp files...
clamd-mailcow_1 | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1 | File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1 | Size: 142 Blocks: 8 IO Block: 4096 regular file
clamd-mailcow_1 | Device: 5fh/95d Inode: 1048148 Links: 1
clamd-mailcow_1 | Access: (0644/-rw-r--r--) Uid: ( 700/ clamav) Gid: ( 700/ clamav)
clamd-mailcow_1 | Access: 2020-10-07 17:16:27.404000000 +0000
clamd-mailcow_1 | Modify: 2020-10-07 18:12:30.404000000 +0000
clamd-mailcow_1 | Change: 2020-10-07 18:12:30.460000000 +0000
clamd-mailcow_1 | Birth: -
clamd-mailcow_1 | dos2unix: converting file /var/lib/clamav/whitelist.ign2 to Unix format...
clamd-mailcow_1 | Running freshclam...
clamd-mailcow_1 | Wed Oct 7 18:12:30 2020 -> ClamAV update process started at Wed Oct 7 18:12:30 2020
clamd-mailcow_1 | Wed Oct 7 18:12:31 2020 -> daily database available for update (local version: 25949, remote version: 25950)
clamd-mailcow_1 | Wed Oct 7 18:12:48 2020 -> Testing database: '/var/lib/clamav/tmp.46a08b2ade/clamav-b81f532048bf594a68b1079705518bf7.tmp-daily.cld' ...
clamd-mailcow_1 | Wed Oct 7 18:13:19 2020 -> Database test passed.
clamd-mailcow_1 | Wed Oct 7 18:13:19 2020 -> daily.cld updated (version: 25950, sigs: 4328320, f-level: 63, builder: raynman)
clamd-mailcow_1 | Wed Oct 7 18:13:19 2020 -> main.cvd database is up to date (version: 59, sigs: 4564902, f-level: 60, builder: sigmgr)
clamd-mailcow_1 | Wed Oct 7 18:13:19 2020 -> bytecode.cvd database is up to date (version: 331, sigs: 94, f-level: 63, builder: anvilleg)
clamd-mailcow_1 | Wed Oct 7 18:13:19 2020 -> ^Clamd was NOT notified: Can't connect to clamd through /run/clamav/clamd.sock: Connection refused
I can decrease SOGo workers as you've recommended, but I don't actually have any users using it, which I guess would make a difference?
Could you run docker stats every minute and check whether the memory size of the clamd container (or any other container) grows significantly over time? clamd is probably the process with the largest memory usage on your server, so the OOM killer kills it even if it's not the culprit.
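A minimal shell sketch for that kind of sampling, e.g. run in a screen/tmux session, with a log path of your choice:

# log per-container memory usage once a minute
while true; do
  date >> /root/docker-stats.log
  docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}' >> /root/docker-stats.log
  sleep 60
done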
They can still eat some RAM when you switch between these workers with each new request. Please try docker stats as mkuron suggested and also reduce the worker count. :)
Same issue here, even with ConcurrentDatabaseReload set to no. Every few hours it'll freeze for a minute or two.
Up until this point, mailcow has been running great for over a year on this server.
Clamd log: https://pastebin.com/TBjBUPn9
So? I cannot change that. If you want to keep using ClamAV, you need more RAM. 👍
Or reduce the SOGo workers. I cannot change that I'm afraid. :/
We will update the requirements.
:(
I get it of course.. Just hoped you'd have a solution for me :)
Thanks anyway, I'll upgrade the RAM of my server.
Or at least try with less workers in SOGo first. :)
Could you run docker stats every minute and check whether the memory size of the clamd container (or any other container) grows significantly over time? clamd is probably the process with the largest memory usage on your server, so the OOM killer kills it even if it's not the culprit.
I disabled the clamav container and have been watching the others since your post. It looks like on my single system over a working week solr grows slowly, but only by about 100MiB from where it starts (350-450). It looks like rspamd spikes quite severely at times from a resting ~250; I think about 650 is the highest I've seen it. Redis also seems to be capable of varying by a few hundred MiB, presumably depending on what's going on at the time, but nothing has an obvious memory leak.
I'll have to look at something to do this monitoring more scientifically over a longer period and produce some graphs, but it seems like the issue may indeed just be clamd plus other things happening to use more memory at the same time, down to random usage.
For now I'll reduce the SOGo workers and re-enable clamd and keep an eye on it.
Just hoped you'd have a solution for me
Unfortunately, ClamAV is quite a memory hog because it loads all its virus definitions into memory, and those obviously get larger with every update. You'll need to reduce the set of virus definitions to reduce memory usage. Or reconsider whether you actually need a virus scanner: we block .exe attachments and MS Office documents with macros, which should already take care of most virus distribution vectors.
It looks like on my single system over a working week solr grows slowly, but only by about 100MiB from where it starts (350-450).
SOLR needs quite a lot of memory, depending on how many messages you have. It is recommended to be kept disabled unless you have a lot of memory and very few users.
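For reference, disabling it is normally a one-line change in mailcow.conf followed by recreating the containers; a sketch assuming the SKIP_SOLR switch present in current mailcow versions (check your own file):

# mailcow.conf
SKIP_SOLR=y

# apply the change (a plain restart does not pick up new environment values)
docker-compose up -d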
It looks like rspamd spikes quite severely at times from a resting ~250, I think about 650 is the highest I've seen it.
Rspamd uses Lua, which is garbage-collected. You can reduce the garbage collection timeout (https://github.com/mailcow/mailcow-dockerized/issues/3049#issuecomment-548012475) to keep its memory usage more constant.
Redis also seems to be capable of varying by a few hundred MiB, presumably depending on what's going on at the time,
Redis is an in-memory database that periodically dumps its state to a file. Its memory footprint probably grows as it accumulates transactions between dumps, but I have not seen it consume an unreasonable amount of memory.
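If you want to double-check that on a live system, Redis can report its own footprint; a sketch assuming the default redis-mailcow container name:

docker exec -it $(docker ps -qf name=redis-mailcow) redis-cli INFO memory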
I run my server in AWS with 2GB of RAM, since I'm running this for personal sites and some friends' domains and it's not very high traffic. However, I did create a 4GB swap file and have had no issues... Not saying that's suitable for everyone, but it may be an option for you if it's not a high-traffic server.
My server does need to use the swap file, and I see no reason to pay for more memory:
root@mail:/opt/mailcow-dockerized# free -m
              total        used        free      shared  buff/cache   available
Mem:           1949        1342         115           9         491         444
Swap:          4095        2087        2008
@jjkondrat And you are suffering from the same issue "clamd getting oom-killed", so the swap file does not help?
@Adorfer No and I have never had any memory problems on any of the containers using the large swap file. I've been using a large swap file since I built my server well over a year ago.
So what is your point in posting to this thread? "Adding swap may resolve the issue"?
Yes. Although more memory would be better, defining or increasing the swap file size may let the user avoid the crash.
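For completeness, a minimal sketch of creating a 4GB swap file on a typical Linux host (standard commands, nothing mailcow-specific; adjust size and path to taste):

# create and enable a 4GB swap file
fallocate -l 4G /swapfile    # or: dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# make it persistent across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab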
So I already disabled clamd in mailcow.conf, but I'm still getting OOM messages that seem related to clamd.
I am running mailcow inside an ESXi VM with 2 CPUs & 4GB RAM.
To me it feels like clamd is still running even though it is disabled via mailcow.conf (the last restart of mailcow and the server was about 14 days ago).
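For comparison, this is roughly what the disable switch and its application are expected to look like, assuming the SKIP_CLAMD option in mailcow.conf (a hedged sketch; details may differ between versions):

# mailcow.conf
SKIP_CLAMD=y

# environment changes only take effect when the containers are recreated,
# not on a simple restart:
docker-compose up -d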
Example docker-compose log entries from the restart:
clamd-mailcow_1 | Mon Dec 7 09:38:54 2020 -> instream(172.22.1.10@43964): OK
clamd-mailcow_1 | Mon Dec 7 09:47:45 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 09:49:22 2020 -> instream(172.22.1.10@45540): OK
clamd-mailcow_1 | Mon Dec 7 09:49:38 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 09:58:34 2020 -> instream(172.22.1.10@46910): OK
clamd-mailcow_1 | Mon Dec 7 10:04:24 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 10:07:30 2020 -> instream(172.22.1.10@48194): OK
clamd-mailcow_1 | Mon Dec 7 10:12:44 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 10:14:23 2020 -> instream(172.22.1.10@49214): OK
clamd-mailcow_1 | Mon Dec 7 10:17:19 2020 -> instream(local): OK
clamd-mailcow_1 | Mon Dec 7 10:17:33 2020 -> instream(172.22.1.10@49704): OK
clamd-mailcow_1 | Worker 22 died, stopping container waiting for respawn...
clamd-mailcow_1 | /clamd.sh: line 97: 22 Killed nice -n10 clamd
clamd-mailcow_1 | /clamd.sh: line 98: kill: (22) - No such process
clamd-mailcow_1 | Cleaning up tmp files...
clamd-mailcow_1 | Copying non-empty whitelist.ign2 to /var/lib/clamav/whitelist.ign2
clamd-mailcow_1 | File: /var/lib/clamav/whitelist.ign2
clamd-mailcow_1 | Size: 142 Blocks: 8 IO Block: 4096 regular file
clamd-mailcow_1 | Device: 801h/2049d Inode: 1724287 Links: 1
clamd-mailcow_1 | Access: (0644/-rw-r--r--) Uid: ( 700/ clamav) Gid: ( 700/ clamav)
clamd-mailcow_1 | Access: 2020-12-07 08:42:07.698887776 +0100
clamd-mailcow_1 | Modify: 2020-12-07 10:44:34.718272781 +0100
clamd-mailcow_1 | Change: 2020-12-07 10:44:35.406301724 +0100
clamd-mailcow_1 | Birth: -
clamd-mailcow_1 | dos2unix: converting file /var/lib/clamav/whitelist.ign2 to Unix format...
clamd-mailcow_1 | Running freshclam...
Example /var/log/messages entries related to the OOM:
Dec 7 10:41:21 mailstation kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 7 10:41:21 mailstation kernel: 51489 total pagecache pages
Dec 7 10:41:21 mailstation kernel: 49539 pages in swap cache
Dec 7 10:41:21 mailstation kernel: Swap cache stats: add 7151097, delete 7101558, find 113127588/114180952
Dec 7 10:41:21 mailstation kernel: Free swap = 0kB
Dec 7 10:41:21 mailstation kernel: Total swap = 2095100kB
Dec 7 10:41:21 mailstation kernel: 1048446 pages RAM
Dec 7 10:41:21 mailstation kernel: 0 pages HighMem/MovableOnly
Dec 7 10:41:21 mailstation kernel: 35750 pages reserved
Dec 7 10:41:21 mailstation kernel: 0 pages hwpoisoned
Dec 7 10:41:21 mailstation kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 7 10:41:21 mailstation kernel: [ 6202] 401 6202 1050 63 6 3 24 0 anvil
Dec 7 10:41:21 mailstation kernel: [ 6203] 402 6203 1115 89 6 3 47 0 log
Dec 7 10:41:21 mailstation kernel: [ 6204] 402 6204 2060 0 8 3 174 0 managesieve-log
Dec 7 10:41:21 mailstation kernel: [ 6205] 401 6205 2813 256 8 3 308 0 stats
Dec 7 10:41:21 mailstation kernel: [ 6206] 0 6206 2367 454 10 3 397 0 config
Dec 7 10:41:21 mailstation kernel: [ 6208] 401 6208 6019 151 15 3 291 0 auth
Dec 7 10:41:21 mailstation kernel: [ 6226] 101 6226 10984 103 13 3 177 0 tlsmgr
Dec 7 10:41:21 mailstation kernel: [29783] 82 29783 59769 576 34 3 1513 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [12579] 0 12579 27180 159 10 5 73 1 containerd-shim
Dec 7 10:41:21 mailstation kernel: [12594] 0 12594 61127 654 94 3 23559 0 rspamd
Dec 7 10:41:21 mailstation kernel: [12735] 101 12735 61127 615 91 3 22719 0 rspamd
Dec 7 10:41:21 mailstation kernel: [12736] 101 12736 61127 817 93 3 22616 0 rspamd
Dec 7 10:41:21 mailstation kernel: [12739] 101 12739 61127 501 96 3 22767 0 rspamd
Dec 7 10:41:21 mailstation kernel: [19409] 101 19409 463594 62805 795 5 41791 0 rspamd
Dec 7 10:41:21 mailstation kernel: [ 8628] 82 8628 59768 594 34 3 1587 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [ 8636] 999 8636 88806 63793 169 3 3315 0 sogod
Dec 7 10:41:21 mailstation kernel: [ 8975] 999 8975 85152 525 162 3 63746 0 sogod
Dec 7 10:41:21 mailstation kernel: [22578] 82 22578 59770 592 34 3 1579 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [22579] 82 22579 59770 612 34 3 1559 0 php-fpm
Dec 7 10:41:21 mailstation kernel: [19575] 0 19575 10767 18 24 3 109 0 systemd-journal
Dec 7 10:41:21 mailstation kernel: [19783] 0 19783 27180 101 11 4 69 1 containerd-shim
Dec 7 10:41:21 mailstation kernel: [19799] 0 19799 569 5 6 3 15 0 tini
Dec 7 10:41:21 mailstation kernel: [19874] 0 19874 933 38 7 3 32 0 clamd.sh
Dec 7 10:41:21 mailstation kernel: [19888] 0 19888 933 34 7 3 31 0 clamd.sh
Dec 7 10:41:21 mailstation kernel: [19889] 0 19889 933 24 7 3 48 0 clamd.sh
Dec 7 10:41:21 mailstation kernel: [19890] 700 19890 403098 233170 646 4 75501 0 clamd
Any help on this?
Prior to placing the issue, please check the following (fill out each checkbox with an X once done).
Description of the bug:
Since the last update containing https://github.com/mailcow/mailcow-dockerized/commit/567064ed509db373e52d67f944677984030a2389, clamd has been using much more memory, to the extent that the server OOM-kills it. The server has 4GB of RAM and has been running without issue (regularly updated).
I'm unsure whether it is a particular message that triggers this or the clamd update process. After the instance logged below, it ran fine all day before then causing problems again mid evening. I've had to disable clamd for now in the config file.
Docker container logs of affected containers:
Reproduction of said bug:
Logged into the server, stopped the clamd container, rebooted to make sure the server wasn't in an inconsistent state after the OOM. Observed the server for the working day, no problem; the issue then reoccurred mid-evening.
System information:
docker-default (enforce)
/usr/sbin/tcpdump (enforce)
/usr/lib/snapd/snap-confine (enforce)
/usr/lib/snapd/snap-confine//mount-namespace-capture-helper (enforce)
man_groff (enforce)
man_filter (enforce)
/usr/bin/man (enforce)
/usr/bin/lxc-start (enforce)
/usr/lib/connman/scripts/dhclient-script (enforce)
/usr/lib/NetworkManager/nm-dhcp-helper (enforce)
/usr/lib/NetworkManager/nm-dhcp-client.action (enforce)
/sbin/dhclient (enforce)
lxc-container-default-with-nesting (enforce)
lxc-container-default-with-mounting (enforce)
lxc-container-default-cgns (enforce)
lxc-container-default (enforce)
| Virtualization technology (KVM, VMware, Xen, etc - LXC and OpenVZ are not supported) | KVM |
| Server/VM specifications (Memory, CPU Cores) | 4GB, 1 core |
| Docker Version (docker version) | 19.03.12 |
| Docker-Compose Version (docker-compose version) | docker-compose version 1.27.2, build 18f557f9; docker-py version: 4.3.1; CPython version: 3.7.7; OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019 |
| Reverse proxy (custom solution) | Nope |
git diff origin/master, any other changes to the code? If so, please post them. — No, nothing.
iptables -L -vn, ip6tables -L -vn, iptables -L -vn -t nat and ip6tables -L -vn -t nat. — Nope.
docker exec -it $(docker ps -qf name=acme-mailcow) dig +short stackoverflow.com @172.22.1.254 (set the IP accordingly, if you changed the internal mailcow network) and post the output. — 151.101.1.69 151.101.193.69 151.101.129.69 151.101.65.69