home-assistant / operating-system

Home Assistant Operating System
Apache License 2.0

HAOS in Proxmox VM - Memory Cache never being released #2999

Closed gfn256 closed 11 months ago

gfn256 commented 11 months ago

Describe the issue you are experiencing

I have noticed (perhaps since forever; I have been running HAOS in a VM on Proxmox for a few years) that the memory usage Proxmox reports for the HAOS VM steadily rises in increments over time. Note that memory usage as reported from inside HAOS does not rise! I finally decided to analyze this problem, and discovered that the memory increments occur whenever a backup is made inside HAOS. What happens is that the buff/cache inside the HAOS VM increases with every backup - BUT IS NEVER RELEASED!

Here is an example from inside the HAOS VM (ssh output):

[core-ssh ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G      844.6M        2.2G        5.1M        4.7G        6.8G
Swap:          2.6G           0        2.6G

As you can see, I have a total of 8GB of RAM allocated to the HAOS VM, with less than 1GB actually being used and 6.8GB available, BUT the buff/cache has reached 4.7GB, so HAOS reports that only 2.2GB is "free"! This is what Proxmox sees and reports! This "free" number steadily decreases with every backup as the buff/cache rises - NEVER BEING RELEASED!
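
The relationship between "free" and "available" can be sketched with some arithmetic. The kB figures below are hypothetical values chosen to roughly match the `free -h` output above, and the 95% reclaimable fraction is an arbitrary illustration; the kernel's real MemAvailable estimate is more involved:

```shell
# Hypothetical /proc/meminfo-style figures (kB), chosen to roughly match
# the free -h output above; not taken from the actual system.
mem_free_kb=2306867     # ~2.2G "free"
buff_cache_kb=4928307   # ~4.7G buff/cache

# "available" is roughly free memory plus reclaimable cache, which is why
# it stays near 6.8G even while "free" shrinks with every backup.
reclaimable_kb=$((buff_cache_kb * 95 / 100))   # assume most cache is reclaimable
available_kb=$((mem_free_kb + reclaimable_kb))
echo "free:      $((mem_free_kb / 1024)) MiB"
echo "available: $((available_kb / 1024)) MiB"
```

So a shrinking "free" number on its own does not indicate a leak as long as "available" stays high.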

Maybe someone can enlighten me on this situation.

Upon googling around, I found this exact issue discussed in a Home Assistant community thread:

https://community.home-assistant.io/t/memory-leak-home-assistant-2022/457565/61

In that thread the poster suggests clearing the memory cache with:

sync; echo 3 > /proc/sys/vm/drop_caches

However, on my HAOS system I am unable to do this, as I get:

-bash: /proc/sys/vm/drop_caches: Read-only file system

My only workaround (definitely NOT A SOLUTION!) is to reboot the VM - and then memory returns to "normal"!

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

Home Assistant OS 11.2

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Keep running HAOS for a week
  2. Check Mem usage in Proxmox periodically
  3. Analyze when jumps occur

Anything in the Supervisor logs that might be useful for us?

Nothing interesting

Anything in the Host logs that might be useful for us?

Nothing interesting

System information

System Information

version core-2023.12.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.6
os_name Linux
os_version 6.1.63-haos
arch x86_64
timezone XXXX/XXXX
config_dir /config
Home Assistant Cloud

logged_in | true
-- | --
subscription_expiration | August 11, 2024 at 3:00 AM
relayer_connected | true
relayer_region | XX-XXXXXX-XX
remote_enabled | true
remote_connected | true
alexa_enabled | false
google_enabled | true
remote_server | XX-XXXXXX-XX
certificate_status | ready
instance_id | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 11.2
-- | --
update_channel | stable
supervisor_version | supervisor-2023.11.6
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 30.8 GB
disk_used | 10.3 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Mosquitto broker (6.4.0), File editor (5.7.0), Home Assistant Google Drive Backup (0.112.1), RPC Shutdown (2.4), Samba share (12.2.0), Terminal & SSH (9.8.1), eWeLink Smart Home (1.4.3)

Dashboards

dashboards | 3
-- | --
resources | 0
views | 15
mode | storage

Recorder

oldest_recorder_run | December 9, 2023 at 4:18 PM
-- | --
current_recorder_run | December 15, 2023 at 11:21 AM
estimated_db_size | 87.14 MiB
database_engine | sqlite
database_version | 3.41.2

Additional information

No response

sairon commented 11 months ago

It is expected that Linux might (and eventually will) use all of the RAM available to it. See here to understand what the numbers in free mean: https://www.linuxatemyram.com/

With QEMU/KVM virtualization, the virtual machine behaves like a real computer and sees the memory you allocated to it the same way it would on a bare-metal system. This means the amount of memory you set for the VM in the Proxmox configuration is dedicated to it alone, and it is entirely up to the guest OS how it uses it. This is not a bug, and doing things like dropping caches may only have a detrimental effect on system performance.
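
A minimal sketch of how one could check real memory pressure inside the guest, looking at MemAvailable rather than "free". The 10% threshold is an arbitrary illustration, not an HAOS or Proxmox default:

```shell
#!/bin/sh
# Report how much memory is genuinely available inside the guest.
# Uses MemAvailable, which already accounts for reclaimable cache.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
pct=$((100 * avail_kb / total_kb))
echo "MemAvailable: ${pct}% of MemTotal"
# 10% is an assumed threshold for illustration only.
if [ "$pct" -lt 10 ]; then
    echo "guest is genuinely low on memory"
fi
```

As long as this percentage stays high, a large buff/cache figure is harmless.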

gfn256 commented 11 months ago

@sairon Thanks for your prompt reply.

In the link you referenced https://www.linuxatemyram.com/ , I quote "If, however, you find yourself needing to clear some RAM quickly to workaround another issue, like a VM misbehaving, you can force linux to nondestructively drop caches using echo 3 | sudo tee /proc/sys/vm/drop_caches." So it appears to be "nondestructive".

Yes, I agree "nondestructive" doesn't mean it has no impact on general system performance, but why not cater for folks who virtualize, so they can perform this manually/periodically without having to reboot completely?

On a second note: why does this additional cache-grabbing happen again with every backup? Maybe it's beyond our control?

sairon commented 11 months ago

It is nondestructive in the sense that it does not cause system instability. However, it hurts performance: you trade free memory (which means nothing in the guest OS, given there is enough available memory) for more disk I/O, which must happen when the OS wants to access files on the disk again. That also answers your second question - it happens after the backup because HA accesses a large amount of data at that time. It is up to the Linux kernel which data it keeps in the caches and which it drops.
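
The trade-off can be sketched as below. The file path is just an example, and the drop_caches step is guarded because it requires root and a writable /proc (which, as noted above, is not the case on HAOS):

```shell
#!/bin/sh
# Demonstrate cached vs uncached reads of the same file.
f=/tmp/cache-demo.bin
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null

cat "$f" > /dev/null        # first read: populates the page cache
time cat "$f" > /dev/null   # warm read: served from RAM

# Dropping caches needs root and a writable /proc/sys (not available on HAOS).
if [ -w /proc/sys/vm/drop_caches ]; then
    sync
    echo 3 > /proc/sys/vm/drop_caches
    time cat "$f" > /dev/null   # cold read: hits the disk again
fi
rm -f "$f"
```

On most systems the cold read is noticeably slower - that extra I/O is the cost of "freeing" memory that was doing useful work.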

As for the other question, it is wrong to treat the part of memory used by caches as somehow available to the host OS while the guest is running. Dropping caches regularly just to make the graphs look nicer would be wrong, and if the guest needed that memory later, you would run into an OOM situation anyway. There are methods for allocating memory dynamically (search for "memory ballooning", for example), but they have drawbacks too, and that is beyond this discussion.

gfn256 commented 11 months ago

@sairon Thanks again for your prompt reply. I regularly use memory ballooning devices in Proxmox with my VMs. However, in this case I have to agree with you: for the HAOS VM, which I've "only" given 8GB of RAM anyway, I see little benefit in enabling ballooning. I also agree that purely for cosmetic (graph-enhancing) reasons there is no point in clearing the cache. However, I was bothered that Proxmox might become "upset" if the HAOS VM's memory (as it sees it) stays 100% used for a long period. Please note this has never yet happened to me. Just sharing my thoughts...

Impact123 commented 11 months ago

The PVE memory usage graph often causes confusion (especially with ZFS), and as such it is often discussed on their forums. It also depends a bit on the guest OS (Windows behaves and reports differently), the QEMU guest agent, ballooning and so on. This happens with other Linux distros too, but see the link above to research further.