home-assistant / operating-system

Home Assistant Operating System
Apache License 2.0

zram0 not used for swap on RPi 3B+ #2591

prj closed this issue 1 year ago

prj commented 1 year ago

Describe the issue you are experiencing

zram based swap is not used.

What operating system image do you use?

rpi3 (Raspberry Pi 3 32-bit OS)

What version of Home Assistant Operating System is installed?

10.2

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

Boot up the OS, then check the swap (e.g. with cat /proc/swaps):

Filename                                Type            Size            Used            Priority
/mnt/data/swapfile                      file            307348          252144          -2

This is the only swap file. It is located on the SD card and is unbearably slow.

The zram0 device already exists but is unused. Running the following is enough:

zramctl /dev/zram0 -s 500M -t 4 -a lz4
mkswap /dev/zram0
swapon /dev/zram0 -p 5

This activates the zram swap and HA is perfectly usable after that. Maybe it makes sense to use even more than 500M. Either way, this single change makes HA perfectly viable on the 3B+ again.
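
A minimal sketch for confirming whether the commands above took effect, by looking for a zram device in the active swap list. The function name `zram_swap_active` is made up for illustration, and the optional file argument exists only so the check can be run against saved output instead of the live /proc/swaps.

```shell
# Return success (exit 0) if any /dev/zramN device is listed as active swap.
# $1: path to a swaps listing; defaults to /proc/swaps (assumption: same
#     column layout as the output shown in this issue).
zram_swap_active() {
    swaps_file="${1:-/proc/swaps}"
    # Skip the header line, then match device names such as /dev/zram0.
    awk 'NR > 1 && $1 ~ /^\/dev\/zram[0-9]+$/ { found = 1 }
         END { exit !found }' "$swaps_file"
}

# Example use on a running system:
#   if zram_swap_active; then echo "zram swap active"; else echo "no zram swap"; fi
```

With the listing from the steps to reproduce (only /mnt/data/swapfile present) the function returns failure; after the three commands above it should return success.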

Is this planned obsolescence? Or did the swap setup get borked somewhere by accident?

Anything in the Supervisor logs that might be useful for us?

No

Anything in the Host logs that might be useful for us?

No

System information

No response

Additional information

No response

prj commented 1 year ago

Or am I missing something and the memory is already compressed in some other way?

prj commented 1 year ago

Reverting to 9.5 fixed this for me, and /dev/zram0 is listed as swap again.

agners commented 1 year ago

We switched from zram to zswap, you can read the details in the relevant PR at #2420.

prj commented 1 year ago

Unfortunate. In that case I need to stay on 9.5, because performance takes a massive hit otherwise and the system is almost unusable.

prj commented 1 year ago

P.S. Please communicate clearly that devices with less than 2GB RAM are no longer supported for HA use since 10.0. Your change has the exact opposite effect on devices with <2GB RAM and HA. Yes, they do not crash outright now without an extra swap partition, but they are so slow that they are essentially unresponsive and unusable.

agners commented 1 year ago

> Yes, they do not crash outright now without an extra swap partition, but they are so slow that they are essentially unresponsive and unusable.

That is absolutely not what I experience here: I have multiple test devices (with add-ons installed) with 1GB of RAM running here and they are pretty snappy, no problem.

Can you share a bit more information about your exact setup (e.g. which add-ons you are using)? I'd really like to reproduce this and understand where the time is spent.

Can you share the output of cat /proc/vmstat so we can understand your system's memory situation, and check top for processes with high CPU load? You'll need SSH access on HAOS (see this guide).

prj commented 1 year ago

vmstat:

# cat /proc/vmstat
nr_free_pages 10427
nr_zone_inactive_anon 68037
nr_zone_active_anon 65920
nr_zone_inactive_file 8627
nr_zone_active_file 21322
nr_zone_unevictable 0
nr_zone_write_pending 52
nr_mlock 0
nr_bounce 0
nr_zspages 17588
nr_free_cma 1014
nr_inactive_anon 68037
nr_active_anon 65920
nr_inactive_file 8627
nr_active_file 21322
nr_unevictable 0
nr_slab_reclaimable 12160
nr_slab_unreclaimable 12199
nr_isolated_anon 0
nr_isolated_file 0
workingset_nodes 2314
workingset_refault_anon 611061
workingset_refault_file 2943742
workingset_activate_anon 102645
workingset_activate_file 2091025
workingset_restore_anon 50467
workingset_restore_file 1828130
workingset_nodereclaim 5206
nr_anon_pages 133494
nr_mapped 18817
nr_file_pages 31455
nr_dirty 52
nr_writeback 0
nr_writeback_temp 0
nr_shmem 27
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_file_hugepages 0
nr_file_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_vmscan_write 716055
nr_vmscan_immediate_reclaim 2897
nr_dirtied 925868
nr_written 1630101
nr_kernel_misc_reclaimable 0
nr_foll_pin_acquired 0
nr_foll_pin_released 0
nr_kernel_stack 9712
nr_page_table_pages 3222
nr_swapcached 1644
nr_dirty_threshold 6844
nr_dirty_background_threshold 3418
pgpgin 17157857
pgpgout 5509661
pswpin 611062
pswpout 715924
pgalloc_dma 36929444
pgalloc_dma32 0
pgalloc_normal 0
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 318
allocstall_movable 406
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 0
pgskip_movable 0
pgfree 38544223
pgactivate 3334952
pgdeactivate 5535074
pglazyfree 652966
pgfault 47628768
pgmajfault 487405
pglazyfreed 9071
pgrefill 9477216
pgreuse 2102005
pgsteal_kswapd 4180318
pgsteal_direct 38219
pgdemote_kswapd 0
pgdemote_direct 0
pgscan_kswapd 7211660
pgscan_direct 105160
pgscan_direct_throttle 2
pgscan_anon 2906788
pgscan_file 4410032
pgsteal_anon 723124
pgsteal_file 3495413
pginodesteal 0
slabs_scanned 1851524
kswapd_inodesteal 235263
kswapd_low_wmark_hit_quickly 28
kswapd_high_wmark_hit_quickly 1542
pageoutrun 6001
pgrotated 90444
drop_pagecache 0
drop_slab 0
oom_kill 0
pgmigrate_success 1567424
pgmigrate_fail 18752
thp_migration_success 0
thp_migration_fail 0
thp_migration_split 0
compact_migrate_scanned 52676905
compact_free_scanned 287398888
compact_isolated 3808216
compact_stall 0
compact_fail 0
compact_success 0
compact_daemon_wake 976
compact_daemon_migrate_scanned 150005
compact_daemon_free_scanned 708301
cma_alloc_success 12
cma_alloc_fail 0
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
swap_ra 181170
swap_ra_hit 179097
nr_unstable 0
#
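
To put the swap counters above into perspective, here is a small sketch that converts pswpin and pswpout (pages swapped in and out since boot) into MiB. The function name `swap_traffic_mib` is hypothetical, and the 4 KiB page size is an assumption that should be checked with `getconf PAGESIZE` on the device.

```shell
# Print swap-in and swap-out volume since boot in MiB.
# $1: path to a vmstat dump; defaults to /proc/vmstat
# $2: page size in KiB; defaults to 4 (assumption, verify with getconf PAGESIZE)
swap_traffic_mib() {
    vmstat_file="${1:-/proc/vmstat}"
    page_kib="${2:-4}"
    awk -v page_kib="$page_kib" '
        $1 == "pswpin"  { printf "swapped in:  %.1f MiB\n", $2 * page_kib / 1024 }
        $1 == "pswpout" { printf "swapped out: %.1f MiB\n", $2 * page_kib / 1024 }
    ' "$vmstat_file"
}

# Example use on a running system:
#   swap_traffic_mib
```

For the values posted here (pswpin 611062, pswpout 715924) and 4 KiB pages, this comes to about 2387 MiB swapped in and 2797 MiB swapped out over the two days of uptime.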

No add-on shows high CPU load. The load average looks like this:

# uptime
 16:18:03 up 2 days, 5 min,  0 users,  load average: 0.39, 0.25, 0.20

The two add-ons I use that are fairly resource hungry are:

1) MariaDB - uses ~10% RAM
2) NodeRED - uses ~8% RAM

Everything else has negligible RAM usage: four more add-ons using less than 1% in total.

Here is swap usage and ZRAM usage after 2 days:

# cat /proc/swaps
Filename                                Type            Size            Used            Priority
/dev/zram0                              partition       2097148         292832          5
/backup/_swap.swap                      file            2097148         16716           -2
# zramctl
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram2 lz4            16M   436K 75.4K  124K       4 /tmp
/dev/zram1 lz4            32M   176K  5.1K  108K       4 /var
/dev/zram0 zstd            2G 285.5M 64.9M 68.7M       4 [SWAP]
#

The swappiness is currently set to 10. I had major slowdowns with it set to 1, possibly because ZRAM and the kernel were fighting over memory, but I don't know.

The fact of the matter is that after 2 days I am already at 285M of swap usage. (I also have a backup swap file on the SD card, added a long time ago because HA would crash every 2 weeks without it. It is probably no longer necessary since I increased the ZRAM to 2GB.) Without the ZRAM swap I would have 69M more RAM, but I would also have over 200M swapped out to the SD card.
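
The trade-off can be worked out from the zramctl line for /dev/zram0 (DATA 285.5M, TOTAL 68.7M). A rough sketch follows; the helper name `zram_savings` is made up for illustration, and the two values are typed in by hand rather than parsed from zramctl.

```shell
# Compute the effective compression ratio and the RAM saved by zram.
# $1: uncompressed data held in zram, in MiB (DATA column of zramctl)
# $2: total RAM used by zram, in MiB (TOTAL column of zramctl)
zram_savings() {
    awk -v d="$1" -v t="$2" 'BEGIN {
        printf "ratio: %.2f\n", d / t
        printf "saved: %.1f MiB\n", d - t
    }'
}

# Values taken from the zramctl output in this comment.
zram_savings 285.5 68.7
# ratio: 4.16
# saved: 216.8 MiB
```

In other words, about 69M of RAM holds roughly 285M of swapped-out pages, which is where the "over 200M" that would otherwise sit on the SD card comes from.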

The swap gets hit pretty hard already during the startup of HA Core, when MariaDB and NodeRED load (over 200M swapped).

I honestly don't see why you don't want to use the ZRAM swap anymore. Make it conditional on the amount of RAM in the device. For anything with 1GB or less it is of huge benefit, and with swappiness set to 10 it does not get hit until there is a good reason to hit it.
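
As a sketch of what such a condition could look like (this is not how HAOS decides anything, only an illustration of the suggestion): the function name `should_use_zram` and the 1048576 kB threshold are assumptions, chosen so that a 1GB board, which typically reports somewhat less than that as MemTotal, falls under the limit.

```shell
# Return success if total RAM is at or below a threshold, i.e. the device
# would be a candidate for zram swap under the rule proposed above.
# $1: path to a meminfo file; defaults to /proc/meminfo
# $2: threshold in kB; defaults to 1048576 (1 GiB, an assumed cut-off)
should_use_zram() {
    meminfo_file="${1:-/proc/meminfo}"
    threshold_kib="${2:-1048576}"
    awk -v limit="$threshold_kib" '
        $1 == "MemTotal:" { found = 1; small = ($2 <= limit) }
        END { exit !(found && small) }
    ' "$meminfo_file"
}

# Example use on a running system:
#   if should_use_zram; then echo "enable zram swap"; else echo "keep default"; fi
```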

Certainly on my device with NodeRED and MariaDB active the lack of ZRAM means that it is essentially incapable of running HA. Yes, I know I can throw this one in the trash and get a RPi4, but why? It's perfectly functional using ZRAM.

> That is absolutely not what I experience here: I have multiple test devices (with add-ons installed) with 1GB of RAM running here and they are pretty snappy, no problem.

I am guessing your test devices either do not have any serious addons installed or they all have very fast media (e.g. SSD not SD card). Otherwise I have no idea how you could come to this conclusion.