openhab / openhabian

openHABian - empowering the smart home, for Raspberry Pi and Debian systems
https://community.openhab.org/t/13379
ISC License

Dynamically adjust zram memory constraints #1647

Closed ecdye closed 2 years ago

ecdye commented 2 years ago

Fixes #1646

Signed-off-by: Ethan Dye mrtops03@gmail.com

ecdye commented 2 years ago

> I'd ask to increase disk_size even further. mem_limit critically affects RAM and is therefore important to system stability, but disk_size isn't really; it is, however, beneficial in situations like the restore in the forum post.

I don't think so. As it states in the zram documentation, the disk size should only be a max of 400% more than your mem_limit, otherwise it is not beneficial and will add increased overhead. Additionally, the 400% value is only really applicable in cases where the data is super compressible, and I would venture to guess that openHAB's data is only moderately compressible on average. The values I used are a little conservative; however, I believe that in the vast majority of cases they will be more than sufficient without causing too much impact on system resources on the targeted systems. Keep in mind we are targeting a large user base, the vast majority of which doesn't need super large amounts of memory for their persistence data. Also keep in mind that the only data actually stored in RAM is the data accumulated since the last sync to disk (i.e. reboot).

> Worth increasing for the standard ztab as well?

Once again no, both for the reasons listed above and for the simple fact that up to this point there has been no need to, and many users have had great results. I prefer to operate under the "don't fix what ain't broke" rule.

In the end I believe that the values I have chosen will offer the greatest benefit to the most users. Anything much beyond what I've done here really enters custom territory, better left to the enterprising user who is comfortable editing the config file to obtain the optimal result for their setup.
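For readers who haven't opened that config file: each ztab entry pairs a mem_limit with a disk_size per device. The sketch below only illustrates that pairing, assuming the swap/dir/log column layout of zram-config's example ztab; the values and paths are placeholders, not the ones proposed in this PR:

```
# swap  alg      mem_limit  disk_size  swap_priority  page-cluster  swappiness
swap    lzo-rle  200M       600M       75             0             90

# dir   alg      mem_limit  disk_size  target_dir                    bind_dir
dir     zstd     250M       1000M      /var/lib/openhab/persistence  /persistence.bind

# log   alg      mem_limit  disk_size  target_dir  bind_dir   oldlog_dir
log     zstd     150M       600M       /var/log    /log.bind  /opt/zram/oldlog
```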

mstormi commented 2 years ago

> As it states in the zram documentation, the disk size should only be a max of 400% more than your mem_limit, otherwise it is not beneficial and will add increased overhead.

Increased overhead does not mean not beneficial, just a decrease in storage efficiency. I'd also think that the increase in overhead won't really be much. And even more important: that by itself is also no reason not to go there when the (application-level) benefits outweigh it.

> Additionally, the 400% value is only really applicable in cases where the data is super compressible, and I would venture to guess that openHAB's data is only moderately compressible on average.

I think we must distinguish logging from persistence. I'm using openHAB's default rrd4j persistence, and I'm a pretty average user in terms of number of items, hence persistence data size. The compression ratio, as you can see below, is ~5:1. If I manually gzip an openHAB logfile I get ~10:1. And watch that current log compression ratio!

```
[12:33:52] root@mysmarthouse:/var/log# zramctl
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram2 zstd          600M 407,9M  77,6M   200M       4 /opt/zram/zram2
/dev/zram1 zstd         1000M 381,9M   6,7M   186M       4 /opt/zram/zram1
/dev/zram0 lzo-rle       600M 182,3M  42,8M  76,2M       4 [SWAP]
```
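For anyone who wants to repeat that check, a rough way to gauge whole-file compressibility with gzip; the log path is just an example, and gzip only approximates what zstd achieves page by page inside zram:

```
# Approximate a whole-file compression ratio with gzip (path is an example)
FILE=/var/log/openhab/openhab.log
ORIG=$(stat -c %s "$FILE")
COMP=$(gzip -c "$FILE" | wc -c)
awk -v o="$ORIG" -v c="$COMP" 'BEGIN { printf "ratio: %.1f:1\n", o / c }'
```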

> Keep in mind the only data actually stored in RAM is the data accumulated since the last sync to disk (i.e. reboot).

Ok for logging. Total logging space consumption is also pretty well limited thanks to the logrotate/log4j config. But my main concern is persistence, and for persistence your statement is not really true. Files get written to on every item update, and from zram's point of view the whole file is stored in RAM from then on, as zram cannot do sub-file-level diffs. So assuming 90% of items change within, say, a day, 90% of the space consumed by persistence files will be in use in zram's RAM within 1 day of OH runtime.
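A hedged way to watch that happen on a running system is to compare the size of the persistence store on the zram-backed mount with the DATA column zramctl reports for that device; the rrd4j path below is the usual default and is only meant as an example:

```
# Uncompressed size of the rrd4j store as seen on the zram-backed mount
du -sh /var/lib/openhab/persistence/rrd4j

# DATA should creep towards that figure as more and more items get updated
zramctl --output NAME,DISKSIZE,DATA,COMPR,TOTAL,MOUNTPOINT
```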

ecdye commented 2 years ago

I don't completely agree. As this is the first time we have ever really seen this issue, I would prefer to be more conservative and adjust later. Perhaps something we could do in the future is add a menu option to enable an even more aggressive zram profile.

mstormi commented 2 years ago

I still dislike the persistence disk_size. Doubling it like you did is not enough IMHO; I tripled it and went for 1G on persistence and 600M on the others myself, and I suggest you do so here, too. As I said, I'm an average user in terms of persistence size, plus things will worsen the longer you have your system in operation and accumulate data.

But what's worse, this applies to old, small-memory systems as well. As we should not double mem_limit there, we should at least quadruple disk_size.

mstormi commented 2 years ago

A note on conservatism w.r.t. ztab values: optimization is always a tricky process. Remember, changes only apply to new installs. Since any issues will only eventually arise because of this, and then for certain only on new installs, we must not be conservative in changing, or we won't be seeing any effects or getting any feedback at all; we won't know whether the new values cause problems, nor whether they solve any, such as that restore thing. People won't tell, and we'll not get to know about it.

So if we think disk_size should be, say, 1G from the application optimization POV, then we should deploy 1G from now on and not anything less. Conservatism is good where it applies to running systems, but on new ones it annoyingly slows down the process of determining a new, more optimized setting. The same argumentation applies to the new JDK; just on that you have been way more courageous ;-)

ecdye commented 2 years ago

Because you have expressed so much concern over this issue, I'm going to do some more research and testing to try and find the optimal values for my 1G system and then scale up from there. As a handy reference that might interest you and gives more detail about why I believe what I do, see https://unix.stackexchange.com/questions/594817/why-does-zram-occupy-much-more-memory-compared-to-its-compressed-value.

ecdye commented 2 years ago

Additionally, the COMPR value that you keep referencing is a bit misleading. Try running zramctl --output-all and pay attention to the MEM_USED value, as that is the amount of RAM the zram device is actually using, including all of the other overhead; it matches my research suggesting about 100%-150% of the mem_limit as the disksize value.
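A small sketch of how to read that figure, in case it helps; zramctl is from util-linux, the third field of the kernel's mm_stat file is mem_used_total in bytes, and the device number below is just an example:

```
# All columns, including the RAM actually consumed by each zram device
zramctl --output-all

# The same information straight from the kernel; the third field of
# mm_stat is mem_used_total in bytes
cat /sys/block/zram1/mm_stat
```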

@mstormi

mstormi commented 2 years ago

On swap that you increased: I would not think we should increase swap size, as that potentially results in an increased need for RAM during standard operations. That puts any 1G box at risk. Note that if you increase the disk_size limit for swap in ztab, you equally increase total swap space too, as the OS simply takes whatever is available on the swap device; with swap on zram that is not a fixed-size partition or file defined in fstab or through dphys-swapfile like it was before zram. Instead, that device's size is defined through disk_size in ztab, I think.

Swapping to zram is a tradeoff of memory vs. CPU. We don't have a CPU shortage, so there's no benefit in trading in RAM to lower CPU usage. The only thing swap actually needs to be is large enough that the whole OS doesn't run into out-of-virtual-memory (Java being the largest, but not the only, process). But pay attention: when there's more swap available than needed to avoid OO(V!)M, Linux will proactively put copies of RAM pages there, pages it might never need to remove from process-space RAM. That's what it does; the OS is proactive so it does not have to start copying right at the moment when there's demand for RAM. But this copying will occupy additional space on swap and hence in RAM, because swap is in zram.
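For reference, a few standard commands to see how much of the zram swap device is actually used and how proactive the kernel is about swapping; nothing here is specific to openHABian:

```
# Swap devices and their current usage (the zram device shows up here)
swapon --show

# Overall RAM/swap picture
free -h

# Kernel swappiness setting (higher means more proactive swapping)
sysctl vm.swappiness
```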

ecdye commented 2 years ago

> On swap that you increased:

I only increased the swap on the large-memory file; on the small-memory file I just adjusted the disk size to match 150% of the mem_limit. I agree with all your points; however, they don't make sense in the context of a large-memory box. I am OK with leaving the swap the same (at the lower value) for all devices if that is the point you are trying to make.

mstormi commented 2 years ago

No, your current changes are OK w.r.t. swap.

Still not content with persistence though, see previous comments.

ecdye commented 2 years ago

@mstormi Unrelated to our unresolved issue with device sizes: something in the GHA environment changed that no longer allows our BATS zram tests to run. I am working on implementing a very complex testing procedure to allow me to run tests on the main repo for zram. As such, and because I have not been able to find a simple or good fix for our tests, I believe that at the moment the best option is to remove them from our testing suite. Thoughts?

mstormi commented 2 years ago

> Thoughts?

No immediate ones. I'm ok with removing zram bats tests for the time being.

ecdye commented 2 years ago

I'm actually about to push some breaking changes to zram-config proper and will need to update the installation code here, so this is perfect timing. I'll merge this, and then push those other unrelated changes once I have merged them in zram-config.