openzfsonosx / spl

Solaris Porting Layer for OpenZFS on OS X
https://openzfsonosx.org/

SPL: MMT it's been an hour since the last reap #12

Closed ylluminate closed 8 years ago

ylluminate commented 8 years ago

I have noticed that my ZFS volume has become somewhat unresponsive, and upon rebooting I received the following messages:

SPL: MMT it's been an hour since the last reap, vm_page_free_count == 12716231
SPL: reap thread, segkmem_total_mem_allocated delta 846462976 since 3596 seconds ago

I'm a little concerned because even after rebooting I'm still running into hangs in the middle of copying some files. I'm using the most recent source from master at present.

ylluminate commented 8 years ago

I stepped back to the 1.5.2 release and am still seeing the same kind of hangs, although it hasn't been running long enough yet to tell whether the initial message above is reported on reboot.

brendonhumphrey commented 8 years ago

Those are harmless messages.

brendonhumphrey commented 8 years ago

Also did you pick up the SPL change that was made ~24 hrs ago?

ylluminate commented 8 years ago

No, I've not grabbed those yet. Let me take a look. I went back to the older driver in hopes things would improve, but to no avail. What concerned me was that it wasn't releasing memory even after long periods of relative inactivity. I should probably write some kind of shell script that forces it to feel the pressure and release...

brendonhumphrey commented 8 years ago

I see. “An hour since last reap” is normal behaviour. That is not changing for now. It's just idle chitchat in the logs.

Brendon

brendonhumphrey commented 8 years ago

If you're keen, try the code in the SPL branch 'vmem_experiments'; we would like some feedback on how it works for you.

ylluminate commented 8 years ago

Absolutely. Can you tell me what to expect and what the goal is? From the look of the commits it seems fairly stable, so it's not likely to produce panics, is it?

ilovezfs commented 8 years ago

You should expect :rainbow: :bouquet: :sunny:

ylluminate commented 8 years ago

/me switches to :godmode:

HERE WE GO!

brendonhumphrey commented 8 years ago

I've been running it for a month without a reboot and it seems OK. It just alters some allocator details and may work better. I don't want to cloud your judgement by making claims. Do whatever you did to create your performance stalls, or whatever they were.

brendonhumphrey commented 8 years ago

Did you try it?

ylluminate commented 8 years ago

I've got a heavy copy process going on it right now that should have triggered the issue I was having earlier. I've cranked it up to 6 rsyncs going full bore, and while that has maxed out my memory usage, my system is still semi-usable. I have two different volumes copying via these multiple threads onto the raid-z. So far so good... I wish I could cut down the memory consumption further, though. Like I said, it's consuming everything and using swap at the moment on a 64GB system (htop shows kernel_task at 103 GB virt).

brendonhumphrey commented 8 years ago

OK, sounds promising. If you want to limit memory use, why don't you set the ARC max?

ylluminate commented 8 years ago

Well, that's what's confusing, @brendonhumphrey: I do have kstat.zfs.darwin.tunable.zfs_arc_max=8589934592 set in /etc/zfs/zsysctl.conf, yet that still happened.
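
(For reference, that value is just 8 GiB expressed in bytes; a quick shell sanity check:)

echo $((8*1024**3))    # prints 8589934592, i.e. the zfs_arc_max value above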

brendonhumphrey commented 8 years ago

That is a similar setting to what I have, and my machine is obediently keeping memory use in check. You haven't got any other settings in zsysctl.conf, have you? I accidentally inherited something from ilovezfs' example script that blocked zfs_arc_max from working. I'm not near the machine, so I can't quote the exact setting.

You could also do sysctl -a | grep spl. One of the sysctls (there are only a few, so it should be obvious which one) tells you how much memory we have allocated in total. You can use that to check that it is o3x that is occupying the memory.

ylluminate commented 8 years ago

Hmm, that's the only line in zsysctl.conf, and yes, I do use @ilovezfs's script, so I must have also inherited that setting somewhere... Whatever may be doing that isn't readily jumping out at me from the settings, so I'll hang tight until you can pull it up...

brendonhumphrey commented 8 years ago

OK, you really don't want this setting (assuming you pulled it in from the .example file):

10 Mar 2015; ilovezfs
Raise zfs_arc_meta_limit to 3/4 (instead of 1/4) of zfs_arc_max.
3/4 * (11 * 2^30) = 8,858,370,048
But let's use hexadecimal this time.
8,858,370,048 = 0x210000000
kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000

Also, kstat.zfs.misc.arcstats.c_max is the actual ARC max, i.e. check it after setting the tunable.
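
(If you want to double-check the arithmetic quoted above, the hex constant and the 3/4-of-11-GiB figure are the same number:)

printf '%d\n' 0x210000000       # 8858370048
echo $((3 * 11 * 2**30 / 4))    # 8858370048, i.e. 3/4 of 11 GiB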

ylluminate commented 8 years ago

Interesting. So essentially I should use kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000 instead of kstat.zfs.darwin.tunable.zfs_arc_max=8589934592?

Or should BOTH be in there? What's the purpose of this new setting compared with the old one, with respect to the "meta" vs. actual limit?

brendonhumphrey commented 8 years ago

Ah, no. Sorry if I wasn't clear: if you have kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000 then kstat.zfs.darwin.tunable.zfs_arc_max=8589934592 won't work.

You can verify what arc max is after setting the tunable by checking kstat.zfs.misc.arcstats.c_max.

Also, the total memory in use is kstat.spl.misc.spl_misc.os_mem_alloc.
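
For example, checking both at a glance looks roughly like this:

sysctl kstat.zfs.misc.arcstats.c_max           # the ARC max actually in effect
sysctl kstat.spl.misc.spl_misc.os_mem_alloc    # total memory o3x has allocated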

ylluminate commented 8 years ago

Alright, at present I only have kstat.zfs.darwin.tunable.zfs_arc_max=8589934592, so I am removing that, inserting kstat.zfs.darwin.tunable.zfs_arc_meta_limit=0x210000000, and will monitor the levels. This will be the only line in zsysctl.conf. Will keep you posted.

ylluminate commented 8 years ago

I guess that last setting must have been the wrong move, since kernel_task memory is already climbing well past the threshold very rapidly: up to ~20 GB after running rsync for just a few minutes.

rottegift commented 8 years ago

Try this script.

DON'T adjust to taste, don't mess around. Tell us the results of running EXACTLY this as superuser as soon as you can after boot time, assuming your recent problematic workload eventually causes problems again.

#!/bin/bash

export PATH=/usr/bin:/bin:/sbin:/usr/sbin

sysctl -w kstat.zfs.darwin.tunable.zfs_arc_min=$((512*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_min=$((256*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=$((5*1000**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=$((11*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=$((512*1024**2))

IF you see the same problem after running this as superuser near boot-time, then paste in the exact output of

sysctl kstat.spl kstat.vmem.vmem.kmem_va.mem_inuse kstat.vmem.vmem.kmem_default.mem_inuse kstat.vmem.vmem.kmem_oversize.mem_inuse kstat.vmem.vmem.heap.mem_inuse; sysctl vm | egrep '^vm.page_(free_(wanted|count)|speculative|purgable_count)'

(and surround the result in a pair of lines containing three backticks for the sake of readability).

ylluminate commented 8 years ago

So, FYI, I've done the following:

Created /etc/zfs/zsysctl.sh (made executable with sudo chmod +x) with these contents:

#!/bin/bash

export PATH=/usr/bin:/bin:/sbin:/usr/sbin

sysctl -w kstat.zfs.darwin.tunable.zfs_arc_min=$((512*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_min=$((256*1024**2+1*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_meta_limit=$((5*1000**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_arc_max=$((11*1024**3))
sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=$((512*1024**2))

And created /Library/LaunchDaemons/com.openzfsonosx.zsysctlupdate.plist with contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.openzfsonosx.zsysctlupdate</string>
        <key>ProgramArguments</key>
        <array>
            <string>/etc/zfs/zsysctl.sh</string>
            <string>/etc/zfs</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>KeepAlive</key>
        <false/>
    </dict>
</plist>

rottegift commented 8 years ago

Well, you could just have run it by hand (sudo scriptname) when you get a login prompt, but it should (should!) work when executed early by launchd.

Make sure it does run via launchd; you can use the StandardErrorPath and StandardOutPath keys in your plist, and also grep SPL /var/log/system.log and check the output of "sysctl kstat.zfs.darwin.tunable" to make sure the values reflect the script.
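
For example, a quick check after boot might look like this (label and path assumed from your plist above):

sudo launchctl list | grep com.openzfsonosx.zsysctlupdate   # is the job loaded?
grep SPL /var/log/system.log                                # any SPL messages around boot
sysctl kstat.zfs.darwin.tunable                             # do the values reflect the script?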

Unfortunately if you're using launchd you may have a race with org.openzfsonosx.zconfigd.

(It's perfectly fine to run it by hand before there are more than 2GB in your ARC, and certainly before you start running your problem-reproducing workload. Running it by hand gives you confidence that the tunable values have been updated by the script.)

ylluminate commented 8 years ago

Yes, understood. So far it's staying a lot more sane...

rottegift commented 8 years ago

If it's making a difference this soon with a workload like "6 rsyncs going full bore" -- and on the assumption that each rsync is writing to a different dataset or zvol -- then the issue is probably that you're generating an enormous amount of dirty data (limited by the smallest of: 80% of system memory; ~10% maximum per active target dataset; or 4 GiB per target dataset or zvol), which takes time to drain, during which your reads into ARC will stall and the ARC cannot shrink back in response to userland pressure.

The "smallarc" script I gave you leads to a maximum of 512 MiB of dirty data in the ARC per target dataset or zvol summing to a maximum of about 10GiB total. You can even consider dialling down kstat.zfs.darwin.tunable.zfs_dirty_data_max even further (e.g. to 128 MiB) if your write loads will always be throughput-sensitive rather than latency-sensitive.

(However, if you have only one target zvol (or dataset) for all six rsyncs, it could be that something else is going on in addition to the huge amounts of dirty data.)