trapexit / mergerfs

a featureful union filesystem
http://spawn.link

High IOWait for load balancing rclone mounts #778

Closed mingomingo closed 1 year ago

mingomingo commented 4 years ago

General description

I was trying to use the load balancing described in #742. I merged 5 rclone mounts that all contain identical files.

The problem is that the IOWait is extremely high: it sits at a constant 40% or so, with highs of 80%, for hours.

Note, though, that I have something like a dozen Plex servers scanning from the mergerfs mount. I'm not even sure it's an rclone problem anymore. But I checked atop, pressed d, and saw that mergerfs was at 20%.
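For a number that does not depend on atop, the aggregate `cpu` line of `/proc/stat` can be sampled twice and the iowait delta compared with the total. This is a minimal sketch, assuming a Linux host; the helper name `iowait_pct` and the one-second interval are illustrative choices, not part of mergerfs or rclone.

```shell
# Sketch: share of CPU time spent in iowait between two /proc/stat samples.
# iowait_pct is a hypothetical helper; it takes two "cpu ..." lines as arguments.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    # Fields: $1="cpu" $2=user $3=nice $4=system $5=idle $6=iowait $7=irq $8=softirq $9=steal
    NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
    NR == 2 {
      total = 0
      for (i = 2; i <= 9; i++) total += $i - a[i]
      if (total <= 0) { print 0; exit }
      printf "%d\n", 100 * ($6 - a[6]) / total
    }'
}

# Live usage: take a sample, wait one second, take another.
if [ -r /proc/stat ]; then
  first=$(head -n 1 /proc/stat)
  sleep 1
  second=$(head -n 1 /proc/stat)
  echo "iowait: $(iowait_pct "$first" "$second")%"
fi
```

Running it before and after the Plex scans start would show whether the jump in iowait lines up with the scans.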

I'd appreciate the help. Maybe my flags for mergerfs are wrong. Not sure. I also read that you made an inodecalc feature, though I don't understand how to use it for my use case.

Expected behavior

Not high iowait

Actual behavior

Super high iowait

Precise steps to reproduce the behavior

Explicitly list all steps to reproduce. Preferably create a minimal example of the problem using standard command line tools. The more variables (apps, settings, etc.) that are involved the more difficult it is to debug. Also, please be sure to have read all of the README. It contains a lot of information regarding known system and user issues.

Mount 5 identical rclone mounts, using the following command

rclone mount GDSA01C: /mnt/fs/acc1-sa1 \
--config /admin/acc/sa1/rclone.conf \
--log-file=/admin/logs/mount/acc1-sa1.log \
--log-level=NOTICE \
--uid=1000 --gid=1000 --umask=002 \
--allow-other \
--timeout=1h \
--tpslimit=9 \
--user-agent rclone/v1.48 \
--dir-cache-time 1h \
--vfs-cache-mode writes \
--vfs-cache-max-age 1h \
--vfs-cache-max-size off \
--vfs-read-chunk-size-limit 512M \
--vfs-read-chunk-size 16M \
--buffer-size 8M

Then use mergerfs with the following command,

mergerfs -o func.open=rand -o async_read=false,auto_cache,dropcacheonclose=true,use_ino,allow_other,func.getattr=newest,category.create=newest,minfreespace=0,fsname=union \
/mnt/fs/acc1-sa1=NC:/mnt/fs/acc1-sa2=NC:/mnt/fs/acc1-sa3=NC:/mnt/fs/acc1-sa4=NC:/mnt/fs/acc1-sa5=NC /mnt/unionfs

When the Plex servers start scanning the mergerfs mount, it is OK for a few hours (not sure if this is because the Plex servers aren't scanning yet or because the setup works for the first few hours). Then the high IOWait appears and stays until the whole server is restarted.

System information

Please provide as much of the following information as possible:

Rclone mounts

trapexit commented 4 years ago

I'm not sure what to say or suggest really. You're using a FUSE fs over several FUSE fs and turning off async reads, which means it has to actively order reads. mergerfs only does what it's asked to do... so if it's using CPU or reading from disk it's because something is asking it to. You have caching on so maybe you're running out of memory? Have you turned it off? Without knowing why mergerfs is busy I can't comment. Have you read the docs on performance and caching?

mingomingo commented 4 years ago

I was hoping to get tips on how I can lower the IOWait on the setup and improve it overall.

I have 64GB of RAM on the system, and so far it isn't being fully utilized.

Would it be better to enable async and cache on mergerfs or rclone?

What do you need to see why mergerfs is busy?

trapexit commented 4 years ago

If there was a known issue with IOWait it'd be documented and/or addressed. mergerfs is just a proxy. Its ability to scale is based on the number of threads and client behaviors. You can enable certain caching as described in the docs but its relevance depends on workflow.

As far as I know you can't have async reads due to rclone limitations. That's not something I can really comment on.

Either, both. mergerfs is the access point to rclone so if relevant it'd be better to cache mergerfs rather than rclone.

strace is only so useful. If you know the software involved then you can just see what it's doing. Plex does serious scans and reads of the filesystems, and rclone is only going to be so fast, so if you have a lot of client apps it's going to lead to high iowait states.
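One rough way to see which clients are waiting is to list tasks in state D (uninterruptible sleep), which is where tasks blocked on I/O typically show up. This is a sketch assuming a procps-style `ps`; `d_state` is a hypothetical helper name that filters "STATE PID COMMAND" lines.

```shell
# Sketch: print only the tasks whose state column starts with D.
# d_state is a hypothetical helper; it reads "STATE PID COMMAND" lines on stdin.
d_state() {
  awk '$1 ~ /^D/ { print }'
}

# Live usage: state, pid and command name of every process, headers suppressed.
if command -v ps >/dev/null 2>&1; then
  ps -e -o state= -o pid= -o comm= | d_state
fi
```

If the list is dominated by Plex scanner processes rather than mergerfs or rclone themselves, that would point at client load rather than the union layer.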

mingomingo commented 4 years ago

OK, I'm not really sure what I'm doing with it. I tried disabling cache on rclone and enabling async_read on mergerfs. I'll see where that goes.

I just built the idea from the previous discussion about this. I wasn't really sure how to execute it properly, so I just used a previous mergerfs and rclone setting I had (which is what it is now). I came here to see if I can improve the setup. Likely the high IOWait is indeed not a mergerfs problem, but something unique to the use case.

trapexit commented 4 years ago

As I mentioned, you should look over the caching and performance section of the docs and see what might be relevant. readdir, entry, and attr caches are pretty standard fare if you're not doing many out-of-band updates. Turning off page caching can limit any buffer bloat. It's all in the docs. Your best bet is to test. I really can't offer any more advice than what I've documented right now as it just sounds like high latency due to the setup.
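As a starting point for that testing, the caches mentioned above map to the `cache.*` mount options in the mergerfs docs. The sketch below only builds and prints a command line, it does not mount anything. It assumes a mergerfs release that has the `cache.*` options, and the timeout values are placeholders to experiment with, not recommendations.

```shell
# Sketch only: prints a mergerfs command line for review; nothing is mounted.
# Branch paths and mount point are taken from the original report.
branches="/mnt/fs/acc1-sa1=NC:/mnt/fs/acc1-sa2=NC:/mnt/fs/acc1-sa3=NC:/mnt/fs/acc1-sa4=NC:/mnt/fs/acc1-sa5=NC"

opts="allow_other,use_ino,func.open=rand,fsname=union"
opts="$opts,cache.readdir=true"   # cache directory listings
opts="$opts,cache.entry=60"       # cache name lookups, in seconds (assumed value)
opts="$opts,cache.attr=60"        # cache file attributes, in seconds (assumed value)
opts="$opts,cache.files=off"      # turn off page caching

echo mergerfs -o "$opts" "$branches" /mnt/unionfs
```

Other options from the original command (for example `category.create` or `minfreespace`) are left out here for brevity and would need to be carried over. Changing one option at a time makes it easier to tell which one affects the iowait.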

trapexit commented 4 years ago

Any updates or should we close this?

mingomingo commented 4 years ago

I think it's just the Plex servers.

I'm still figuring out the cache part of the setup. I tried caching only with mergerfs, but it's very unstable if I turn off the cache on rclone. Will do some further tests.