winfsp / sshfs-win

SSHFS For Windows
https://winfsp.dev

Performance tuning woes #58

Open OskarAtGitHub opened 6 years ago

OskarAtGitHub commented 6 years ago

Other stuff seems to work, so I'm now trying to get the performance up to spec. I'm having write performance issues with single files, and read and write performance issues with multiple files. The client is a Windows 7 machine. Trying to tackle the single-file issue first.

These are screenshots of first reading a 900MB file from the server, then immediately afterwards writing a file of the same size to the server. There is a big "discovering items" delay before the write (to server) starts. There is also a dip when the server performs a physical write to the disk (probably a server-side thing?).

With:

C:\Program Files\SSHFS-Win\bin>sshfs -o create_umask=007,uid=-1,gid=-1,cache=yes,cache_max_size=2000000,cache_timeout=120,follow_symlinks,compression=no -o ssh_command="/bin/ssh.exe"

[screenshot: copy-900mb_file-to-and-from-sshfs-drive--nosshfs_sync]

And with the above, plus sshfs_sync:

[screenshot: copy-900mb_file-to-and-from-sshfs-drive--sshfs_sync]

billziss-gh commented 6 years ago

@OskarAtGitHub you can try playing with the kernel caching options:

-o FileInfoTimeout=-1       # enable file caching
-o DirInfoTimeout=1000      # but keep dir caching at 1 sec to pick up remote changes
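
For anyone scripting the mount, here is a minimal Python sketch of how the two options above might be appended to an sshfs argument list. The remote `user@server:/share`, the drive letter `X:`, and the helper name `build_sshfs_args` are hypothetical placeholders, not values from this thread; only `FileInfoTimeout=-1` and `DirInfoTimeout=1000` come from the comment above.

```python
def build_sshfs_args(remote, mountpoint, extra_options=()):
    """Assemble an sshfs argument list that includes the kernel caching options.

    `remote` and `mountpoint` are placeholders supplied by the caller.
    """
    args = ["sshfs", remote, mountpoint]
    # The two kernel caching options suggested in the comment above.
    for option in ("FileInfoTimeout=-1", "DirInfoTimeout=1000"):
        args += ["-o", option]
    # Any further options (for example the ones from the original command line).
    for option in extra_options:
        args += ["-o", option]
    return args


# Hypothetical remote and drive letter, for illustration only.
args = build_sshfs_args("user@server:/share", "X:", ["compression=no"])
print(" ".join(args))
```

The list form can be handed to `subprocess.run(args)` without worrying about shell quoting.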

billziss-gh commented 6 years ago

Also how does this performance compare with plain sftp from the command line?

OskarAtGitHub commented 6 years ago

This is with sftp get and put. Get seems to have the same performance; put seems hugely improved. Put is for some reason still slightly slower than get, but there is almost zero lag when starting the put, unlike with sshfs.

[screenshot: copy-900mb_file-to-and-from-sftp]

billziss-gh commented 6 years ago

@OskarAtGitHub did you try with the suggested flags above?

OskarAtGitHub commented 6 years ago

sshfs with the kernel options results in much better performance! There is still a huge "discovering items" lag before the upload though.

[screenshot: copy-900mb_file-to-and-from-sshfs-drive-kernel-cache-opts]

billziss-gh commented 6 years ago

> sshfs with the kernel options result in much better performance!

Good to hear!

Be mindful of cache coherency issues when using full file caching though. For example, if a file is updated remotely, your local client may not "pick up" the changes until the file is closed. So I would not use this for an SSHFS server that has files updated by multiple clients.
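
The coherency risk can be illustrated with a toy model. This is only a sketch of a timeout-based cache, not how WinFsp implements caching; the class `InfoCache`, the fake clock, and the path are made up for illustration. It assumes timeouts are in milliseconds and that a negative timeout means "never expire", as the option comments earlier in the thread suggest.

```python
import time


class InfoCache:
    """Toy metadata cache: entries expire after timeout_ms, or never if negative."""

    def __init__(self, timeout_ms, clock=time.monotonic):
        self.timeout_ms = timeout_ms
        self.clock = clock
        self._entries = {}  # path -> (value, time stored)

    def get(self, path, fetch):
        now = self.clock()
        hit = self._entries.get(path)
        if hit is not None:
            value, stored_at = hit
            never_expires = self.timeout_ms < 0
            if never_expires or (now - stored_at) * 1000 < self.timeout_ms:
                return value  # served locally, the remote side is not consulted
        value = fetch(path)
        self._entries[path] = (value, now)
        return value


class FakeClock:
    """Manually advanced clock so the demo does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


path = "/share/report.txt"        # hypothetical file
remote_sizes = {path: 100}        # stands in for the server's view
clock = FakeClock()
forever = InfoCache(timeout_ms=-1, clock=clock)
one_second = InfoCache(timeout_ms=1000, clock=clock)

forever.get(path, remote_sizes.get)
one_second.get(path, remote_sizes.get)

remote_sizes[path] = 250          # another client updates the file
clock.now += 2.0                  # two seconds pass

stale = forever.get(path, remote_sizes.get)
fresh = one_second.get(path, remote_sizes.get)
print(stale, fresh)
```

The never-expiring cache keeps reporting the old size, while the one-second cache refetches and sees the remote change.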

> There is still a huge "discovering items" -lag before the upload though.

This may be just Explorer doing its thing. Are you able to use FileSpy or Process Explorer and see what takes so long?

OskarAtGitHub commented 6 years ago

> This may be just Explorer doing its thing. Are you able to use FileSpy or Process Explorer and see what takes so long?

Learning how to do that would probably take more time than I can justify. When saving and opening large files directly from programs, it seems to work nicely now. So it's likely some Explorer thing.

We have a really small office type setup with this, and are now able to work directly from the server files, hugely reducing manual copy steps and file "version" issues. Great stuff!

Copying lots of small files still shows a big difference between upload and download speed. This might also be due to the RAID system on the server. Btrfs RAID caching on the server should help with that though? Not sure. Anyway, just for reference, here is a speed plot for about 800MB worth of about 1500 separate files going to the server, and then back.

[screenshot: copy-900mb_files_xemsdir-to-and-from-sshfs-drive-kernel-cache-opts]
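
For numbers instead of Explorer speed plots, a small timing script is one option. The sketch below is an assumption-laden illustration: the helper names are invented, and the demo copies between two temporary local directories rather than to a real sshfs drive. Note that with kernel write caching enabled, a copy may return before all data has reached the server, so the measured rate can look better than the real one.

```python
import os
import shutil
import tempfile
import time


def tree_size(root):
    """Return (total bytes, file count) for everything under root."""
    total = 0
    count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
            count += 1
    return total, count


def timed_copy(src, dst):
    """Copy the tree src to dst (dst must not exist) and time it."""
    start = time.perf_counter()
    shutil.copytree(src, dst)
    elapsed = time.perf_counter() - start
    nbytes, nfiles = tree_size(dst)
    return nbytes, nfiles, elapsed


# Demo on local temporary directories; point src/dst at real paths to measure a mount.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.mkdir(src)
    for i in range(20):
        with open(os.path.join(src, f"file_{i:03d}.bin"), "wb") as fh:
            fh.write(b"x" * 1024)
    nbytes, nfiles, elapsed = timed_copy(src, os.path.join(tmp, "dst"))
    rate = nbytes / elapsed / 1e6 if elapsed > 0 else float("inf")
    print(f"{nfiles} files, {nbytes} bytes, {elapsed:.4f} s, {rate:.1f} MB/s")
```

Running it once in each direction (local to mount, mount to local) gives a comparable upload and download figure.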

billziss-gh commented 6 years ago

FileSpy is actually pretty simple to use.

But if you do not believe you have the time for it and your problem appears resolved, should we close this?

OskarAtGitHub commented 6 years ago

I'm getting sporadic write errors. Not sure if this is related or not to the performance settings. Every now and then the sftp server log will be full of these:


Sep 18 13:17:42 serverXYZ sftp-server[18273]: debug1: request 207641: write "/AAA/BBB/CCC/DDD/XYZ.XYZ" (handle 0) off 1204224 len 4096
Sep 18 13:17:42 serverXYZ sftp-server[18273]: error: process_write: write failed
Sep 18 13:17:42 serverXYZ sftp-server[18273]: sent status No such file

These continue every few seconds, even though the program that was writing to the file has been closed on the client machine. (EDIT: the file it's trying to operate on does actually exist, and the client has correct rw privileges.) Killing the server process and logging in/mounting again from the client solves it, until it happens again at some random time on some other file.
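
To see which files are affected and how often, the log excerpt above can be summarized with a short script. This is a sketch that assumes the log lines keep exactly the shape shown above (a `debug1: request ... write "<path>"` line followed by `error: process_write: write failed` from the same sftp-server process); the function name and regular expressions are my own.

```python
import re
from collections import Counter

# Patterns based only on the three log lines quoted above.
WRITE_RE = re.compile(
    r'sftp-server\[(?P<pid>\d+)\]: debug1: request \d+: write "(?P<path>[^"]+)"'
)
FAIL_RE = re.compile(
    r'sftp-server\[(?P<pid>\d+)\]: error: process_write: write failed'
)


def count_failed_writes(lines):
    """Count failed writes per path, pairing each failure with the
    most recent write request logged by the same sftp-server process."""
    last_write = {}       # pid -> path of the latest write request
    failures = Counter()
    for line in lines:
        match = WRITE_RE.search(line)
        if match:
            last_write[match.group("pid")] = match.group("path")
            continue
        match = FAIL_RE.search(line)
        if match:
            failures[last_write.get(match.group("pid"), "<unknown>")] += 1
    return failures


sample = [
    'Sep 18 13:17:42 serverXYZ sftp-server[18273]: debug1: request 207641: '
    'write "/AAA/BBB/CCC/DDD/XYZ.XYZ" (handle 0) off 1204224 len 4096',
    'Sep 18 13:17:42 serverXYZ sftp-server[18273]: error: process_write: write failed',
    'Sep 18 13:17:42 serverXYZ sftp-server[18273]: sent status No such file',
]
failures = count_failed_writes(sample)
print(failures)
```

Feeding it the full log (for example `count_failed_writes(open(logfile))`) would show whether the failures really stick to one file at a time.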

billziss-gh commented 6 years ago

If you have kernel caching on, it is normal to get writes on a "closed" file. This is because of how write caching works: writes are accepted immediately, but are cached in system memory; later on the "lazy writer" wakes up and sends these writes to the underlying medium. For this to work correctly the kernel does not really close the files when the application asks it to, but keeps them around until the lazy writer has done its job.
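
The behaviour described above can be modelled with a toy write-back cache. This is a conceptual sketch only, not the Windows cache manager or WinFsp code; `ToyServer`, `WriteBackCache`, and `lazy_writer_pass` are invented names.

```python
class ToyServer:
    """Stands in for the remote side: writes need an open handle."""

    def __init__(self):
        self.files = {}
        self.open_handles = set()

    def open(self, path):
        self.files.setdefault(path, bytearray())
        self.open_handles.add(path)

    def write(self, path, offset, data):
        if path not in self.open_handles:
            raise OSError("write on a closed handle")
        buf = self.files[path]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad any gap with zeros
        buf[offset:offset + len(data)] = data

    def close(self, path):
        self.open_handles.discard(path)


class WriteBackCache:
    """Accepts writes immediately and defers both the writes and the close."""

    def __init__(self, server):
        self.server = server
        self.dirty = []              # queued (path, offset, data) writes
        self.close_pending = set()   # files the application already "closed"

    def open(self, path):
        self.server.open(path)

    def write(self, path, offset, data):
        self.dirty.append((path, offset, bytes(data)))  # returns at once

    def close(self, path):
        self.close_pending.add(path)  # the real close waits for the lazy writer

    def lazy_writer_pass(self):
        for path, offset, data in self.dirty:
            self.server.write(path, offset, data)
        self.dirty.clear()
        for path in self.close_pending:
            self.server.close(path)
        self.close_pending.clear()


server = ToyServer()
cache = WriteBackCache(server)
cache.open("/share/doc.txt")           # hypothetical path
cache.write("/share/doc.txt", 0, b"hello")
cache.close("/share/doc.txt")          # the application thinks it is done

before = bytes(server.files["/share/doc.txt"])
still_open = "/share/doc.txt" in server.open_handles
cache.lazy_writer_pass()               # the "lazy writer" wakes up
after = bytes(server.files["/share/doc.txt"])
print(before, still_open, after)
```

In this model the server sees the write only after the application has closed the file, which matches writes showing up in the log after the program has exited.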

I am not certain why the SSHFS server fails those requests. I have not tested SSHFS with kernel caching enabled.

OskarAtGitHub commented 6 years ago

Some further trial and error with this strange file problem. This is maybe getting off topic for this performance tuning thread. But not sure if there is some relation or not.

I think the main cause of this is unrelated to the kernel caching; the caching just makes it more noticeable. I had strange write and file rename issues before these performance tuning options, but not as frequently, and the problem was not as "permanent". With kernel caching enabled, once it gets confused about a file, the only way to recover is to close and restart the sshfs "mapping".

I think (not 100% sure) that the issue is that some part of the client gets the file IDs mixed up (or however that works under the hood) and ends up running into a write permission problem on the server.

So sometimes this happens:

1: User on client machine creates a new file. create_umask=007 is set, but the file ends up as 700 or 750 (660 would be ideal for generic files; 770 would be a good compromise, since some are executable).
2: User on client machine, during the same session, tries to overwrite the file. Sometimes this results in a write error. When this happens the sftp-server log starts filling up with a new "write failed" every 1-2s.
3: Client kills the sshfs connection and reconnects. Windows will still not be able to rename, delete or write to the file in question. User has rwx rights to the file on the server. In some cases the user succeeds in deleting the file from the server, but cannot rename a different file to the name of the deleted file.

I'm guessing the user mapping isn't getting mixed up, because otherwise the user couldn't write or read any files. This only happens seemingly at random for one file at a time. So the only possibility would be that the file reference is somehow getting mixed up? Maybe?
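
On the permission bits in step 1: assuming create_umask behaves like a conventional POSIX umask (an assumption; I have not checked how sshfs-win applies it), the resulting mode is the requested mode with the umask bits cleared. A quick arithmetic sketch:

```python
def apply_umask(requested_mode, umask):
    """Clear the umask bits from the requested mode (conventional POSIX rule)."""
    return requested_mode & ~umask


umask = 0o007
for requested in (0o666, 0o777, 0o700, 0o750):
    print(oct(requested), "->", oct(apply_umask(requested, umask)))
```

Under that assumption a umask of 007 only strips the "other" bits, so 660 or 770 would need a requested mode of 666 or 777. A resulting 700 or 750 would then mean the requested mode already lacked the group bits, or something other than the umask decided the mode.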

Is there some way for me to provide more details on what is happening to help with debugging?

Extra debugging info: Thunderbird is not able to save files to the sshfs drive (with or without kernel caching). It creates some file, and tries to rename it or something, and does not complain of any write error. But the end result is that whatever it's trying to do results in no file on the server (probably deletes the intermediate step files, but won't create the end result file, or the end result file is deleted in the process). All other programs I've tested seem to work. But have these random write error things.

er1z commented 6 years ago

Has anyone tried to use this one? https://github.com/rapier1/openssh-portable