winfsp / cgofuse

Cross-platform FUSE library for Go - Works on Windows, macOS, Linux, FreeBSD, NetBSD, OpenBSD
https://winfsp.dev
MIT License

Can't set nonseekable for open files #5

Closed ncw closed 7 years ago

ncw commented 7 years ago

In Open I'd like to be able to set the nonseekable flag.

In rclone, seeking is impossible on files that are open for write and difficult on files open for read (so users may turn it off).

I can detect this in Read/Write; however, I'm used to getting FUSE to do this for me.

I guess some of the other flags in file_info might be useful too.

That said, if this interferes with winfsp, it isn't a big deal!

billziss-gh commented 7 years ago

Perhaps I oversimplified the Open method by not including any of the fuse_file_info flags.

There are two issues:

Let me give it some thought.

billziss-gh commented 7 years ago

From WinFsp fuse_intf.c:

    /*
     * Ignore fuse_file_info::direct_io, fuse_file_info::keep_cache
     * WinFsp does not currently support disabling the cache manager
     * for an individual file although it should not be hard to add
     * if required.
     *
     * Ignore fuse_file_info::nonseekable.
     */

[The first part of the comment is actually wrong now, because WinFsp currently supports selectively disabling the cache manager.]

So unfortunately nonseekable is not supported on WinFsp.

ncw commented 7 years ago

I'll just make sure that seeking returns an EIO, which is probably all that libfuse does, though I guess it might do some magic in the kernel.

billziss-gh commented 7 years ago

If I get some free time later today, I will investigate the specific behavior.

billziss-gh commented 7 years ago

My findings:

So it looks like the nonseekable flag is Linux-specific. Returning -ESPIPE when you can detect out-of-sequence I/O should be sufficient. But I am wondering if that severely cripples the file system (e.g. no memory mapping support, no caching support, etc.).

ncw commented 7 years ago

Thanks for looking that up. I'm not checking this at the moment on OS X so that is a bug which needs fixing in rclone.

I had to look up ESPIPE, but it is "Illegal seek" which makes perfect sense.

> But I am wondering if that severely cripples the file system (e.g. no memory mapping support, no caching support, etc.)

Yes, there are certainly things a fs with no seek can't do, but most applications just read and write files sequentially, so it is more useful than you might think.

billziss-gh commented 7 years ago

> Yes, there are certainly things a fs with no seek can't do, but most applications just read and write files sequentially, so it is more useful than you might think.

I agree.

Going back to the original ask ("can't set nonseekable for open files"), it looks like these flags tend to be OS/implementation-specific. I therefore suggest that we close this issue with "wontfix". Do you agree?


I thought a bit more about the general problem that you have with rclone and seekability. If I understand correctly the problem is that most cloud storages provide a method to do a ranged GET, but no method to do a ranged PUT.

I think that you mentioned caching files locally in a different thread (or the rclone docs); I believe such a scheme could work nicely. As an example I mention the Andrew File System, which fetches a file on the first open and satisfies all I/O from the local file system. When the file gets closed AFS uploads the file to the server, but only if it has been updated.

Here is a very nice writeup of how AFS works. I believe the AFSv1 protocol might be sufficient for rclone.

[You also have to worry about issues with eventual consistency on cloud storage systems, but I am sure you are already aware of those.]
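The fetch-on-open, upload-on-close scheme described above can be sketched as follows. Everything here is hypothetical illustration rather than AFS or rclone code: the `Remote` interface, `memRemote`, and `cachedFile` are invented names, and the cached copy is held in memory only to keep the example short, where a real implementation would use a local file.

```go
package main

import "fmt"

// Remote is a hypothetical whole-object store: full GET and PUT only,
// with no ranged PUT.
type Remote interface {
	Get(name string) ([]byte, error)
	Put(name string, data []byte) error
}

// memRemote is an in-memory stand-in for cloud storage that counts uploads.
type memRemote struct {
	objects map[string][]byte
	puts    int
}

func (m *memRemote) Get(name string) ([]byte, error) {
	data, ok := m.objects[name]
	if !ok {
		return nil, fmt.Errorf("object %q not found", name)
	}
	return append([]byte(nil), data...), nil
}

func (m *memRemote) Put(name string, data []byte) error {
	m.objects[name] = append([]byte(nil), data...)
	m.puts++
	return nil
}

// cachedFile is the local copy that serves all I/O while the file is open.
type cachedFile struct {
	remote Remote
	name   string
	data   []byte
	dirty  bool
}

// openCached fetches the whole object once, on open.
func openCached(r Remote, name string) (*cachedFile, error) {
	data, err := r.Get(name)
	if err != nil {
		return nil, err
	}
	return &cachedFile{remote: r, name: name, data: data}, nil
}

// ReadAt serves reads at any offset from the local copy.
func (f *cachedFile) ReadAt(p []byte, off int64) int {
	if off < 0 || off >= int64(len(f.data)) {
		return 0
	}
	return copy(p, f.data[off:])
}

// WriteAt updates the local copy at any offset, growing it if needed.
func (f *cachedFile) WriteAt(p []byte, off int64) int {
	if off < 0 {
		return 0
	}
	if end := off + int64(len(p)); end > int64(len(f.data)) {
		grown := make([]byte, end)
		copy(grown, f.data)
		f.data = grown
	}
	f.dirty = true
	return copy(f.data[off:], p)
}

// Close uploads the whole file, but only if it was modified.
func (f *cachedFile) Close() error {
	if !f.dirty {
		return nil
	}
	if err := f.remote.Put(f.name, f.data); err != nil {
		return err
	}
	f.dirty = false
	return nil
}

func main() {
	r := &memRemote{objects: map[string][]byte{"a.txt": []byte("hello world")}}
	f, err := openCached(r, "a.txt")
	if err != nil {
		panic(err)
	}
	f.WriteAt([]byte("HELLO"), 0) // a seek-and-write, handled locally
	if err := f.Close(); err != nil {
		panic(err)
	}
	fmt.Println(string(r.objects["a.txt"]), r.puts) // prints "HELLO world 1"
}
```

Because every read and write is satisfied locally, seeking works in both directions, and the remote only ever sees one whole-object GET and at most one whole-object PUT per open.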

ncw commented 7 years ago

> Going back to the original ask ("can't set nonseekable for open files"), it looks like these flags tend to be OS/implementation-specific. I therefore suggest that we close this issue with "wontfix". Do you agree?

Yes that sounds fine.

> I thought a bit more about the general problem that you have with rclone and seekability. If I understand correctly the problem is that most cloud storages provide a method to do a ranged GET, but no method to do a ranged PUT.

That is correct.

> I think that you mentioned caching files locally in a different thread (or the rclone docs); I believe such a scheme could work nicely. As an example I mention the Andrew File System, which fetches a file on the first open and satisfies all I/O from the local file system. When the file gets closed AFS uploads the file to the server, but only if it has been updated.

That is exactly the scheme I've been thinking of. Thanks for the AFS writeup link. It has the additional idea, which I hadn't thought of, that you'd keep the local file in a cache and check on open that it hadn't changed. That would be relatively easy.
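The check-on-open idea might be sketched like this. It is an illustration under stated assumptions, not rclone's implementation: `object`, `store`, and `cache` are hypothetical names, and the `version` string stands in for whatever change token a backend offers (an ETag or modification time, say). It also assumes that the version lookup returns current data, which eventually consistent storage may not guarantee.

```go
package main

import "fmt"

// object is a hypothetical remote object with a version token.
type object struct {
	version string
	data    []byte
}

// store is an in-memory stand-in for cloud storage that counts downloads.
type store struct {
	objects map[string]object
	gets    int
}

// stat returns only the version token, which is cheap compared to a download.
func (s *store) stat(name string) (string, bool) {
	o, ok := s.objects[name]
	return o.version, ok
}

// get downloads the whole object.
func (s *store) get(name string) (object, bool) {
	o, ok := s.objects[name]
	if ok {
		s.gets++
	}
	return o, ok
}

// cache keeps downloaded objects between opens.
type cache struct {
	remote  *store
	entries map[string]object
}

// open returns the file contents, downloading only when there is no cached
// copy or the remote version differs from the cached one.
func (c *cache) open(name string) ([]byte, error) {
	version, ok := c.remote.stat(name)
	if !ok {
		delete(c.entries, name) // the object is gone; drop any stale copy
		return nil, fmt.Errorf("%s: no such object", name)
	}
	if e, hit := c.entries[name]; hit && e.version == version {
		return e.data, nil
	}
	o, ok := c.remote.get(name)
	if !ok {
		return nil, fmt.Errorf("%s: no such object", name)
	}
	c.entries[name] = o
	return o.data, nil
}

func main() {
	s := &store{objects: map[string]object{"a": {version: "v1", data: []byte("one")}}}
	c := &cache{remote: s, entries: map[string]object{}}
	c.open("a") // miss: downloads
	c.open("a") // hit: version unchanged, no download
	s.objects["a"] = object{version: "v2", data: []byte("two")}
	data, _ := c.open("a")           // version changed: downloads again
	fmt.Println(string(data), s.gets) // prints "two 2"
}
```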

> [You also have to worry about issues with eventual consistency on cloud storage systems, but I am sure you are already aware of those.]

Yes... File systems on cloud storage systems tend to be a little approximate. The main problem is that PUT/GET is much less reliable than file systems expect. This isn't a problem for reading, since you can retry, but writing is a real problem, and that is the main motivation for caching the whole file for upload.

billziss-gh commented 7 years ago

> > Going back to the original ask ("can't set nonseekable for open files"), it looks like these flags tend to be OS/implementation-specific. I therefore suggest that we close this issue with "wontfix". Do you agree?

> Yes that sounds fine.

OK, closing this, but I will continue the discussion below. Let me know if you believe we should move this discussion elsewhere (like an rclone issue).

> > [You also have to worry about issues with eventual consistency on cloud storage systems, but I am sure you are already aware of those.]

> Yes... File systems on cloud storage systems tend to be a little approximate. The main problem is that PUT/GET is much less reliable than file systems expect. This isn't a problem for reading, since you can retry, but writing is a real problem, and that is the main motivation for caching the whole file for upload.

I am familiar with eventual consistency problems, as I have done some cloud file system work myself. The worst I have seen is overwriting an object on cloud storage, only to read it back from the same machine and discover that I am getting back the previous version of the object. It is for this reason that I designed my own secfs as a copy-on-write file system: objects on storage are written only once and never overwritten, which gets around the eventual consistency issues. (Cloud objects in secfs are file chunks and never full files themselves.)
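The write-once idea can be illustrated with a small sketch. This is not secfs's actual layout, which the thread does not describe; it only shows one way to guarantee that no stored object is ever overwritten. The `chunkStore` and `file` types are hypothetical, and deriving keys from a SHA-256 of the content is an assumption made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkStore is an in-memory stand-in for cloud storage holding file chunks.
type chunkStore struct {
	objects map[string][]byte
}

// put stores a chunk under a key derived from its content. An existing key
// is never overwritten, so a reader can never observe a stale version of an
// object: any given key only ever has one value.
func (s *chunkStore) put(chunk []byte) string {
	sum := sha256.Sum256(chunk)
	key := hex.EncodeToString(sum[:])
	if _, exists := s.objects[key]; !exists {
		s.objects[key] = append([]byte(nil), chunk...)
	}
	return key
}

// file is an ordered list of chunk keys. Changing a file means building a
// new list that points at newly written chunks.
type file struct {
	chunks []string
}

// read reassembles a file from its chunks.
func (s *chunkStore) read(f file) []byte {
	var out []byte
	for _, key := range f.chunks {
		out = append(out, s.objects[key]...)
	}
	return out
}

func main() {
	s := &chunkStore{objects: map[string][]byte{}}
	v1 := file{chunks: []string{s.put([]byte("hello ")), s.put([]byte("world"))}}
	// "Overwrite" the second chunk by writing a new object and pointing
	// a new version of the file at it; the old chunk stays untouched.
	v2 := file{chunks: []string{v1.chunks[0], s.put([]byte("there"))}}
	fmt.Println(string(s.read(v1)), "|", string(s.read(v2)), "|", len(s.objects))
	// prints "hello world | hello there | 3"
}
```

The sketch leaves out where the chunk list itself is stored, which is the part that still needs a consistent update.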

In your case you must work with existing objects ("files") on cloud storage and cannot afford to design your own file system structure to avoid consistency issues. So some of the AFSv1 ideas may not work outright. For example, I am not certain that you can implement a version of TestAuth reliably.

You may therefore have to cache files only while they are open (for updating). This way you do not have to issue TestAuth against an unreliable cloud storage.

I am happy to discuss protocol specifics if you want me to (here or elsewhere).