Closed. MikaelSmith closed this 4 years ago
We have some open questions to investigate around how editing a file works:
To investigate this I’d suggest stubbing the rest of the file-related FUSE callbacks in https://godoc.org/bazil.org/fuse/fs (HandleFlusher, HandleWriter, NodeSetattrer, maybe NodeFsyncer) and watching how they get called when you try to edit a file.
The first pass at this could implement just Write with offset 0. We could probably detect when the file's opened to write|truncate. This seems to work for some common editors.
More complicated scenarios later would need
This suggests two sets of APIs: one for plugins that can only upload whole contents, and another for plugins that can write discrete blocks and change the size of a file.
So the first piece of work is a full-file write:
Write(context.Context, []byte) error
I don't have an argument for including the amount written. Either it wrote the file, it failed destructively (the file was overwritten or deleted and does not have the expected content), or it failed non-destructively (the original content is still in place). Most APIs don't have destructive failure modes: either they update the file or they fail without modifying it. I think all plugins should strive for only non-destructive failures.
This interface would only be used if the FUSE operations would result in replacing the file (open write-only, truncate). Any other modes would error until further work is done.
I spent some time thinking about this too, and that thinking reduced to the following:
Read should be generalized to "we can read data from the entry". That data may or may not be the entry's content. The reason for the generalization is that things like cloud function logs don't really have content; we're just creating it based on what we think makes sense (that being to mirror gcloud functions logs read <function_name>).
Write should be generalized to "we can write data to the entry." That data may or may not be the entry's content. The reason is that things like topics can implement Write, but topics don't have content; they have messages.
We shouldn't always assume that Read/Write are related; by that, I mean we shouldn't assume that things that implement both Read and Write are files. That way, we can let plugin authors freely implement Read for things like GCP topics.
I agree with you that Write should start off as Write(ctx, data) (n, err). I think returning the # of bytes written is useful: it matches what people expect of write behavior, it is easy enough to do, and you never know when it might be useful.
Write's semantics will probably be entry-specific. With the existing Write stuff for topics, the overwrite behavior you're talking about, and the fact that we don't want to assume Read/Write are related, I think we'll need to give plugin authors a way to distinguish "file" entries from other entries. We could try using the Size attribute for that, i.e. "If this thing has a size attribute, then it is a file", though one could also argue that maybe Size should just mean "we know the readable data's size". Either way, once we distinguish "file" entries, then we can say that Write's semantics are "overwriting the file"; otherwise, Write's semantics are entry-specific and you should check that via the docs command.
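To make the first option concrete, here is a small sketch that treats "has a Size attribute" as "is a file". Everything in it (entry, sized, writeSemantics and the returned strings) is hypothetical and only illustrates the rule; it is not how Wash models entries.

```go
package main

import "fmt"

// entry is a hypothetical, stripped-down view of a plugin entry.
// size is a pointer so that "no size attribute" differs from "size 0".
type entry struct {
	name     string
	size     *uint64
	writable bool
}

// sized is a convenience for building a size attribute.
func sized(n uint64) *uint64 { return &n }

// writeSemantics applies the rule "if it has a size attribute, it is a file".
func writeSemantics(e entry) string {
	switch {
	case !e.writable:
		return "not writable"
	case e.size != nil:
		return "overwrites the file"
	default:
		return "entry-specific, check the docs"
	}
}

func main() {
	file := entry{name: "foo", size: sized(13), writable: true}
	topic := entry{name: "my-topic", writable: true}
	fmt.Println(file.name, "->", writeSemantics(file))
	fmt.Println(topic.name, "->", writeSemantics(topic))
}
```

Under the alternative reading, where Size only means "we know the readable data's size", this rule would misclassify entries, so some other explicit marker would be needed.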
Implementing this is a little tricky because writing to a file can involve several independent system calls.
When appending to a file with echo something >> file, we produce
Dec 10 11:50:20.571 FUSE: Attr /local/foo: valid=1s ino=0 size=13 mode=-rw-r--r--
Dec 10 11:50:20.571 FUSE: Open /local/foo[0xc0066fe650]: {Header:{Conn:0xc00047b620 ID:0x16 Node:0x7 Uid:501 Gid:20 Pid:99031 msg:0xc00c7902a0} Dir:false Flags:OpenWriteOnly}
Dec 10 11:50:20.571 FUSE: Open /local/foo[0xc0066fe650]: {Header:{Conn:0xc00047b620 ID:0x1b Node:0x7 Uid:501 Gid:20 Pid:99031 msg:0xc00054e2a0} Dir:false Flags:OpenReadOnly}
Dec 10 11:50:20.573 FUSE: Read 13/23 bytes starting at 0 from /local/foo: <nil>
Dec 10 11:50:20.573 FUSE: Read 0/10 bytes starting at 13 from /local/foo: <nil>
Dec 10 11:50:20.573 FUSE: Write 23/23 bytes starting at 0 from /local/foo: <nil>
Dec 10 11:50:20.573 FUSE: Attr /local/foo: valid=1s ino=0 size=13 mode=-rw-r--r--
Dec 10 11:50:20.573 FUSE: Flush {0xc000376d20 0xc000376d20 0xc000376d30 /local/foo 0xc0076a991c 0xc0067641e0}: {Header:{Conn:0xc00047b620 ID:0x3 Node:0x7 Uid:501 Gid:20 Pid:99031 msg:0xc00c790450} Handle:0x1 Flags:0 LockOwner:0}
Dec 10 11:50:20.574 FUSE: Release /local/foo: {Header:{Conn:0xc00047b620 ID:0x1d Node:0x7 Uid:501 Gid:20 Pid:99031 msg:0xc00054e030} Dir:false Handle:0x1 Flags:OpenWriteOnly ReleaseFlags:0 LockOwner:0}
Dec 10 11:50:20.575 FUSE: Attr /local/foo: valid=1s ino=0 size=23 mode=-rw-r--r--
Dec 10 11:50:20.575 FUSE: Release /local/foo: {Header:{Conn:0xc00047b620 ID:0x4 Node:0x7 Uid:501 Gid:20 Pid:99031 msg:0xc00054e2a0} Dir:false Handle:0x2 Flags:OpenReadOnly ReleaseFlags:0 LockOwner:0}
When clobbering a file with echo something > file
Dec 10 11:52:53.147 FUSE: Attr /local/foo: valid=1s ino=0 size=23 mode=-rw-r--r--
Dec 10 11:52:53.148 FUSE: Open /local/foo[0xc00036a650]: {Header:{Conn:0xc00047b620 ID:0x18 Node:0x8 Uid:501 Gid:20 Pid:99031 msg:0xc00054e030} Dir:false Flags:OpenWriteOnly}
Dec 10 11:52:53.148 FUSE: Setattr /local/foo[0xc00036a650]: {Header:{Conn:0xc00047b620 ID:0x17 Node:0x8 Uid:501 Gid:20 Pid:99031 msg:0xc00c790360} Valid:SetattrSize+SetattrHandle Handle:0x2 Size:0 Atime:2105-07-13 21:49:35 -0800 PST Mtime:1969-12-31 16:00:00.00262964 -0800 PST Mode:D--------- Uid:0 Gid:2155039296 Bkuptime:1969-12-31 16:00:00 -0800 PST Chgtime:1969-12-31 16:00:00.000000002 -0800 PST Crtime:0001-01-01 00:00:00 +0000 UTC Flags:3787}
Dec 10 11:52:53.148 FUSE: Attr /local/foo: valid=1s ino=0 size=0 mode=-rw-r--r--
Dec 10 11:52:53.148 FUSE: Attr /local/foo: valid=1s ino=0 size=0 mode=-rw-r--r--
Dec 10 11:52:53.148 FUSE: Write 10/10 bytes starting at 0 from /local/foo: <nil>
Dec 10 11:52:53.148 FUSE: Attr /local/foo: valid=1s ino=0 size=0 mode=-rw-r--r--
Dec 10 11:52:53.148 FUSE: Flush {0xc0067a6c40 0xc0067a6c40 0xc0067a6c50 /local/foo 0xc00677973c 0xc0067c19e0}: {Header:{Conn:0xc00047b620 ID:0x1b Node:0x8 Uid:501 Gid:20 Pid:99031 msg:0xc00c790360} Handle:0x2 Flags:0 LockOwner:0}
Dec 10 11:52:53.149 FUSE: Release /local/foo: {Header:{Conn:0xc00047b620 ID:0xc Node:0x8 Uid:501 Gid:20 Pid:99031 msg:0xc00c790540} Dir:false Handle:0x2 Flags:OpenWriteOnly ReleaseFlags:0 LockOwner:0}
Dec 10 11:52:53.149 FUSE: Attr /local/foo: valid=1s ino=0 size=10 mode=-rw-r--r--
When updating a file with vim
Dec 10 11:53:59.496 FUSE: Open /local/foo[0xc00036a650]: {Header:{Conn:0xc00047b620 ID:0x7 Node:0x8 Uid:501 Gid:20 Pid:99497 msg:0xc00c790540} Dir:false Flags:OpenReadOnly}
Dec 10 11:53:59.497 FUSE: Read 10/10 bytes starting at 0 from /local/foo: <nil>
Dec 10 11:53:59.498 FUSE: Flush {0xc0067a6c40 0xc0067a6c40 0xc0067a6c50 /local/foo 0xc00677973c 0xc006bf15f0}: {Header:{Conn:0xc00047b620 ID:0x15 Node:0x8 Uid:501 Gid:20 Pid:99497 msg:0xc00c790540} Handle:0x2 Flags:0 LockOwner:0}
Dec 10 11:53:59.498 FUSE: Release /local/foo: {Header:{Conn:0xc00047b620 ID:0xb Node:0x8 Uid:501 Gid:20 Pid:99497 msg:0xc00c790450} Dir:false Handle:0x2 Flags:OpenReadOnly ReleaseFlags:0 LockOwner:0}
Dec 10 11:53:59.508 FUSE: Attr /local/foo: valid=1s ino=0 size=10 mode=-rw-r--r--
Dec 10 11:54:20.769 FUSE: Open /local/foo[0xc006935440]: {Header:{Conn:0xc00047b620 ID:0xd Node:0x9 Uid:501 Gid:20 Pid:99497 msg:0xc00c790450} Dir:false Flags:OpenWriteOnly}
Dec 10 11:54:20.769 FUSE: Setattr /local/foo[0xc006935440]: {Header:{Conn:0xc00047b620 ID:0xf Node:0x9 Uid:501 Gid:20 Pid:99497 msg:0xc00c790210} Valid:SetattrSize+SetattrHandle Handle:0x2 Size:0 Atime:4692-01-16 01:33:41.001835524 -0800 PST Mtime:1970-01-01 01:13:04.00262964 -0800 PST Mode:Dur--rwx--x Uid:892415341 Gid:808465975 Bkuptime:922601624-11-05 12:44:32 -0800 PST Chgtime:402926-10-09 23:53:36.000000004 -0800 PST Crtime:0001-01-01 00:00:00 +0000 UTC Flags:2735}
Dec 10 11:54:20.769 FUSE: Attr /local/foo: valid=1s ino=0 size=0 mode=-rw-r--r--
Dec 10 11:54:20.769 FUSE: Write 8/8 bytes starting at 0 from /local/foo: <nil>
Dec 10 11:54:20.770 FUSE: Fsync /local/foo[0xc006935440]: {Header:{Conn:0xc00047b620 ID:0x10 Node:0x9 Uid:501 Gid:20 Pid:99497 msg:0xc00c790450} Handle:0x2 Flags:1 Dir:false}
Dec 10 11:54:20.770 FUSE: Attr /local/foo: valid=1s ino=0 size=0 mode=-rw-r--r--
Dec 10 11:54:20.770 FUSE: Flush {0xc0069436c0 0xc0069436c0 0xc0069436d0 /local/foo 0xc0069db34c 0xc0066fd320}: {Header:{Conn:0xc00047b620 ID:0x14 Node:0x9 Uid:501 Gid:20 Pid:99497 msg:0xc00c790450} Handle:0x2 Flags:0 LockOwner:0}
Dec 10 11:54:20.770 FUSE: Release /local/foo: {Header:{Conn:0xc00047b620 ID:0x13 Node:0x9 Uid:501 Gid:20 Pid:99497 msg:0xc00c790210} Dir:false Handle:0x2 Flags:OpenWriteOnly ReleaseFlags:0 LockOwner:0}
Dec 10 11:54:20.771 FUSE: Attr /local/foo: valid=1s ino=0 size=8 mode=-rw-r--r--
Dec 10 11:54:20.771 FUSE: Setattr /local/foo[0xc006935440]: {Header:{Conn:0xc00047b620 ID:0x19 Node:0x9 Uid:501 Gid:20 Pid:99497 msg:0xc00c790450} Valid:SetattrMode Handle:0x0 Size:0 Atime:1969-12-31 16:00:00 -0800 PST Mtime:1969-12-31 17:08:16 -0800 PST Mode:-rw-r--r-- Uid:0 Gid:0 Bkuptime:1969-12-31 16:00:00.000000001 -0800 PST Chgtime:2106-02-06 22:28:16.005195282 -0800 PST Crtime:0001-01-01 00:00:00 +0000 UTC Flags:0}
Dec 10 11:54:20.772 FUSE: Attr /local/foo: valid=1s ino=0 size=8 mode=-rw-r--r--
The important things to pick out are that Attr is used to check the file size, and Setattr is used to change it (usually truncating to size 0). Reading and writing are done with independent handles produced by separate calls to Open. With a larger file
Dec 10 11:57:08.644 FUSE: Write 4096/4096 bytes starting at 0 from /local/foo: <nil>
Dec 10 11:57:08.644 FUSE: Write 4004/4004 bytes starting at 4096 from /local/foo: <nil>
we get multiple writes. Those writes are expected to grow the file's size, but they don't need to complete until the call to Flush.
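One way to stage those chunked writes is an in-memory buffer that accepts writes at arbitrary offsets and grows as needed. This is a sketch only: writeBuffer and its methods are hypothetical names, and the WriteAt signature merely follows the shape of Go's io.WriterAt.

```go
package main

import (
	"errors"
	"fmt"
)

// writeBuffer is a hypothetical staging area for a file being edited.
type writeBuffer struct {
	data []byte
}

// WriteAt copies p into the buffer at off, growing it (zero-filled) when
// the write extends past the current end.
func (b *writeBuffer) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	end := int(off) + len(p)
	if end > len(b.data) {
		grown := make([]byte, end)
		copy(grown, b.data)
		b.data = grown
	}
	copy(b.data[off:], p)
	return len(p), nil
}

// Truncate mimics a Setattr that changes the size. size must be >= 0.
func (b *writeBuffer) Truncate(size int) {
	if size <= len(b.data) {
		b.data = b.data[:size]
		return
	}
	grown := make([]byte, size)
	copy(grown, b.data)
	b.data = grown
}

// Size is what an Attr call would report while changes are pending.
func (b *writeBuffer) Size() int { return len(b.data) }

func main() {
	b := &writeBuffer{data: []byte("previous content")}
	b.Truncate(0)
	// Replay the two writes from the trace above.
	if _, err := b.WriteAt(make([]byte, 4096), 0); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	if _, err := b.WriteAt(make([]byte, 4004), 4096); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(b.Size()) // 8100
}
```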
Attr/Setattr and Write/Flush happen on different objects that all need to be in sync. Any implementation will need to recognize when changes are happening and avoid overwriting them with refreshed data from the remote source until those changes are completed by a call to the plugin's Write implementation.
I don't think it makes sense to have Write return an int for bytes written. We have no way to surface that programmatically because we'll be calling the actual Write during a Flush operation. So it would only be logged, and given that plugins shouldn't have implementations that preserve a partial write, I don't think that would be useful.
My main concern is that it may not make sense now, but it could later in ways that we might not see (b/c not returning the bytes written is a bit unexpected of a write primitive [I could be wrong on that too]). If that happens, it may be a bit painful to change things around to return the bytes without breaking what we already have (thinking external plugins here). If it's very likely to never make sense, then I'm OK ignoring the # of bytes written.
I'm having trouble coming up with any. I'd rather preserve the simplicity here. Whenever we add Block Write, I expect it would more closely mirror the io.Writer interface, and if you had a use for returning partial writes, that API would actually be able to do something about it.
Ok, sounds good.
We added a Write primitive to Wash to implement sending a message to a message queue. We'd like to generalize that to be able to make changes to any remote file.
We have a couple of use cases in mind for writing:
The file lifecycle has several points that could be useful
We have several things we expect to work:
echo 'some text' >> file appends to the file. If that requires rewriting the file, it should do so with a single upload.
echo 'some text' > file overwrites the file.
Implement support for replacing a file. Plugins implement the Writable interface as
Wash then aggregates all file updates, and calls Write when changes are flushed to the filesystem.
External plugins: