ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Consider Go's new io/fs filesystem interfaces for IPFS #7556

Open mvdan opened 4 years ago

mvdan commented 4 years ago

See https://go.googlesource.com/proposal/+/master/design/draft-iofs.md; note that it's still just a draft, and it has a YouTube video presentation as well as a Reddit discussion thread.

It's early days for this idea, but if it does get proposed and accepted, it seems like a great fit for IPFS. Pretty much any package that expects to work with a filesystem could magically be composed on top of IPFS with minimal effort, in theory.

/cc @mikeal @warpfork

willscott commented 4 years ago

A couple initial thoughts:

mvdan commented 4 years ago

I should also note that, if the IPFS team has any input for the current design, now is the time to give it :) Reddit is probably the right place right now. Once a proposal is filed, then there will be a GitHub thread.

warpfork commented 4 years ago

This is really exciting. Golang needs a shared standard for filesystem interfaces quite badly; I've been sore about the lack on several occasions, even well outside of IPFS. Thrilled to see talk of it happening.

The whole proposal reads really well to me. Functions defining most operations, and extensional interfaces making it happen "fast" whenever possible, is definitely the way to go for something like this.

The discussion of ensuring the new 'fs' package would not depend on the 'os' package is... yes, please, thank you, oh my goodness.

I'm... not a big fan of os.FileMode or os.FileInfo -- they seem to me to contain a fair number of loaded footguns, simultaneously containing less info than I want, and more oddness in unpacking what's there than seems reasonable, and on even the gripping hand the extensibility that one gets by probing into Sys() is quite filled with sharp edges -- so I'm not thrilled to hear those are escaping rethinks. But I'd have to think very carefully about proposing alternatives, admittedly. And it won't surprise me if the desire of the Go team is to minimize the change in this area.

Similarly: it looks like flag int will be escaping a rethink, and I might be a tad bummed about that. I often wish there was some more structure and type info used around that concept. (For example: combined with the fact that perm os.FileMode is the next argument in functions, and that's also just a plain integer type (such that one can freely write 0777), I find myself often able to make the misstep of switching the two numbers around... and while I'll certainly detect that real fast at runtime, it frustrates me that it's not caught at compile time.)

The discussion about "middleware" such as a "CachingFS", which is mentioned in the section about "archive/zip", is very pleasing.

It's good that "archive/zip" would be fitted to this new FS interface. I'm not sure I'm happy to hear that "archive/tar" might not get similar modification to conform to the new FS interface; I'm familiar with the seekability troubles, and amenable to the concern about setting a performance trap, but at the same time, this is just such a common thing to want from the tar package that it seems people would also trip and stumble in surprise at its absence at a very high rate. I think implementing the FS interface for tar with a simple in-memory index of file start positions, which is lazily populated as Open calls are used, and uses seekable readers for the underlying data if possible... while simply making clear remarks in the documentation about the performance and practicality of this... would be both high value and not all that hard.

It's interesting that path resolution seems to be left as extremely opaque. I see a couple remarks about slash-delimited, but not much in the way of rules for the lawful relationship between using a path on the FS as a whole vs stepping individual directories one by one.

I didn't see any discussion of symlinks or mention of "Lstat". I find Lstat pokes a lot of very interesting holes in things, so I hope there's more discussion of that upcoming. (Maybe it will simply be handled in a puff of extensional interface smoke with no further ado; but that's not obvious to me.)

Most of the questions I might have for this system are oriented around the writer side of things, which is not yet much treated. I like the outline of the idea that it is all handled by just adding more and more extension interfaces... but I think that needs quite a bit more fleshing out. (Maybe there's more in the example prototype implementation; I haven't moved beyond the doc yet.)

In particular, when poking about filesystem APIs before myself, I've found that the distinction between operating with paths vs operating on open handles (aka 'fd's) can be fairly subtle and fairly critical. Defining operations in terms of paths seems to be the easier and more convenient of the two; but defining operations in terms of open handles seems to be the much more powerful of the two, because it can operate with considerably more clarity and fewer potential races in the case of other concurrent actors on a filesystem. I realize this is intentionally left out of the current draft -- and also, my concerns here probably go a bit further than what might be relevant to IPFS, even -- and on top of all that, the concurrency semantics I'm talking about lean towards being something of a detail of OS kernels and filesystem drivers... but despite all that, I still think the topic of handle-based operations may be worth some consideration sooner than later, because this is a distinction that might be Tricky to retrofit into supportability at a later date if it does turn out to be desired.

I'd also have some questions about if separate extension interfaces for features like chmod and chown will work well. A number of operations like that have varied behavior depending on the order in which they're called -- e.g. using chmod to add a setuid bit, then using chown, has a very different final result on an (e.g.) linux kernel than doing those operations the other way around. I'm not sure it's the FS interface's job to be aware of that... but I'm also not sure future surprises and headaches will be minimized by ignoring this, either? Dunno. Just something to think about.

I'd also love to hear more details about any changes to error handling. My experiences with trying to handle the finer details of errors around filesystems, in an OS-agnostic way, are filled with memories of much bewilderment. The mixture of interface-errors vs check-this-with-a-function-errors vs concrete-type-errors vs exact-value-errors vs value-in-a-struct-errors (some of which are syscall-leaks-through(!)-errors) is... really quite difficult to navigate confidently, at present. (An exercise that may help visualize how wild this can get: If I wanted to use switch statements for error handling, and were to make a separate switch for each kind of switch needed, imagine how many of them would be needed!) I know this is hard to sort out (desire to be OS-agnostic whilst not hiding or destroying OS-specific info is certainly tricky!), but a more detailed strategy for making this more parsimonious in the future would be nice to hear more about.

I share these thoughts here, but I'm not sure if they're of sufficient novelty to be worth reposting to the reddit discussions. And I don't have a reddit account, so I'll probably wait until getting a pretty sharp kick in the shins if someone thinks I've said something worth adding there. (Repost snippets freely though, if anyone has a lower coefficient of friction to reddit than I do.)

djdv commented 4 years ago

Chiming in here, mostly just saying "same" and "cool". But also referencing some of the IPFS ipfs mount things I've been working on since they're going to be (or should be I guess) tied to this somehow. I figure people interested in this issue are probably going to be interested in that effort as well. Personally, I'm most interested in some of the meta aspects around the file system APIs such as metadata, portability concerns, API conflicts, developer coordination, etc. and write about some of that in this wall of text. (Obligatory: "pardon my English and verbosity")


re: @warpfork:

I'm... not a big fan of os.FileMode or os.FileInfo -- they ...

+1 on all this. Including the lack of being able to propose an alternative. The issue of coming up with a cross-platform compatible, metadata interface and/or format, that works for everyone... seems tough.

Similarly: it looks like flag int ...

Another +1 with another "This seems just as tough to get right" remark for the same reasons.

The discussion about "middleware" such as a "CachingFS", which is mentioned in the section about "archive/zip", is very pleasing.

Big +1 on this. Reminds me of the nice things I hear about 9P and composing file systems from sets of other file systems. With Go itself having similar composition patterns, it's fitting to see it talked about in the file system discussions.

I'd also have some questions about if separate extension interfaces for features like chmod and chown will work well. ...

I've had things like this on my mind recently. Conventional file systems seem to have a lot more integral responsibilities around system metadata (like permissions, identity, links, additional streams, etc.) than they do with primary file data. I understand why it's this way, but at the same time I question if it's possible to divide them up in a nicer way. Such that the metadata itself and the interfaces for them are mostly external to basic file operations. This ties back into the convention problems mentioned above, where it seems like it will be hard to come up with a "1 interface fits all" solution, but is also something people can define and refine outside of a particular/standard interface.

You can imagine people conforming to a chmod interface with some kind of backing mode-store, or passing through to a host system, etc. But allowing the implementations of general concepts like "permissions" to hopefully just be abstracted enough to be swapped. So you can define basic File logic and just tack on permissions to it if need be. In a way that hopefully just complements the rest of the API(s).

Likewise, even more abstractions via wrapping those operations. Packages that offer guarantees about operation details such as order, what happens when encountering an error, etc. Such that you're writing a line of code that translates into a batch-transaction rather than individual lines of single operations. (I'm imagining documentation that reads like: "myfs.ChangeMetaXinterface takes in an owner, a mode, as well as a File to associate them with, attempting to change the owner first, followed by changing the mode bits. And attempts to undo the operation upon encountering an error" or something like that.)
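A minimal sketch of such a wrapper, under stated assumptions: MetaFS is a hypothetical extension interface (nothing like it appears in the draft), memMeta is a toy in-memory metadata store, and the rollback is best-effort only.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// MetaFS is a hypothetical extension interface for ownership and mode bits.
type MetaFS interface {
	Owner(name string) (uid, gid int, err error)
	Chown(name string, uid, gid int) error
	Chmod(name string, mode fs.FileMode) error
}

// ChangeMeta changes the owner first, then the mode bits. If the mode change
// fails, it tries to restore the previous owner before returning the error.
func ChangeMeta(m MetaFS, name string, uid, gid int, mode fs.FileMode) error {
	oldUID, oldGID, err := m.Owner(name)
	if err != nil {
		return err
	}
	if err := m.Chown(name, uid, gid); err != nil {
		return err
	}
	if err := m.Chmod(name, mode); err != nil {
		if undoErr := m.Chown(name, oldUID, oldGID); undoErr != nil {
			return fmt.Errorf("chmod: %w (rollback failed: %v)", err, undoErr)
		}
		return err
	}
	return nil
}

// memMeta is a toy backing store used only to exercise ChangeMeta.
type memMeta struct {
	uid, gid  int
	mode      fs.FileMode
	failChmod bool // when set, Chmod always fails
}

func (m *memMeta) Owner(string) (int, int, error) { return m.uid, m.gid, nil }

func (m *memMeta) Chown(_ string, uid, gid int) error {
	m.uid, m.gid = uid, gid
	return nil
}

func (m *memMeta) Chmod(_ string, mode fs.FileMode) error {
	if m.failChmod {
		return errors.New("chmod refused")
	}
	m.mode = mode
	return nil
}

func main() {
	m := &memMeta{uid: 1000, gid: 1000, mode: 0o644, failChmod: true}
	err := ChangeMeta(m, "file", 0, 0, 0o4755)
	// The owner is back to its previous value after the failed chmod.
	fmt.Println(err, m.uid, m.gid)
}
```

The ordering guarantee and the undo behaviour live in one documented function instead of being repeated at every call site.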

Those are my random thoughts on it. Curious as well what problems will arise outside of the basic File interfaces and how people will solve them.

I'd also love to hear more details about any changes to error handling.

Another big +1

In trying to deal with this myself, I've had a bit of a hard time coming up with something elegant. Trying to inspect different error standards from different external packages, and translating them into something uniform, while adding or retaining context. I ended up refactoring how this was handled a few times. The pattern I'm currently using, and have liked the most so far, tries to take ideas from Rob's talk about Upspin and things I've read from Dave Cheney. I felt lucky stumbling upon this specifically: https://commandcenter.blogspot.com/2017/12/error-handling-in-upspin.html And found this helpful: https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully

There's a lot of duplication up front in my specific case. Translating IPFS "not found" errors into POSIX "not found" values, wrapped with a generic human-oriented message of "not found", but I found (heh) that once things were defined/established in an 'errors' package, it made it easy to make changes to packages at any layer any time I found a new inconsistency in POSIX or needed to produce and/or handle a variety of different error values in context-dependent ways.

Here's some relevant links of my WIP branch which maps a lot of the IPFS APIs to other file system APIs like FUSE and 9P. (*These are not meant to be elegant or good examples, just what I currently have after trying to simplify error handling across a bunch of layers) The Error interface and some typed constructors for them ; An Error implementation and raw values they use ; example of generation ; example of inspection ; and the function that inspects it

The fact that error handling is a general thing for Go and not really tied to this FS interface specifically, feels both good and bad. It's nice that we can have elegant ways to deal with them across otherwise disjointed layers, but there's no obvious good way to reduce the amount of (external) coordination required in this context. That is to say, developers always have to coordinate on what error values/types are to be expected, and from what operations/packages. I'm curious what will come of this. Will people rely on os errors, will fs define its own set of standard errors, will the community? Curious about all this. Nothing concerning comes to mind though. The fact that any of these is possible is pretty nice.


error handling stuff: I'm gonna cross reference these posts to the mount branch too because it seems relevant to the people interested in this issue. I'm also eager to get any and all feedback related to this and that. If I can conform what's there now to better fit what will become a Go standard, I'd like to do that, and discuss design decisions around these topics. Anyone, feel free to reach out to me about it. https://github.com/ipfs/go-ipfs/issues/7575#issuecomment-669331633 https://github.com/ipfs/go-ipfs/issues/7575#issuecomment-670061659

Here's some relevant sections of the branch: the current actual file system interface I conform to ; an implementation of it ; a FUSE wrapper interface that uses it


Random semi-relevant opinions: In general I'm pretty pleased with this initial direction. While I'm excited to see this effort, not much of what's being talked about for the initial phases seems all that exciting in itself (which I consider to be a good thing; boring and simple systems are usually good systems). All the discussion around this leads me to believe that we're going to get something from/in Go that sets a good foundation for people to write and integrate with APIs like IPFS in more modular and flexible ways. And that's the exciting bit for me. The extensibility aspects are very appealing to theorize about as well, but I am curious how things will go in practice.

The selection of responsibility and pace is also nice to see, as expected from the Go community. I appreciate people not trying to be hasty about this, and trying to break it up into phases, or even remove responsibility from std altogether. Pushing things out of the standard and into the community is usually something that frustrates me in other languages, but not in Go (because of its core tenets and standards; both from a language perspective but also a tooling one). I see no issues with trying to make composable file systems, comprised of multiple de-facto standards from both the std and 3rd party packages in Go, while I can't say the same would be as appealing in other languages. I don't think it will be easy or quick for people to solve a bunch of concerns but they also don't seem like impossible feats. Yay!

KempWatson commented 4 years ago

I'm far from familiar with the lower level details here, but on reading the proposal, it seems to me that the approach of leaving the "old way" intact for compatibility, when the new way can do the whole job, is exactly why languages and libraries get filled with bloaty cruft. I get the goal of backwards compatibility, but this screams "deprecation needed" to me. And it looks like the changes needed are pretty much cut-and-paste, not larger code rewrites on the user side - doesn't seem like a lot to ask to keep the stdlib clean.

djdv commented 3 years ago

For what it's worth I have read-only implementations of fs.FS for some of the IPFS APIs here: https://github.com/djdv/go-filesystem-utils/tree/e7f0e70f932e67d465aee6655602077124176a94/filesystem I'm planning on cleaning them up (as well as properly writing tests) and extending them to cover writable operations later. They currently seem to work as expected when used under a FUSE wrapper in the same directory (cgofuse), and I plan to use them to expose a 9P host as well. Posting in case anyone in the issue is interested in that.

JohnStarich commented 2 years ago

This would be fantastic. I've written about and open sourced hackpadfs to share the full set of FS operations as io/fs-like interfaces and a rigorous test suite. Based on the above comments, these additional interfaces may be useful here.

I'd love to integrate an IPFS adapter into hackpad.com's runtime at some point. Maybe it could enable folks to share their playground code with a shortlink, which loads their distributed source files.