IPFS mount write support

ipfs / kubo

An IPFS implementation in Go

https://docs.ipfs.tech/how-to/command-line-quick-start/

IPFS mount write support #5504

Open renne opened 5 years ago

renne commented 5 years ago

Version information:

go-ipfs version: 0.4.17-
Repo version: 7
System version: amd64/linux
Golang version: go1.10.3

Type:

Enhancement

Description:

Using the ipfs add -r and ipfs name publish commands and copying and pasting hashes around is quite inconvenient.

So I suggest making IPFS mounts writable, so that directories and files can be created and updated directly in /ipns/<hash-id>/.

overbool commented 5 years ago

@renne maybe you can try ipfs add -r -q <dir> | tail -n1 | ipfs name publish

renne commented 5 years ago

@overbool An uppercase Q saves the tail command: ipfs add -r -Q <dir> | ipfs name publish

It doesn't solve the additional manual operation when e.g. sharing the directory via Samba. It is not possible to run ipfs add while /ipfs is mounted read-only.

djdv commented 5 years ago

I am working on an implementation of mount that should be more flexible. https://github.com/ipfs/go-ipfs/issues/5003 The current implementation has read only support for IPFS, IPNS, and MFS. But the goal is to have writable IPNS for keys you own, and writable MFS, at least.

renne commented 5 years ago

I'm using Linux. So this should be platform independent.

djdv commented 5 years ago

I'm using Linux. So this should be platform independent.

We've decided to target cgo-fuse which currently covers macOS, FreeBSD, NetBSD, OpenBSD, Linux, and Windows.

renne commented 5 years ago

I always prefer OS-independent implementations. A drawback I see with cgo-fuse is the lack of go-toolchains for embedded devices (e.g. OpenWRT).

bam80 commented 3 years ago

Has there been any progress on this since then?

djdv commented 3 years ago

@bam80 tl;dr Not in any meaningful way. :^(

There were some coordination issues which stalled things for a while, and then the requirements being requested kept/keep changing, which is pushing us farther from this rather than closer. Originally the task was to make a go-ipfs plugin which added this support, and that worked well enough to demo but it was later requested that this work be a direct component of go-ipfs. (That change required a lot of cleanup to code that hasn't been touched since some of the initial releases) Now it's being asked to move this effort into a standalone project.

Breaking out of go-ipfs took some effort itself, but a lot is going to have to be (re)written. Before the writable portions can be reimplemented, we'll need to implement some APIs to replace the functionality that was provided by the ones I was using before (when I had direct access to the IpfsNode struct), with the addition of working remotely (like how the CoreAPI works). This is complicated in its own right but will also require coordination with the project too. Considering the type of requests we're making it may be unreasonable to rely on the existing HTTP transports too, meaning implementing the CoreAPI over another (local) IPC. (This remains to be seen, HTTP might be okay, but before we were just doing everything in the same process space)

I shared a status update via IRC and plan to have a repo up for this eventually where progress can be more easily tracked, but am currently in the middle of a mental breakdown so I'm not sure when I'll be able to get to it. (；´∀｀) Progress has been slow without motivation, I'm sorry for that and hope it turns around.

Anyway, here's the other longwinded status update: https://pastebin.com/2W3RpViA Contains within it 2 demos of the current implementation which still need work before they're actually useful. https://www.youtube.com/watch?v=NeuWm8fJoGc https://www.youtube.com/watch?v=n5iU4R4HxXU The standalone code for them (temporarily) exists here: https://github.com/djdv/go-ipfs/tree/fs-manager (the interesting thing is under cmd/fs) but isn't really useful for practical use yet. I don't know if I still have the old implementations from above but they weren't exactly stable either.

If I make progress on this, I'll try to ping you back here if you'd like. Spanning multiple years for this feels unacceptable... I'll try not to add another one on it.

@renne

I always prefer OS-independent implementations. A drawback I see with cgo-fuse is the lack of go-toolchains for embedded devices (e.g. OpenWRT).

In addition to Fuse this implementation aims to be agnostic in a lot of ways. It's not tied to go-ipfs specifically, nor is it tied to cgofuse either. The existing implementation allows you to mount an ipfs node (remotely) via bazil Fuse, cgofuse, and expose 9P listening servers. Other file system libraries could be swapped in for this if they're needed. However, I think in this particular case it'd probably make more sense to patch cgofuse to support this. But if it's easier, and something exists, we could just use it and expose it as an option in addition to the rest.

bam80 commented 3 years ago

@djdv Thank you. I wish you a speedy recovery and hope you'll find your way back to the project. I'm sure it's a highly anticipated piece of technology for many. So good luck with your endeavor, and I hope we all get more updates from you soon :)

bam80 commented 3 years ago

that worked well enough to demo https://www.youtube.com/watch?v=OX0vM0Ay9Z0

I don't quite understand: isn't IPNS already writable? (Sorry, I may be missing things since I am new to all this.)

djdv commented 3 years ago

It does in fact work, but it doesn't work reliably yet. Good enough to demo, but not good enough to use. Same is true for MFS+the FilesAPI. You can mount them, and reads and writes will work most of the time, but not always. (specifically lots of concurrent requests at once are not being handled properly somewhere)

Deeper technical portion (ignorable):

Those implementations were just experiments though. The interfaces responsible for this are going to be reworked to use the new Go fs interface (which didn't exist when this was authored, but is very similar), and will be reviewed and better tested in the process.

There's also issues with coordination with things like ipfs name publish, and ipfs files * which are going to have to be worked out. I wrote a bit about this here but that's out of date now. Speaking with people from the project, we highlighted some issues around how easy it would be to misuse a (remote) locking API.

There are other ideas around this but it's not been fully thought out yet as some of the other things need to be taken care of first. But the current idea in mind is to have the file system service expose some interface that lets you query its file table. This way the IPFS node can simply ask it if the key in question is in use. If not the publish can proceed, but otherwise it will error out saying the key is in use by the mount service.

For MFS this doesn't matter at all, but for the FilesAPI (the node's own/specific MFS root) we could do either the same thing, or sidestep this entirely by relaying all the operations/data to the node itself. As if you were just doing ipfs files * locally on that node. We're going to first have to see where the chances for conflicts are, and then figure out how best to deal with them.

IIRC the current implementations are just "last writer wins", so whoever saves last will be reflected on the node's key or Files root. But I haven't looked at it in a while.

bam80 commented 3 years ago

I mean, plain go-ipfs seems to have a writable IPNS mount. Not sure how useful it is, and how it compares with your work. Are we talking about the same thing?

djdv commented 3 years ago

Short answer is that these are 2 different implementations which use 2 different underlying libraries.

Long answer: The existing go-ipfs implementation uses a Fuse library (from Bazil) which supports Linux and FreeBSD. I haven't checked on this but I heard that they recently dropped support for macOS, or at least it was causing issues when building go-ipfs on macOS.

One of the libraries I'm targeting is cgo-fuse, which supports more platforms, one of which includes Windows. And could potentially be extended in the future to support even more platforms.

The Bazil Fuse implementations of IPFS and IPNS are very old, and while I don't have any metrics for this, many users have been complaining that they're not practical for use. Using a lot of system resources even when idle, and even more when in use. Grinding the system to a halt unless they're artificially constrained by the OS (which only sidesteps the underlying problem). Due to the fact that these implementations are bound to fewer platforms already, I never intended on fixing them up. Rather it made more sense to just re-implement them for a new library, which supported the same platforms and more, and likely should be more performant.

So the existing implementations are likely more correct, but not performant enough for actual use as far as I can tell. The new implementations have given me no problems so far in terms of resource usage or perceived latency. But they also aren't fully correct yet.

Rambling on more differences:

Furthermore, there are other issues around the old implementations, such as the fact that they use more layers of indirection than is necessary. The IPNS implementation utilizes the MFS library, and indirectly (sym)links into the IPFS mountpoint rather than just doing IPNS operations directly. I'm sure this contributes a lot of overhead, specifically all the locking that takes place within MFS, and jumping between process and kernel space via the host FS. I haven't looked at the implementation too deeply because it's likely going to be either kept only for legacy use or deleted outright. Also, some of that might be incorrect since I just kind of skimmed the code enough to port it to my interfaces and am trying to recall it now.

The cgo-fuse implementations I have stand alone, and just use the IPNS and IPFS methods of the CoreAPI directly, rather than linking into an external mountpoint and indirecting through the filesystem multiple times. To put that another way, the existing implementation requires IPFS be mounted for IPNS to function fully, whereas the new IPNS can be mounted independently, only using IPFS internally.

Most important but hardest to express is how these implementations are managed and used at an API level by the host program. The existing implementation's interfaces are somewhat rigid and have heavy connections to the IPFS node with some preconceived expectations. The implementation I have exposes a bunch of constructors and management interfaces which abstract a lot of the details away, and allow for some greater flexibility at multiple layers. This helps separate platform differences rather than just expecting that the host is a POSIX-like system. And also should make it easier to implement a new file system (like "PinFS") or host file system API (like Fuse, 9P, NFS) that uses the existing file system implementations without needing any code changes.

I highlighted some of these things in a video, but it's not super interesting. It's a lot of little things like allowing you to mount IPFS and/or IPNS rather than forcing both, allowing IPFS in offline mode, multiple mount instances rather than just 1, and a lot of other little things around sending errors back to the daemon and requester, formatting, and other things I don't remember lol

bam80 commented 3 years ago

Thank you for the introspection. I'm just trying to understand, given the fact legacy implementation has IPNS mount writable, does it mean it actually supports writable MFS mount? Documentation says nothing about it (or I read it wrong).

RubenKelevra commented 3 years ago

How about this - create a small service that converts NFS server commands to an MFS path by sending the requests to the go-ipfs daemon?

https://github.com/ipfs/roadmap/issues/83

NFS is basically supported on all platforms, even as boot mediums. :)

djdv commented 3 years ago

(I hope nobody minds my gigantic walls of text, let me know if I'm being too verbose)

@bam80

I'm just trying to understand, given the fact legacy implementation has IPNS mount writable, does it mean it actually supports writable MFS mount?

I think it could probably be adapted from the legacy IPNS implementation but as far as I know there's no legacy implementation which mounts MFS. This seemed to be the intention of the FilesAPI, I'm not sure. The one I wrote here was done from scratch. I found an old demo of it which shows reading and writing from multiple separate mountpoints https://youtu.be/FXi3yYO2H1w?t=148 It had some issues then and probably still does today, much like the others. It needs to be reworked and formally tested as it's just a proof of concept really. But you can see I expose the node's root over a 9P socket on Windows, mount it on Linux, and copy a directory into it, which gets reflected on the host node.

@RubenKelevra I like that idea, but we should be able to use the existing IPFS Files API over HTTP without NFS, and in fact go the other direction.

To give an overview of how the system I have is intended to work, we have this file system interface: https://github.com/djdv/go-ipfs/blob/49f4cc7a4a51bf1f9fb0294a091d6e157dee83ec/filesystem/system.go#L7

And that interface is used to implement host file system API interfaces like Fuse and 9P:
(old branch - has a wrapper for Fuse and 9P but bazil fuse was not ported yet) https://github.com/djdv/go-ipfs/tree/ac90d4417a32daf3ec36a7b1692deda1dcc59c5f/core/commands/filesystem/manager/host
(new branch - has a wrapper for bazil and Fuse. 9P still works but I didn't re-add it in the commits yet) https://github.com/djdv/go-ipfs/tree/49f4cc7a4a51bf1f9fb0294a091d6e157dee83ec/core/commands/filesystem

So the idea would be that we rework the MFS logic (to now work remotely with the FilesAPI on the remote IPFS node), which implements the first interface, and could then implement an NFS wrapper with the rest of them, giving us the ability to expose MFS as well as the other existing file systems abstractions I have here (IPFS, IPNS, PinFS, KeyFS, et al.): https://github.com/djdv/go-ipfs/tree/49f4cc7a4a51bf1f9fb0294a091d6e157dee83ec/filesystem/interface over an NFS listening server.

Not sure if that all makes sense, I need to write proper documentation when I'm not out of my mind. Because I'd really like other developers to be able to swap these components out and extend it without needing to make gigantic changes. I'm sure something is going to bite us here though with the abstractions and differences between APIs but we'll see how it goes. So far it's been cool to have the same implementations all piggybacking off each other without changing anything underneath, and I think we can just do the same for NFS and other host APIs / network services.

To rephrase that, I wrote the Fuse version and then to add 9P support I literally just had to write this: https://github.com/djdv/go-ipfs/tree/old-tmp/core/commands/filesystem/manager/host/9p and nothing else (outside of wiring it up to the CLI). And things like PinFS just implement a root directory and relay any subrequests to an embedded instance of IPFS so the code is very small, but super effective.

If anyone wants clarification on anything just let me know.

ec1oud commented 3 years ago

I'm attempting to try out your fsmanager fork branch, at first got a bit confused about the command-line options on ipfs itself: ipfs daemon --mount says 'Error: no arguments provided - portable defaults not implemented yet', and if I run the daemon first and then mount later, kept getting "Error: expected environment of type fscmds.filesystemEnvironment but received commands.Context". But then I watched the videos, and managed to build fs (but didn't find fsd), so I can see it's beginning to work: I can mount pinfs, see pins in it, cat files and ls directories. But ipns and ipfs are still empty. I have 3 ipns keys so far, why don't they show up there?

Anyway it sounds like it will be a good way forward. Now I have to find out more about 9p: I never used plan9 and didn't realize 9p is so alive with so many ports and rewrites: http://9p.cat-v.org/implementations Eventually I'd like to see IPFS working on NAS devices like the gnubee, but so far it takes too much memory (I can run ipfs on mine, but adding files makes it run out of memory); so in case we never get there, at least maybe 9p would provide a way to mount it remotely.

Is this work something that Protocol Labs is supporting, so that we can expect it to be merged eventually? It sounds promising.

djdv commented 3 years ago

@ec1oud Hey, thanks for trying it out.

Error: expected environment of type fscmds.filesystemEnvironment but received commands.Context

That's my mistake. Previously, the mount command was part of go-ipfs, but has since been migrated to a separate, standalone binary. Currently called just fs in the directory cmd\fs next to cmd\ipfs of that branch.

I forgot to remove it, but just pushed a commit that does so, along with a backport for a path issue on non-Windows platforms. To try out the experimental fs binary you can do this:

> go build ./cmd/fs
> ipfs daemon & #(any IPFS instance should work with `fs`)
> ./fs mount fuse pinfs /mnt/ipfs
> ls /mnt/ipfs

I have 3 ipns keys so far, why don't they show up there?

I'll be putting a repository up soon-ish which will contain refined versions of the code in that branch. At the moment, I have a "file system service daemon" written and mostly tested. This was needed to become a standalone process, and enables a few extra things as well. One of the highlights being integration with operating system service managers like Systemd, Windows svc, etc. Which allows the process to be managed by the OS, and enable things like automatically mounting on boot if desired.

But with that done, I want to start stabilizing the existing code from that branch. Getting to a stable set of read-only systems, with the write portions being reworked after. ~Hopefully in the process it will fix whatever issue is happening with IPNS you're having there.~ Edit: Actually, I think this is the result of mounting "IPNS" instead of "KeyFS". The tl;dr is to use fs mount fuse keyfs ... instead of fs mount fuse ipns .... For legacy reasons, IPFS and IPNS mount points act like they do in go-ipfs, where they have no content in their root, only child objects. "PinFS", and "KeyFS" are the equivalents that have a populated root directory, and relay child requests to the relevant subsystem.

but a bit confused about the command-line options now

Understandable. They've had to change a few times so the documentation trail has been bad.

At the moment, command line arguments to mount are a list of multiaddrs. fs mount --help shows an example of what they look like. /fuse/ipfs/path/ipfs, /fuse/ipns/path/ipns, ... As is, the format is /$host-API/$remote-API/$target, where target itself is an arbitrary multiaddr that's valid for the host-API. (For Fuse, this is typically a host file system path, but for something like 9P a TCP multiaddr is also a valid target) All 3 components become bound together by mount.
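The /$host-API/$remote-API/$target layout can be sketched as a plain string split. This is only an illustration: splitMountAddr is a hypothetical helper, it does not use a real multiaddr library, and it does not validate the individual components the way the fs binary presumably does.

```go
package main

import (
	"fmt"
	"strings"
)

// splitMountAddr is a hypothetical helper that splits a mount argument of the
// form /$host-API/$remote-API/$target into its three components. The target
// is returned with its leading slash, since it is itself an address.
func splitMountAddr(addr string) (hostAPI, remoteAPI, target string, err error) {
	if !strings.HasPrefix(addr, "/") {
		return "", "", "", fmt.Errorf("mount address %q must start with '/'", addr)
	}
	parts := strings.SplitN(strings.TrimPrefix(addr, "/"), "/", 3)
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("expected /$host-API/$remote-API/$target, got %q", addr)
	}
	return parts[0], parts[1], "/" + parts[2], nil
}

func main() {
	host, remote, target, err := splitMountAddr("/fuse/ipfs/path/mnt/ipfs")
	if err != nil {
		panic(err)
	}
	fmt.Println(host, remote, target)
}
```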

While all that info is necessary, those values are really verbose. So I made some text macro subcommands that fill a lot of this in for you.

ipfs mount fuse ... and the other subcommands will prefix the arguments passed into them before calling mount with them. (e.g. fs mount fuse /ipfs/path/1 /ipns/path/2 becomes: fs mount /fuse/ipfs/path/1 /fuse/ipns/path/2, fs mount fuse ipfs /mnt/1 /mnt/2 becomes: fs mount /fuse/ipfs/path/mnt/1 /fuse/ipfs/path/mnt/2, etc.)
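The prefixing behaviour of these text macros can be mimicked in a few lines of Go. This is a sketch inferred only from the two examples given here. expandMountArgs is a hypothetical name, and the real subcommands may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// expandMountArgs is a hypothetical helper that mimics the text-macro
// subcommands: it prefixes each argument so the result can be passed to a
// bare "fs mount". With only a host API, arguments are expected to already
// name the remote API and target; with both, arguments are host paths.
func expandMountArgs(hostAPI, remoteAPI string, args []string) []string {
	out := make([]string, 0, len(args))
	for _, arg := range args {
		if remoteAPI == "" {
			// e.g. "fuse" + "/ipfs/path/1" -> "/fuse/ipfs/path/1"
			out = append(out, "/"+hostAPI+arg)
			continue
		}
		// e.g. "fuse", "ipfs" + "/mnt/1" -> "/fuse/ipfs/path/mnt/1"
		out = append(out, "/"+hostAPI+"/"+remoteAPI+"/path/"+strings.TrimPrefix(arg, "/"))
	}
	return out
}

func main() {
	fmt.Println(expandMountArgs("fuse", "", []string{"/ipfs/path/1", "/ipns/path/2"}))
	fmt.Println(expandMountArgs("fuse", "ipfs", []string{"/mnt/1", "/mnt/2"}))
}
```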

Still interested in better alternatives to this, but for now it seems to work. The current format is a change done in response to a tester's request to have shorter CLI invocations. At the time the syntax looked like this: ipfs mount --system=fuse --subsystem="IPFS,IPNS,File" --target="/ipfs,/ipns,/file"

In the future, I'd like to see the invocation mostly be bare. fs mount should check for a config file like Unix mount does for fstab, and/or provide some defaults. That would also allow something like fs mount -fstab=/somefile to just store a list of multiaddrs / set of mount arguments.

Now I have to find out more about 9p ...

Yeah it's really cool, and provides a lot via a simple interface. Has been interesting to read about, along with the rest of the Plan 9 papers.

Is this work something that Protocol Labs is supporting, so that we can expect it to be merged eventually? It sounds promising.

No affiliation. As it was told to me, the feature is something the project would like to have, but my work in particular is not something that adds any value. So it's something the IPFS project would benefit from, but only if done in a more appropriate way than I can provide. As far as I know this work isn't planned to be used in any of the official IPFS projects. It's just a temporary solution until someone else can implement it properly.

ec1oud commented 3 years ago

use fs mount fuse keyfs ... instead of fs mount fuse ipns ....

Aha, makes sense. Unfortunately that didn't succeed:

$ ./fs mount fuse keyfs /mnt/keyfs
Attempting to bind to host system: /fuse/keyfs, /fuse//mnt/keyfs...
Error: mount encountered an error: unexpected EOF

But pinfs is OK.

For legacy reasons, IPFS and IPNS mount points act like they do in go-ipfs, where they have no content in their root, only child objects.

Well mine has entries like this, when I run go-ipfs 0.8.0 with the --mount option:

dr-xr-xr-x 1 rutledge rutledge 0 2021-05-16 21:52 12D3KooWSk7ZTAfzFj3ikQJgkDaf2vuZt4Hnr9kFEz1sBESt4qLK
lr-xr-xr-x 1 root     root     0 2021-05-16 21:52 local -> 12D3KooWSk7ZTAfzFj3ikQJgkDaf2vuZt4Hnr9kFEz1sBESt4qLK

and I don't understand why there's only "local" when that is not the name of my "self" key. Nor do I understand why on the ipns fuse filesystem it's in base58btc whereas ipfs key list -l gives it in base36. I verified that it's the same cid anyway. But the other two ipns keys from the list are missing. Anyway ls works in that one directory, and the files are persistent. They are even writable! The changes that are written (cp -r /path/to/files /mnt/ipns/local/) can be seen there for a short time! But ipfs name resolve gives the same result before and after. And then the writes go missing: after a short time, it reverts to the last version again.

It would be very nice if writing to the ipns key directory would actually remap the new hash to that ipns key. Then we'd finally have a real filesystem, to write files locally and share them at the same time. (And maybe version control could be done by policy: the config file could say how many old versions to keep, or for how long, so those previous hashes just stay pinned for a while longer. But that's gravy.)

the feature is something the project would like to have, but my work in particular is not something that adds any value

Bummer. I'm really tired of waiting for ipfs to actually be a filesystem. But if your fork ends up working that well, maybe I can just use it for the foreseeable future. And 9p might be a way to map extra storage from a NAS into ipfs... that's what I was thinking.

Theoretically maybe I could help you with this code too. But I have barely written any Go at all (I spend most of my time in C++ unfortunately, not that I particularly like it), and don't know my way around this codebase either.

Anyway I look forward to your next push. Thanks for taking this on.

djdv commented 3 years ago

FWIW progress was made on refactoring the prototype into something more stable. I put a repo up for the standalone fs binary here: https://github.com/djdv/go-filesystem-utils It's still in progress, but should allow for tracking and discussing that progress more easily than through a single thread. There's also a detailed reply in the link below.

@ec1oud GitHub added a feature that lets me post even bigger walls of text. So I put my reply there. ;^) https://github.com/djdv/go-filesystem-utils/discussions/5 I might need to break it into subtopics later 😪 it's super verbose lol