ipfs-inactive / package-managers

[ARCHIVED] 📦 IPFS Package Managers Task Force

Filesystem Package Managers and rsync #81

Open aschmahmann opened 4 years ago

aschmahmann commented 4 years ago

We seem to be running into issues when using rsync on top of IPFS to load filesystem Package Manager registries into IPFS. These issues largely stem from IPFS simply being a different application than an OS's file system. There are a number of potential avenues to explore here, each of which could solve this problem.

1) Just as we have ipget (https://github.com/ipfs/ipget) as an IPFS-aware wget, we could have an ipsync.
2) We could create or use a filesystem layer that emulates the filesystem so that native tools like rsync work better.

Notably, each of these approaches basically amounts to writing an IPFS application that properly handles rsync, whether we do it explicitly (ipsync, or a shell script) or implicitly (FUSE). When deciding on a path forward we'll need to take into account Performance, DX for package managers, and Reusability beyond package managers.
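To make the explicit ipsync option more concrete, here is a minimal sketch of the decision such a tool would have to make on every run: which files were added, changed, or removed since the last sync. It assumes the same size-plus-modification-time comparison that rsync uses by default, and it never talks to IPFS. The names `snapshot` and `plan_sync` are hypothetical and not part of any existing tool.

```python
from __future__ import annotations

import os


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Map each file path (relative to root) to (size, mtime in ns)."""
    entries: dict[str, tuple[int, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            entries[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return entries


def plan_sync(
    previous: dict[str, tuple[int, int]],
    current: dict[str, tuple[int, int]],
) -> dict[str, list[str]]:
    """Compare two snapshots and list what a sync run would have to touch."""
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    # Assumption: a file counts as changed when its size or mtime differs.
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "changed": changed, "removed": removed}
```

Applying the resulting plan to IPFS is the part this sketch leaves out; that step is where the explicit and implicit approaches above would differ.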

Note: See #21 for more info

andrew commented 4 years ago

Related to https://github.com/ipfs/package-managers/issues/74 and https://github.com/ipfs/package-managers/issues/71

meiqimichelle commented 4 years ago

Notes from sprint planning: @djdv to write a quick note here to relate this to some of his work, and @andrew may pull this in to a summary doc he's started in #78.

djdv commented 4 years ago

> Notably, each of these approaches basically amounts to writing an IPFS application that properly handles rsync, whether we do it explicitly (ipsync, or a shell script) or implicitly (FUSE). When deciding on a path forward we'll need to take into account Performance, DX for package managers, and Reusability beyond package managers.


The work being done around the overarching FS API effort (https://github.com/ipfs/package-managers/issues/71) should likely tie into this.

It seems likely that there will be a lot of overlap between a purpose-made syncing tool and a performant FS API. Ideally, work done around FS APIs would allow the core of these components to be shared. In the tool scenario, you can imagine FS-like manipulations being done through the same API we'd use to implement mount.

Semi-related: https://github.com/ipfs/package-managers/issues/74#issuecomment-514332578

djdv commented 4 years ago

During our meeting, we talked about taking existing benchmarking tools that are meant for traditional filesystems and pointing them at our mount implementation, as well as building harnesses that simply act as 9P clients and attach to our FS server (on the daemon).

This would also give us an idea of the overhead associated with using the FS protocol, when compared against benchmarks that are using the underlying APIs directly.

The nice thing about this is that it should come naturally as the implementation progresses. As it becomes more technically correct (spec compliant), we can simply target it with existing testing software. Previously, I've used (a fork of) FSX, as well as other tools, to help debug the past FUSE implementations.
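The comparison described above (reads through the FS protocol versus reads through the underlying API) boils down to measuring throughput on both sides and taking a ratio. The following is a rough sketch of the file-read side only, assuming the mount is exposed as an ordinary path. The function names are hypothetical, and the "direct" number would have to come from a separate benchmark of the underlying API, which is not shown here.

```python
import time


def read_throughput(path, chunk_size=1 << 20):
    """Read a file sequentially and return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def mib_per_second(total_bytes, seconds):
    """Convert a byte count and a duration into MiB/s."""
    return total_bytes / (1024 * 1024) / seconds


def overhead_ratio(direct_mib_s, mounted_mib_s):
    """How many times slower the mounted read is than the direct one."""
    return direct_mib_s / mounted_mib_s
```

Running `read_throughput` against a path under the mount point and feeding both rates into `overhead_ratio` gives a single number to track as the implementation progresses.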

kevincox commented 4 years ago

I think that the performance problems are simple enough to find. For example, just reading a locally cached file is incredibly slow, and the ipfs daemon burns 500% CPU.

% pv /ipfs/QmaUXCVgQgC9b6TPJ1ZAZqWXsoW7vaUwR7f3tadFBx839R >/dev/null
 175MiB 0:02:38 [1.63MiB/s] [=======>                          ] 26% ETA 0:07:24
% ipfs version
ipfs version 0.4.23

andrew commented 4 years ago

@kevincox I believe a number of performance improvements have been made recently, although they have not yet been released as 0.5.0. It might be worth testing against master of go-ipfs as well.

kevincox commented 4 years ago

I tried again on the latest git and, while the results were better, they weren't great. It was now using <400% CPU and transferring a bit faster. I think this is beyond what incremental "performance improvements" can fix. Someone needs to take a look at the architecture and make fundamental changes.

% ipfs version
ipfs version 0.5.0-dev
% pv /ipfs/QmaUXCVgQgC9b6TPJ1ZAZqWXsoW7vaUwR7f3tadFBx839R >/dev/null
39.9MiB 0:00:24 [1.82MiB/s] [>                                 ]  5% ETA 0:06:19

aschmahmann commented 4 years ago

@kevincox do you mind giving a little more information on what you are doing?

kevincox commented 4 years ago

One other note: I tried with both the default fixed-size chunking and Rabin chunking. The speed was roughly the same in either case.
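For readers unfamiliar with the two strategies, the sketch below shows what actually differs between them: where the block boundaries fall. It is only an illustration. Instead of real Rabin fingerprinting it uses a plain byte sum over a sliding window, and the function names, window size, and divisor are all made up for this example. It says nothing about read throughput.

```python
from __future__ import annotations


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(
    data: bytes, window: int = 16, divisor: int = 64
) -> list[bytes]:
    """Cut wherever a rolling sum of the last `window` bytes hits a target.

    Simplification: a byte sum stands in for a Rabin fingerprint, and there
    are no minimum or maximum chunk sizes.
    """
    chunks: list[bytes] = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - window]
        # Only consider a boundary once the window is full.
        if i + 1 >= window and rolling % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

In this sketch, prepending a single byte shifts every fixed-size boundary, while the content-defined boundaries move with the data, so every chunk after the first one is unchanged.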

aschmahmann commented 4 years ago

@kevincox the FUSE implementation definitely needs some reworking. A contributor was working on an upgrade (https://github.com/ipfs/package-managers/issues/74) but unfortunately got pulled away before it could be completed.

If you're interested in helping out, take a look; I'm sure the help would be appreciated. I know the contributor is interested in continuing work on the implementation as well once he's available.

djdv commented 4 years ago

[This post is meant to give some insight as to what those branches are, and my intent on what to do with them.]

I would advise against focusing on the experimental mount branches as they are complete rewrites that were not well received. They covered more platforms and were more performant when functioning, but are a practical dead end due to their large complexity/breadth.

Focusing on improving the existing implementation seems more likely to have an impact in the short term (albeit at the cost of being platform bound). However, this is something I can't help with, since the existing implementation was not referenced at all during the rewrites. (It was meant to be replaced, so there was little value in poring over something that's antithetical to both goals.)

Currently, I am working on a branch which combines the efforts and orchestrates them via the mount command, using the appropriate file system provider/implementation for the platform (i.e. the 9P protocol on Linux and cgo-fuse on everything else, with a common interface beneath them). This also allows more platforms to be supported later by implementing their preferred provider. But there's no expectation of that work making it into mainline, as it's completely different from the existing solution and would require a large amount of coordination on a low-priority item. I already tried this and failed to meet the quality bar, twice.

The effort is being spent strictly out of necessity, as I and other users have wanted this feature for a long time and are willing to accept something that improves incrementally over nothing. As far as I know, nobody else is working on this. Feedback from users on both branches was very positive, and people remarked on the perceptible performance improvements without even being asked. So I feel it's worth finishing, but again, I don't think it's possible to get it into a state where it would be deemed officially acceptable (not without leaning heavily on the engineers of the project to guide an amateur). Something new by someone more experienced is likely going to be the best solution; I'm just a fallback.
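The per-platform provider selection mentioned in this comment (9P on Linux, cgo-fuse elsewhere, behind a common interface) could be pictured roughly as follows. This is only an illustrative sketch, not code from the branch: every class and function name here is hypothetical, and the `mount` methods are stubs that describe what they would do rather than mounting anything.

```python
import sys
from abc import ABC, abstractmethod


class Provider(ABC):
    """Common interface that each platform-specific provider implements."""

    name: str

    @abstractmethod
    def mount(self, target: str) -> str:
        """Expose the file system at `target` (stubbed out in this sketch)."""


class NinePProvider(Provider):
    name = "9p"

    def mount(self, target: str) -> str:
        return f"would serve {target} over 9P"


class FuseProvider(Provider):
    name = "fuse"

    def mount(self, target: str) -> str:
        return f"would mount {target} via cgo-fuse"


def provider_for(platform: str = sys.platform) -> Provider:
    """Pick the provider for a platform string such as sys.platform."""
    if platform.startswith("linux"):
        return NinePProvider()
    # Assumption taken from the comment: everything else uses cgo-fuse.
    return FuseProvider()
```

Supporting another platform later would mean adding one more `Provider` subclass and one more branch in `provider_for`, without touching the callers.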

As for my availability, I'm struggling with some personal difficulty that could go either way. I'd honestly rather work on this but I don't know when I'll be freed up, if at all. I'll try my best ┐('~`;)┌