ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/
Other
15.83k stars 2.96k forks source link

[BOUNTY] Implement UnixFSv1.5 in go-ipfs #6920

Open Stebalien opened 4 years ago

Stebalien commented 4 years ago

This issue has a bounty!

Successfully closing this issue by producing a production-ready, mergeable PR can earn you not only the undying love of the IPFS community — it can net you a financial reward. See the current list of bounty issues and their values here.

The need in brief

Add UnisFSv1.5 metadata support per the spec https://github.com/ipfs/specs/blob/master/UNIXFS.md#data-format, which includes: specifying metadata, respecting existing metadata, and displaying metadata.

This aims to support large data import use cases (ex package managers), who need to retain file metadata, particularly last modified time (mtime), in order to selectively sync only data that has changed. Up until now if you wanted to host a large data set on IPFS, like a package manager's repository, it would be difficult to update - but with file mtime you can only reimport the changed files, making things much more efficient.

Deliverable

Requirements

Guidelines

momack2 commented 4 years ago

For context, here is the js-ipfs unixfsv1.5 implementation done by @achingbrain: https://github.com/ipfs/js-ipfs/pull/2621, and the blog post describing the shipped work: https://blog.ipfs.io/2020-02-13-js-ipfs-0-41/#unixfs-v1-5

If you want to start working on this, ping @Stebalien for a quick walkthrough of where to start and what already exists to build on top!

Stebalien commented 4 years ago

Implementation Notes

Implementing this feature will touch several sub-components of go-ipfs:

The new go-ipfs-files Node interface should be:

// Node is a common interface for files, directories and other special files
type Node interface {
    io.Closer

    // Mode returns the file's Mode
    Mode() os.FileMode

    // ModTime returns the files last modification time.
    //
    // If the last modification time is unknown/unspecified, ok is false and
    // the mtime is the Unix epoch (default timestamp).
    ModTime() (mtime time.Time, ok bool)

    // Size returns size of this file (if this file is a directory, total size of
    // all files stored in the tree should be returned). Some implementations may
    // choose not to implement this
    Size() (int64, error)
}
kalmi commented 4 years ago

@Stebalien Hi, @momack2 said to ping you for a walkthrough if one would like to start working on this, but I think it is safe to assume that your last comment is that walkthrough, and I will just go ahead. Thanks!

achingbrain commented 4 years ago

ipfs add also needs the following new options:

And there are two new mfs commands:

momack2 commented 4 years ago

Hey @kalmi! How are things going? Did you get your questions answered / are you unblocked?

kalmi commented 4 years ago

Hi!

Thanks. I'm not blocked, I had some personal issues last week, sorry for the silence. I plan to spend most of the upcoming long weekend on this. So far I have got ipfs get to use perm modes passed from the tarwriter, and I might have overdone the handling of the edge cases that can happen during ipfs get, but probably not. I want it to update modes for files and directories that already exist without doubling the number of syscalls, and that proved to be a bit tricky.

Status (WIP, "done" is very loosely defined at this point, this list is bound to be incomplete):

Stebalien commented 4 years ago

I'd drop the last two for now. Everything else looks good to me.

Note: If you're trying to find where different types/functions are being used, I recommend grep.app. E.g., .NewMultiFileReader.

momack2 commented 4 years ago

Hey @kalmi! Any status update? Are you blocked?

kalmi commented 4 years ago

@momack2 I am making progress, albeit unfortunately it is slower than I originally expected due to external factors. Now I am at the point where I believe the get side of things to be mostly done, and I am moving on to extending add, and I hope to be able to write back the added files with metadata somewhen in the next few days. I wanted to test aginst jsipfs before this as a sanity check, but I had some bad experiences it with it, and eventually decided to push it to the end. In the meantime, I will use the hashes from its test cases for the go-ipfs tests.

I am making sure that all changes to the libraries are backward compatible. The Node interface change specified by Stebalien is not, but I guess that's reasonable and acceptable. When I have everything working, I will start making PRs in the proper merge order.

Do you have a reasonable timeframe in mind for this that I should be aware of?

momack2 commented 4 years ago

Hey @kalmi! It'd be great to include this feature in the next release, go-ipfs 0.6. I think the feature cut for that release is ~2 weeks from Wednesday. Do you think that's a reasonable goal?

Note, the sooner to put up draft PRs for review (and the more modular/well tested they are) the easier it will be to land this quickly. =]

achingbrain commented 4 years ago

I wanted to test aginst jsipfs before this as a sanity check, but I had some bad experiences it with it

Could you open an issue on ipfs/js-ipfs please and we'll see if we can get to the bottom of the problem you had?

Stebalien commented 3 years ago

@kalmi Any status updates?

justicenode commented 3 years ago

@kalmi @Stebalien If no one is going to take care of this issue I'm willing to give some of my free time towards it. I'm programming in my free time and one of my programs kinda depend on this . So I'm willing to take care of this so my software works on any ipfs distribution. IDC about the money i just want to get this working. let me know if i can help or anything

autonome commented 3 years ago

Thanks @justicenode, yes it'd be great if you were able to carry this forward.

With no updates in over 60 days, it's ok to unlock this for other contributors.

@kalmi we really appreciate your effort to take this on, and happy if you're able to jump back in and collaborate!

justicenode commented 3 years ago

@autonome okay, I'm on holidays for like the next two weeks but after that i should have enough time to truly work on this.

Is there already something I can continue working on or do i have to start over? I tried looking for stuff from @kalmi, but i couldn't find anything, but maybe I'm just blind. If there's nothing to continue from, could you give me some pointers as to where i should start? Like what repo i should fork, or which files might be relevant.

Also I haven't done much with go so progress might be a bit slow.

djdv commented 3 years ago

@justicenode

Is there already something I can continue working on or do i have to start over?

I did a portion of this initially, maybe useful to see it. https://github.com/ipfs/go-unixfs/compare/master...djdv:ufs-1.5

I may have deleted the go-ipfs-files stash from my workspace though...


Like what repo i should fork, or which files might be relevant.

Make sure to see this for a todo list and file references: https://github.com/ipfs/go-ipfs/issues/6920#issuecomment-601001679

kstuart commented 3 years ago

@Stebalien, it doesn't look like this is actively progressing so mostly out of curiosity thought I'd take a look over the weekend and ended up implementing one of the cases, so at this time I can successfully preserve or set custom values for mtime/mode using ipfs add and retrieve correctly with ipfs get for a unix file. I've not attempted implementing other cases as the exercise was just to familiarize myself a bit with the codebase and Go language.

I'm not looking to tread on anyone's toes if this is already being worked on, otherwise I'm happy to take this forward, let me know.

momack2 commented 3 years ago

That’s great news! I don’t think @justicenode got a chance to pick this up, so no toes stepped on. A PR for the cases you’ve got working would be lovely (more small PRs are always easier to review). @aschmahmann to help with that 🤗

aschmahmann commented 3 years ago

Thanks @kstuart the implementation notes in this issue are hopefully clear enough to guide you in your implementation.

I'd recommend putting up work in progress PRs early and poking folks (e.g. me) for feedback even at the earlier stages instead of waiting to put up a mostly completed PR. This should make things easier to review and make sure that we're on the same page.

I'd also recommend breaking the different steps/sections up into different commits and/or PRs so that they are easier to review + merge.

kstuart commented 3 years ago

Draft PRs submitted, as I wrote this on the hoof I'll not be adding any further functionality for now but focusing on exploratory testing and authoring tests, probably not till the weekend though.

For anyone wanting to try out the new features you'll need all the related PRs, you should then be able to use the functionality detailed in #7754, testing on Windows would be welcome.

For webfile, as I'm not aware of an existing X- header for communicating file mode I'm initially proposing X-FileMode, this is easily changed.

achingbrain commented 3 years ago

I'm initially proposing X-FileMode

We had to change the js implementation around passing file mode/mtime to the HTTP API - because in the browser you have to use FormData objects to construct requests from multiple Blobs in order to prevent buffering file data into memory. Each Blob you add becomes a multipart part - but you cannot specify per-part headers with the FormData API.

So what we ended up doing was encoding the mode/mtime as a querystring and appending it to the field name because it was essentially the only field we could modify that didn't take user input (like the filename for example), so your request ends up looking something like:

POST /api/v0/add HTTP/1.1 
Host: localhost
Content-Type: multipart/form-data;boundary="boundary" 

--boundary 
Content-Disposition: form-data; name="file-0"; filename="/file-with-no-metadata.txt" 

...file data 
--boundary 
Content-Disposition: form-data; name="file-1?mode=0755"; filename="/file-with-mode.txt" 

...file data
--boundary
Content-Disposition: form-data; name="file-2?mode=0755&mtime=500"; filename="/file-with-mode-and-mtime.txt" 

...file data
--boundary
Content-Disposition: form-data; name="file-3?mode=0755&mtime=500&mtime-nsecs=4982"; filename="/file-with-all-the-metadata.txt" 

...file data
--boundary--

If go-IPFS could implement the same thing it would mean this feature will work across both implementations and in browsers.

kstuart commented 3 years ago

@achingbrain, for multipart I've implemented the querystring on field name, the name itself doesn't include an index as the way parts are constructed doesn't lend itself easily to doing so, in this context though I wouldn't expect that to be an issue, e.g. Content-Disposition: form-data; name="file?mode=0644&mtime=1604320500&mtime-nsecs=55555"; filename="beep.txt"

For the webfile case e.g. ipfs add http://example.com/somefile.txt the current implementation uses the following response headers when present:

Does the js implementation have a position on this?

achingbrain commented 3 years ago

No, looking for those headers in the response sounds perfectly reasonable.

The X- prefix has been deprecated for a few years but my opinion isn't that strong.

File-Mode might be more conventional than FileMode.

lidel commented 3 years ago

Given that we already return X-Ipfs-Path I'd go with Ipfs-File-Mode (assuming we want to drop X- at some point anyway) just for the sake of consistency and emphasis this metadata comes from IPFS.

achingbrain commented 3 years ago

This isn't for what we return though, it's what we will look for in HTTP responses when we are importing from a URL - expecting third parties to prefix headers with IPFS- just in case someone tries to ipfs add the URL seems unreasonable.

achingbrain commented 3 years ago

@kstuart I've put a repo together that'll allow you to run the interface tests js-IPFS runs but with minimal setup: https://github.com/achingbrain/ipfs-interface-tests

There's a readme with instructions but in brief once you've got everything running you'll just need to remove the skips for all the tests that mention mode, mtime or metadata in test/interface.spec.js

Please let me know if you have problems running it or need any help.

kstuart commented 3 years ago

@achingbrain That's great, and very insightful.

Should I be concerned about the preloadNode issue?

2020-12-01_13-03-preloadNodeError

achingbrain commented 3 years ago

Should I be concerned about the preloadNode issue?

If you pull from the repo this should be fixed now, sorry about that

kstuart commented 3 years ago

@achingbrain looks like there are still some issues running the tests, specifically when running the browser tests process doesn't exist:

FAILED TESTS:
  interface-ipfs-core over ipfs-http-client tests against go-ipfs
    .add
      ✖ should add with mtime as hrtime
        Chrome Headless 91.0.4455.2 (Linux x86_64)
      TypeError: process.hrtime is not a function
          at Context.eval (interface-ipfs-core/src/add.js:269:29)

Could you please investigate, the package browser-hrtime might provide a solution.

zacharywhitley commented 2 years ago

Any idea what the current status of this is? I'd really like the feature and it seems soooooooo close. I'm not interested in the bounty but would really like to see it get over the finish line and if there's any way I could give it that final push I'd be happy to help out.

kstuart commented 2 years ago

@zacharywhitley so far this year it has been challenging finding the time to focus on this, I'll be looking to move the PRs out of draft within the next 6/8 weeks, if not sooner.

markg85 commented 2 years ago

@kstuart I'm really curious about your progress here!

I would very much like to use this additional stat information in places where i'm (going to) list content from folders stored on IPFS.

Just for my information. This additional stat info you're providing is also there when using the HTTP API, right? Not just internal to the unixfs layer? I'm asking because i intent to add IPFS browse support in KODI as a VFS plugin where it in the backend would use the HTTP API (it's near certain that i will be doing this). Likewise i want to add a KIO SLAVE (it's for KDE's file browsing capabilities) to support IPFS natively in KDE applications too, it too would fetch file data via the HTTP API. It's unlikely i'll be doing the KDE side of things anytime soon though, that really is just a wishlist item for me personally that keeps getting pushed back.

kstuart commented 2 years ago

@markg85 yes, if stored with mode and/or time that information is returned from the HTTP API. As per screenshot it returns the same fields as js-ipfs, except for WithLocality field.

I need to review Mtime value, but aside from that you can rely on those fields being present if they were stored.

ipfs-web-file-stat

zacharywhitley commented 2 years ago

Really looking forward to this. Quick question that I could probably answer by looking at the branch but you can make things easy on my if you don't mind my masking. Is this going to include an update to fuse to pass the info on to mounts? That's what I really want it for. I'm looking to replace SDKMan with IPFS so installing a new JDK is as simple as pinning the directory for that version and updating a couple of env vars.

markg85 commented 2 years ago

Nice! Can't wait for this to land and an IPFS version be released with this! I don't want to rush you... but which version to you "soft target" for including this? 0.11?

kstuart commented 2 years ago

@zacharywhitley it's not in scope for this ticket, don't know what the plan is there.

@markg85 all I can say is I'm currently working on it, barring any major issues it shouldn't be much longer.

zacharywhitley commented 2 years ago

Thanks. That's fine. I was planning on writing a separate fuse implementation that also supported MFS so as long as it's there.

djdv commented 2 years ago

@zacharywhitley For what it's worth, a while ago I had code for this within the go-ipfs repo. There's a rough demo of that version here: https://www.youtube.com/watch?v=FXi3yYO2H1w (this would later see some changes)

More recently, I've been asked to make a standalone version of this, and have been porting those features over to it. A rough demo of that is here: https://www.youtube.com/watch?v=q-hKANpM9Xg While not done yet, the port of the MFS code is planned and should probably happen soon-ish. Although I'll need to coordinate some means of actually syncing changes back to the go-ipfs node in cases where you want to target it's ipfs files root rather than just IPNS keys, or arbitrary CIDs. Basically something like ipns name publish for the files root, where I can just tell it to replace its root CID with another one.

If you need help with yours, feel free to ping me.

zacharywhitley commented 2 years ago

Oh wow. That's fantastic and thank you for the offer of help. I will definitely keep in touch. You're from Philly? I went to Drexel University and used to work up in King of Prussia.

I've always thought that IPFS mount was an unfortunate rough spot for IPFS that didn't get the love it deserved. It's called IPFS and you go to use it as a filesystem and it's a little disappointing. I guess this is more an MFS thing but I always thought it would be nice if it also supported snapshots. I'm thinking of it creating a hidden directory every time the root changes to move the old root to something like .snapshot-20210826-3489. It's immutable. I should be able to go to an old version.

djdv commented 2 years ago

You're from Philly?

I am, I'm at the borders of Tacony and Holmesburg. :^)

It's called IPFS and you go to use it as a filesystem and it's a little disappointing

Strongly agree. It was the only reason I downloaded it in the first place, only to be greeted with a "not supported yet" message. And from what users tell me, the only platform left that it does work on isn't practically usable. Someone posted a few days ago about a 75-80% CPU reduction using my initial unoptimized version, and a fraction of the time. Compared to the current release. https://github.com/ipfs/go-ipfs/issues/5003#issuecomment-900331407 Not too bad.

snapshots

Yeah this would be really cool. It's something I like a lot in systems like ZFS, and shouldn't be too difficult to come up with a scheme for in IPFS.

the port of the MFS code is planned and should probably happen soon-ish

For what it's worth I at least ported over the read-only portions of this for now. They're here. It seemed to work, but I haven't wired up a way to pass a CID to it yet via the CLI. I need to fetch it from node using a different API. Unfortunately it looks like no progress has been made on this https://github.com/ipfs/interface-go-ipfs-core/pull/19 So I'm going to have to come up with some roundabout solution for now. Likely I won't mount and sync the root, but instead some subdirectory like /mount, until we can do something in the IPFS APIs to be able to set the root, or manipulate it directly.

ec1oud commented 1 year ago

How's it going with this? Seems like little progress lately?

kstuart commented 1 year ago

Last update was late June which refactored/improved/fixed the initial implementation, added support for recursively adding directory trees with mode/modification time and restoring them to the file-system and archive files.

Left to do is supporting mfs chmod and touch commands on directories, I was not initially able to identify an implementation path in the time available for this so need to study the code-base some more, progress on this is dependent on my schedule which is quite tight.

I've avoided moving dependent PRs out of draft status until all are ready, however if the dependent PRs are merged in order they shouldn't break anything, so I'll probably look to enable that shortly to avoid potential future conflict.

if you're not aware progress and dependency tracking is in main PR #7754.

kstuart commented 1 year ago

This should now be feature complete and all related PRs have been promoted from Draft status.

zacharywhitley commented 1 year ago

oh hell ya!!! Thank you. I've really been looking forward to this. Being able to run executables out of a fuse mounted ipfs is going to be great.

ec1oud commented 1 year ago

Cool, sounds good.

Well I was hoping for arbitrary metadata too (extended attributes) but that's not in 1.5 apparently.