ipld / legacy-unixfs-v2

This repository contains deprecated / legacy Unixfs "V2" discussions.

UnixFS Reboot #28

Closed: mikeal closed this 1 year ago

mikeal commented 5 years ago

TL;DR

I’ve listed every feature I can find that has been considered for UnixFSv2 below. We discussed this in a short meeting (notes at the end of the document, recording posted soon) and the following action items surfaced:

UnixFS vNext Reboot

For some time we’ve been directing issues, feature requests, and the general future of UnixFS at “UnixFSv2.” Since the size and scope of this future version were never locked down, this has delayed improvements to UnixFSv1 and has failed to tie UnixFSv2 to a clear deadline and set of functionality.

The goal of this document is to describe the various issues and features we’d like to see in UnixFS and link to the historical discussions about those features. We can then use this document to discuss and prioritize each feature and find the best path to development whether it be improvements to UnixFSv1, an incremental UnixFSv2 on dag-cbor, or a bigger future version built on features that are still being researched.

General Links

Development Targets

This section briefly describes the difficulties and limitations of different development strategies, which should help inform how best to approach solving each issue.

Improvements to UnixFSv1

One problem with improving UnixFSv1 is that none of the generic improvements we make can be leveraged by applications outside of IPFS. For instance, the work we’ve done for directory sharding lives in UnixFSv1 and can’t be used for other generic sharding problems. This makes solving fairly generic problems via UnixFSv1 less valuable, and it eventually leads to duplicated effort.

The other problem is dag-pb, as best summarized by @stebalian. In short, it’s very rigid, and adding fields and other features is more cumbersome than in dag-cbor.

UnixFSv2 on dag-cbor soonish

This development route solves the dag-pb related issues and makes some of the generic improvements leverageable outside of IPFS.

However, there is one major problem remaining: upgradability. All new features and improvements must exist and be relatively consistent between two versions of IPFS manipulating the same data. There is no good way to ensure this without future IPLD features that are still in the research phase.

This route of development is most problematic when tackling the “Reproducible Hashes” issue.

It should also be noted that, because we know there is future, as-yet-undeveloped IPLD work that we want to leverage for UnixFS, we have a high degree of certainty that if we released this version of UnixFSv2 we would still face another major version migration at some point in the future.

The actual development time for this would not be very long. @mikeal has already written draft implementations of several iterations of the UnixFSv2 spec in JS. A much more important factor to consider is the upgrade cost to IPFS users.

UnixFSv2 on “IPLD Future”

Most of the big problems facing UnixFS are problems facing IPLD generally. These problems are all being actively worked on through engineering and research, and at some future date that work can be leveraged for an ideal, future-proof (upgradable) version of UnixFS. However, when this will be available can’t be predicted with a high level of certainty.

Issues

Standard File/Directory metadata

Links

Arbitrary file metadata

The ability for users to add their own optional metadata to files could be very useful. However, doing arbitrary anything in dag-pb is problematic.

Reproducible Hashing

Put simply, this is the ability for a given UnixFS implementation to look at an existing UnixFS-encoded file and a file on a traditional file system and reproduce the UnixFS encoding identically.

This feature is relatively simple if there is no optionality and every version of IPFS is in perfect alignment. However, this is almost never the case.

IPFS has several options that can be used when encoding a file, and each of them alters the encoding.

One path is to encode all options into the encoded version of the file. This works as long as both versions of IPFS are in alignment, which means it can often fail to produce identical hashes after an upgrade. The only way to completely guarantee reproducible hashing is to also guarantee that the applications are identical, but this is very difficult without “IPLD Future.”
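To make the “record the options” path concrete, here is a minimal sketch in Python. It is a toy Merkle encoding, not UnixFS: `EncodeOptions`, `encode`, and `reproduces` are hypothetical names, and the only option modeled is a fixed chunk size plus a hash name. The encoder stores its options next to the root so that another implementation can re-apply them to a local file and compare roots.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeOptions:
    # Hypothetical, simplified options; real importers expose more knobs
    # (chunker, leaf format, layout, ...) that this sketch does not model.
    chunk_size: int
    hash_name: str = "sha256"


def encode(data: bytes, opts: EncodeOptions) -> dict:
    """Toy encode: hash fixed-size chunks, then hash the options together
    with the leaf digests to get a root. The options are returned next to
    the root so another implementation can re-apply them later."""
    leaves = [
        hashlib.new(opts.hash_name, data[i:i + opts.chunk_size]).digest()
        for i in range(0, max(len(data), 1), opts.chunk_size)
    ]
    header = f"{opts.chunk_size}:{opts.hash_name}".encode("utf-8")
    root = hashlib.new(opts.hash_name, header + b"".join(leaves)).hexdigest()
    return {"root": root, "options": opts}


def reproduces(existing: dict, local_file: bytes) -> bool:
    """Re-encode a local file with the options recorded in an existing
    encoding and check whether the roots match."""
    return encode(local_file, existing["options"])["root"] == existing["root"]


data = b"hello unixfs " * 500
small = encode(data, EncodeOptions(chunk_size=256))
large = encode(data, EncodeOptions(chunk_size=1024))
print(small["root"] == large["root"])  # False: different options, different root
print(reproduces(small, data))         # True: recorded options re-applied
```

This only helps when both implementations interpret every recorded option identically; an option that one version doesn’t understand, or implements slightly differently, still breaks reproduction, which is the alignment problem described above.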

“Inline” files and directories

For small files and directories, the benefits of de-duplication are often outweighed by the cost of retrieving additional blocks.

There are also use cases, like websites, where it may be highly beneficial to inline certain data into the root block of the directory tree for faster early rendering.
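As a rough illustration of the inlining trade-off, here is a minimal Python sketch. `build_directory` and `INLINE_THRESHOLD` are hypothetical, and the 1024-byte cutoff is an arbitrary assumption; nothing here reflects a proposed UnixFS format. Files at or under the threshold are embedded directly in the directory node, so one fetch returns them, while larger files become separate blocks referenced by digest.

```python
import hashlib

# Hypothetical cutoff; no specific size has been proposed for UnixFS.
INLINE_THRESHOLD = 1024


def build_directory(files: dict, threshold: int = INLINE_THRESHOLD):
    """Build a toy directory node. Files at or under `threshold` bytes are
    embedded in the node itself; larger files become separate blocks that
    the node references by SHA-256 digest."""
    entries = {}
    blocks = {}
    for name, content in sorted(files.items()):
        if len(content) <= threshold:
            entries[name] = {"inline": content}
        else:
            digest = hashlib.sha256(content).hexdigest()
            blocks[digest] = content
            entries[name] = {"link": digest}
    return {"entries": entries}, blocks


root, blocks = build_directory({
    "index.html": b"<h1>hello</h1>",
    "video.bin": bytes(5000),
})
print(root["entries"]["index.html"])  # {'inline': b'<h1>hello</h1>'}
print(len(blocks))                    # 1
```

The cost is de-duplication: an inlined file is not its own block, so identical small files in different directories are stored again inside each directory node.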

Support for non-utf8 Filenames

Link

Seeking in large directories

It’s often necessary to paginate through large directories, and the current implementations do not easily support this.

Question: Given that you can only paginate through a randomized ordering using the current sharding data structure, how useful would this be without ordered collections?
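To illustrate the question, here is a minimal Python sketch of cursor-based pagination over a hash-ordered directory. `hash_key` and `list_page` are hypothetical, and SHA-256 stands in for whatever hash the sharding structure actually uses. Resuming a listing works fine with a cursor, but the pages come back in hash order rather than name order.

```python
import hashlib


def hash_key(name: str) -> bytes:
    # SHA-256 is a stand-in; the point is only that an entry's position
    # depends on the hash of its name, not on the name itself.
    return hashlib.sha256(name.encode("utf-8")).digest()


def list_page(names, cursor=None, limit=2):
    """Return up to `limit` names in hash order, starting after `cursor`,
    plus the cursor for the next page (None when there are no more)."""
    ordered = sorted(names, key=hash_key)
    if cursor is not None:
        ordered = [n for n in ordered if hash_key(n) > cursor]
    page = ordered[:limit]
    next_cursor = hash_key(page[-1]) if len(ordered) > limit else None
    return page, next_cursor


names = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
cursor, pages = None, []
while True:
    page, cursor = list_page(names, cursor)
    pages.append(page)
    if cursor is None:
        break
print(pages)  # every name appears exactly once; order is set by hash_key
```

Pages like these are enough to stream a huge directory in pieces, but an alphabetical listing or a range query such as “names starting with m” still requires fetching every entry and sorting on the client, which is where ordered collections would matter.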

Symlinks

Link

Protobuf Performance

While I’ve heard people say on numerous occasions that dag-pb performance is an issue (compared to dag-cbor), I can’t find any good links or resources on what the real impact is.

Miscellaneous

Meeting Notes: August 8th 2019

anyone wanna talk about attribs?

https://gist.github.com/warpfork/3948bd951e93c0f0b4e355d78b736f83

rvagg commented 1 year ago

closing for archival