Closed mikeal closed 4 years ago
Could this also include support for opt-in setting of content type?
The spec @ 12a3d57 already has a field for this (but it does not seem to be wired to anything):
message Metadata {
optional string MimeType = 1;
}
This would enable people to solve false-positives in content-type sniffing before v2 lands (https://github.com/ipfs/unixfs-v2/issues/11)
Can someone more familiar with the original spec and implementations explain how the Metadata message is currently used? It seems obvious that we should leverage it and also start using the MimeType field but without knowing a bit more about the history and current usage I can’t tell if we’re likely to break anything.
I'm not sure what to make of the MimeType idea. Unixy filesystems don't have a concept of MimeType; that's much higher level.
It certainly seems like a sizable embiggening of scope from the bullet points at the top.
Can someone more familiar with the original spec and implementations explain how the Metadata message is currently used?
As far as I know it was never implemented in either go-ipfs or js-ipfs. The way it was supposed to work is somewhat described here. I would again strongly advise steering clear of this construct: wrapper-blocks carrying metadata are not... great.
@mib-kd743naq at this point a metadata block can be included in a directory block by using an identity hash, and a file block can be included in a metadata block in the same way.
I’d like to surface these tradeoffs so that the folks with use cases driving the need for it can comment appropriately.
Using the Metadata message will:
Adding the metadata to the file/dir object itself will:
I’d like people closer to the use cases to weigh in on which of these they find most compelling. @andrew @alanshaw @achingbrain
@mikeal you are missing option 3 though: "metadata is part of the 'directory' entry"
@mib-kd743naq I updated my comment to be “file/dir” in the case of directory metadata. If there is another option you’re suggesting we explore where the metadata for every file in a directory is added to the directory entry we’ll need to discuss that a bit more before I add it because that sounds quite problematic when we start dealing with sharded directories :(
We have a mime type field in file metadata in Peergos, so can relate our experiences. There are two things useful to be aware of. 1) a file can have multiple mime types depending on the context 2) some mime types can't be deduced until the entire file has been read
@mikeal words are hard... instead next week during chaos camp I will attempt to build a PoC similar to my last large scale stress test, but this time for various types of metadata embedded in backwards-compatible-ish variants of dag-pb.
Then a concrete discussion based on actual blocks can be had.
I’d like people closer to the use cases to weigh in on which of these they find most compelling
Expanding the fields in the UnixFS data type seems like the most sensible path as adding extra nodes for each and every file will become expensive for very large file systems (package manager datasets, for example).
Two files with different metadata will have different root nodes, but I think this is fine as the file data is still de-duped across the two and fundamentally the metadata has to be stored somewhere. If we can do that without causing another network/disk/blockstore trip then great.
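The deduplication point above can be illustrated with a toy content-addressed store. This is only a sketch: it hashes JSON with SHA-256 instead of using the real dag-pb/UnixFS encoding, and `put`, `add_file`, the chunk size, and the field names are all hypothetical.

```python
import hashlib
import json

store = {}  # toy blockstore: hex digest -> raw block bytes


def put(block: bytes) -> str:
    """Store a block under the hash of its bytes and return that hash."""
    digest = hashlib.sha256(block).hexdigest()
    store[digest] = block
    return digest


def add_file(data: bytes, mtime: int, mode: int, chunk_size: int = 4) -> str:
    """Chunk the data, then build a root node that carries metadata inline."""
    links = [put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    root = {"links": links, "mtime": mtime, "mode": mode}
    return put(json.dumps(root, sort_keys=True).encode())


# Same bytes, different metadata (values are made up for the example).
data = b"hello unixfs"
root_a = add_file(data, mtime=1_500_000_000, mode=0o644)
root_b = add_file(data, mtime=1_600_000_000, mode=0o755)

# The two roots differ, but both link to the same chunk blocks,
# so the file data is stored only once.
shared = json.loads(store[root_a])["links"] == json.loads(store[root_b])["links"]
```

In this sketch the only extra storage for the second copy is one small root block; no additional wrapper node has to be fetched to read the metadata.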
@warpfork the mime-type is there because users sometimes want to explicitly specify the MIME type. Unfortunately, this can be very important when using ipfs with a gateway. The alternative of just encoding this in a separate file and having the gateway interpret it was also discussed.
Some history: the metadata block was supposed to be used as follows:
{
Data: {
Type: Metadata,
// stuff...
},
Links: [{Cid: ActualFile}]
}
Unfortunately, doing it this way would be a slightly breaking change (for users of this feature). Inlining metadata directly into files would not.
As @mib-kd743naq points out, we could also inline into directories. This also gives us fast LS (which is currently a bit annoying). We could even add file types to directories (the repeated information shouldn't be an issue).
The primary problem with this is that resolving to a CID and then copying wouldn't carry the metadata.
On the other hand, this isn't unreasonable. Names are already a part of the directory. Making metadata a part of the directory isn't all that odd. I'd expect most tools to reference files relative to directories anyways.
A hacky alternative is to:
This matches the original design without breaking anything.
+1 towards the idea that if MIME type is getting well-known support, it should be something we move towards the gateway knowing of it, rather than making it a feature of the filesystem. This would be a much closer set of relationships to how the rest of the world works already (e.g. doing sysadmin today with nginx or something, I would generally configure MIME types at the webserver area, and not in filesystem metadata) -- and thus seems much less likely to go awry.
Carefully avoiding baking in the idea of a single "mimetype string" field into our filesystem metadata also leaves much more room for issues to evolve around the things Ian mentioned:
- a file can have multiple mime types depending on the context
- some mime types can't be deduced until the entire file has been read
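As a rough illustration of the gateway-side approach suggested above, here is a minimal sketch of resolving a Content-Type from the request path plus an operator-supplied override table, much like a web server's MIME configuration. The `OVERRIDES` table and the `content_type` function are hypothetical and not part of any existing gateway; only the standard library's `mimetypes.guess_type` is a real call.

```python
import mimetypes
import posixpath

# Hypothetical operator-supplied overrides, keyed by file extension.
OVERRIDES = {".wasm": "application/wasm"}


def content_type(path: str, default: str = "application/octet-stream") -> str:
    """Pick a Content-Type for a requested path without reading file metadata."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in OVERRIDES:
        return OVERRIDES[ext]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default
```

Because the decision is made per request, the same bytes could be served with different types in different contexts, which a single stored "mimetype string" would not allow.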
PR is up now at https://github.com/ipfs/specs/pull/220
Note that I used uint32 for all the time data. In unixfsv2 we’re considering properties for 64bit high precision times but since uint32 is what most people expect I figured that was appropriate when adding these to unixfsv1.
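To make the uint32 tradeoff concrete, the sketch below shows the range such a field can represent when it holds whole seconds since the Unix epoch. The helper names are hypothetical, and the fixed 4-byte packing is an assumption used only to demonstrate the range; protobuf itself encodes `uint32` as a varint.

```python
import struct
from datetime import datetime, timezone

UINT32_MAX = 2**32 - 1


def encode_mtime(seconds: int) -> bytes:
    """Pack whole seconds since the Unix epoch into 4 bytes (illustration only)."""
    if not 0 <= seconds <= UINT32_MAX:
        raise ValueError("mtime does not fit in a uint32")
    return struct.pack("<I", seconds)


def decode_mtime(raw: bytes) -> datetime:
    """Unpack a 4-byte value back into a timezone-aware UTC datetime."""
    (seconds,) = struct.unpack("<I", raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The latest instant a uint32 seconds field can hold: 2106-02-07 06:28:15 UTC.
latest = decode_mtime(encode_mtime(UINT32_MAX))
```

Under these assumptions, timestamps before 1970 and sub-second precision cannot be represented, which is part of what the 64-bit high precision option would address.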
(I just commented this on the PR, but posting again here for discoverability for anyone who didn't follow the jump to the PR...)
I'd like to just mention a couple links to prior art that's not merely prior art, but also particularly easy to read and review for inspirations:
Both of these (as well as the specs of tar, which I'm assuming everyone's at least given a cursory glance at already) are highly worth a quick skim just to see what other people have covered when trying to map this terrain.
There are large (large) bodies of thought on this out there already, and while we may or may not choose to do some things differently, we should make sure we're doing that on purpose. We'll be doing ourselves a sizable disservice if we add new features that unintentionally strike too far outside the norm by sheer accident of not having checked where the norm is.
I think this can be closed now - we've added mtime and mode to UnixFSv1, additional fields and arbitrary metadata will probably wait for UnixFSv2.
Current UnixFSv1 importers do not encode most of the standard file metadata from most file systems.
This has been a particular challenge for package managers since they already rely on some of this metadata.
The goal of this issue is to surface all the necessary discussion points in order to drive a new PR against the unixfs spec.
Potential metadata
Additional considerations
For time stamps (mtime, ctime, atime) we need to decide if we’re going to use high precision times or not. Most systems expect a 32-bit integer (low precision) while other use cases may need a 64-bit integer (high precision).
Do we want to store additional metadata of the directory? How do we handle updating this when someone updates only a single file in the directory?
Where do we store this metadata?
In terms of the data format, should these properties be added to the File message or the Data message?
History
The history of this feature as well as meeting notes where this feature was prioritized are available here.