ipfs / js-ipfs-unixfs

JavaScript implementation of IPFS' unixfs (a Unix FileSystem representation on top of a MerkleDAG)

go-ipfs does not store filesize on symlinks #195

Open Gozala opened 2 years ago

Gozala commented 2 years ago

Looks like go-ipfs omits filesize in the unixfs protobuf when you add a symlink with ipfs add mysymlink (e.g. see QmPZ1CTc5fYErTH2XXDGrfsPsHicYXtkZeVojGycwAfm3v), but UnixFS.prototype.marshal includes it, which results in different hashes.
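To make the mismatch concrete, here is a minimal sketch that hand-encodes the UnixFS Data message for a symlink, once with the filesize field and once without. It does not use js-ipfs-unixfs itself: encodeSymlink, varint, and the target.txt path are hypothetical names for illustration, and it assumes the filesize written equals the length of the symlink target. Real CIDs are computed over the dag-pb node that wraps these bytes, so the digests below are not CIDs, but the same two extra bytes are what make the final hashes diverge.

```javascript
const { createHash } = require('node:crypto')

// Encode a small unsigned integer as a protobuf varint (values below 2^31).
function varint (n) {
  const out = []
  while (n > 0x7f) {
    out.push((n & 0x7f) | 0x80)
    n >>>= 7
  }
  out.push(n)
  return out
}

const SYMLINK = 4 // DataType.Symlink in the UnixFS protobuf schema

// Hypothetical hand-rolled encoder, not the library's marshal().
function encodeSymlink (target, { includeFilesize }) {
  const data = Buffer.from(target, 'utf8')
  const bytes = [
    0x08, ...varint(SYMLINK), // field 1 (Type), varint
    0x12, ...varint(data.length), ...data // field 2 (Data), length-delimited
  ]
  if (includeFilesize) {
    // field 3 (filesize), varint; assumed to be the target path length
    bytes.push(0x18, ...varint(data.length))
  }
  return Buffer.from(bytes)
}

const sha256 = (buf) => createHash('sha256').update(buf).digest('hex')

const withSize = encodeSymlink('target.txt', { includeFilesize: true })
const withoutSize = encodeSymlink('target.txt', { includeFilesize: false })

console.log(withSize.toString('hex'))
console.log(withoutSize.toString('hex'))
console.log(sha256(withSize) === sha256(withoutSize))
```

The two encodings differ only by the trailing filesize field, which is enough to change any hash computed over them.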

lidel commented 2 years ago

this looks like a bug, we prob. don't need to store size:

unless there is a rationale for keeping it, my vote is to fix js-ipfs to do what go-ipfs does: omit filesize

lidel commented 2 years ago

@Gozala mind opening PR to fix this?

john-heinnickel commented 2 years ago

I've not been able to figure out how to communicate the presence of symbolic links to the js-ipfs-unixfs importer, because there do not seem to be any examples of this in the README documentation. It sounds like there is an implementation to be found if I go looking through the source tree, but that is time-consuming and error-prone.

I am having a little trouble predicting how symbolic links can be formed in a way that keeps reference semantics symmetric with the host system as content changes... In the native host filesystem that a UnixFS view is patterned after, it is possible to change a file's contents without breaking links to that file, and it is possible to rename files such that symbolic links will break. Neither of these effects requires changing anything about the symlink itself.

The options for collecting the bits that differentiate one symlink from another with regard to hash computation would seem to require either using the original source filesystem's name path, or a name path in terms of CID traversal taken from the UnixFS analogs of such nodes. Here we have some apparent problems with either scenario:

Is it possible to break symlinks to files in the root by renaming the linked files?

The alternative to storing symlinks with their "native" filesystem tokens would involve translating those tokens to CIDs. However, not every node in the linked path is necessarily imported, and as just discussed, moving, renaming, adding, or removing children of a directory will change its CID, breaking links that on the host would be unaffected unless those operations touched the link's direct target. Likewise, changing the content of a linked file would change its CID even if the file was modified in place on the native host.
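The effect can be sketched with a toy content-addressing model. This is not IPFS's actual dag-pb or CID encoding; fileId, dirId, and pathSymlinkId are hypothetical helpers that just hash strings with SHA-256. Under these assumptions, editing a file changes the file's ID and its parent directory's ID, while a symlink that stores its target as an opaque path keeps the same ID.

```javascript
const { createHash } = require('node:crypto')

const hash = (s) => createHash('sha256').update(s).digest('hex')

// Toy model: a file's ID is the hash of its content.
const fileId = (content) => hash('file:' + content)

// A directory's ID covers the names and IDs of all its children.
const dirId = (entries) => hash('dir:' + Object.keys(entries)
  .sort()
  .map((name) => `${name}=${entries[name]}`)
  .join(','))

// A symlink that stores its target as an opaque path string.
const pathSymlinkId = (targetPath) => hash('symlink:' + targetPath)

// The same directory before and after editing a.txt in place.
const before = { 'a.txt': fileId('hello'), link: pathSymlinkId('a.txt') }
const after = { 'a.txt': fileId('hello, world'), link: pathSymlinkId('a.txt') }

const linkUnchanged = before.link === after.link
const fileChanged = before['a.txt'] !== after['a.txt']
const dirChanged = dirId(before) !== dirId(after)

console.log({ linkUnchanged, fileChanged, dirChanged })
```

A symlink keyed by the target's ID instead of its path would have to be rewritten on every such edit, which is the mismatch described above.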

There seems to be an impedance mismatch with symlinks here. Links by reference in a source file system work precisely because filesystem names are a labeling technique that is orthogonal to file content, which is the antithesis of what IPLD's semantic model for naming is. Can these realistically co-exist?