ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

deprecate `ipfs tar add` / `ipfs tar cat` #7951

Closed willscott closed 2 years ago

willscott commented 3 years ago

Currently, go-ipfs exposes a pair of commands, tar add and tar cat, which do not seem to be widely used and which import tar files into an unusual custom representation.

There are some issues with this implementation (e.g. it loads the entire tar archive into memory), and the representation doesn't play nice with unixfs.

We should provide tar import/export using the mfs package to represent a unixfs subtree, and deprecate the tarfmt ipld codec.

aschmahmann commented 3 years ago

This can be deprecated, but people may be using the codec so we probably need to maintain it so people don't lose access to their data. I assume that the ask here is around the go-ipld-prime migration and converting codecs to the new form.

Probably the best bet is to make a not particularly performant go-ipld-format-> go-ipld-prime wrapper, but I'm not sure what the workload for this is.

EDIT: I removed the incorrect references to codecs. The tar commands use DagPb; it's basically just a custom type with an importer/exporter.

willscott commented 3 years ago

we probably need to maintain it so people don't lose access to their data

The proposal, I guess, is that we first ship a version that supports tar import to unixfs. At that point we add a warning to the existing tar command saying that it is deprecated and that data in this format should be exported and then re-imported using the new one. Then, in a subsequent version, we remove this command and its code path.

RubenKelevra commented 3 years ago

Well, UnixFS 1.0 can't handle any attributes, like users or groups IIRC. Importing a tar into UnixFS would basically destroy this information.

To make this limitation clear, a flag like --no-metadata could be required, while an add without this flag would fail until UnixFS can handle all of the stored metadata.

willscott commented 3 years ago

@RubenKelevra This is a good point. Are you making use of this representation of tar in order to store metadata? The counterpoint might be that if you want metadata you should store the entire tar archive as a unixFS file.

RubenKelevra commented 3 years ago

@willscott my point wasn't that I'm using it that way. But when we deprecate the tar add/tar cat functionality and expect users to switch, we should make clear that there are limitations.

When I'm extracting a tar to a filesystem - and IPFS UnixFS is exactly this - the expectation is that the user/group and read/write/execute rights are set as well.

I'm running an ipfs cluster that stores compressed tars, but since the checksum must match I don't extract them and import them as tar. It's a bit sad since I think there would be quite a lot of redundancies between the files, but the compression of each file makes them inaccessible for IPFS.

lidel commented 3 years ago

Would ipfs dag export|import be a safer replacement for ipfs tar add|cat that works with all DAG types? (I feel in most cases people want to have a "single-file-copy-of-a-CID" and don't really care if it's TAR or CAR)

If so, we should do something similar to #8098

aschmahmann commented 3 years ago

@lidel these are pretty different use cases. ipfs dag import/export is used for extracting any IPLD graph into a portable format.

IIUC ipfs tar is meant for converting tar files into a DAG, but in a way that is friendlier to deduplication than what you would get by just using UnixFS.

(I feel in most cases people want to have a "single-file-copy-of-a-CID" and don't really care if it's TAR or CAR)

It's surprising to me that someone would notice that the HTTP API output of ipfs get is a TAR file and then try and import it with the ipfs tar command. We don't even talk too much about how ipfs get exports a TAR because the frequently used utilities (e.g. the js http client, go http client and go-ipfs CLI) un-tar the object and just give you a directory with files.

We can add more guardrails and help text to clarify, but that command has been around a long time and today is the first I'm hearing of this confusion.


Separately, we may want to remove support for the ipfs tar commands since they're not commonly used and carry maintenance and user-learning costs. However, unlike with #8098, ipfs tar does not have a 1:1 replacement available.

RubenKelevra commented 2 years ago

I don't think we want to remove the ipfs tar support.

Tar is still the go-to tool to ship software. IPFS uses it itself to ship its binaries and source code.

We should be able to import a tar file, keep all attributes in an efficient manner, and recover them all on export (if the user's rights are sufficient).

In the future this would allow us to extract .tar.xz/.tar.gz/.tar.bz2 etc. and import the tar file itself to make use of deduplication.

If ipfs implements compression itself, we can reach the same transfer speeds as a compressed tar file while still being able to deduplicate the content.

This also allows us to support reproducible builds with signatures of the actual binaries, while the tar part handles the file and folder attributes and we transparently apply compression on top of that.

kallisti5 commented 2 years ago

Oof. I didn't know about this feature. I'd use it extensively.

Is it documented?

RubenKelevra commented 2 years ago

@kallisti5 sure.

http://docs.ipfs.io/reference/cli/#ipfs-tar

aschmahmann commented 2 years ago

@RubenKelevra @kallisti5 the ipfs tar command is almost certainly not what you're looking for. It's creating a custom DAG-PB IPLD format that's just for tar files. It's not UnixFS so it won't render on gateways or work with ipfs get (which is why there's ipfs tar cat).

Probably what you'd rather have is something like ipfs add --chunker=tar which would create a UnixFS file where the chunking occurs on the boundaries of the internal objects of the tar file. Some inspiration could come from https://github.com/bmwiedemann/ipfs-iso-jigsaw. If this logic gets implemented it's usable with go-ipfs even before it's added to go-ipfs by piping the output into go-ipfs via a CAR file, e.g. ipfs-tar-chunker my.tar | ipfs dag import.

RubenKelevra commented 2 years ago

Well, as far as I can see, --chunker=tar is completely undocumented.

I don't want to rely on an undocumented function.

But on the other hand, why isn't every tar file automatically split with this chunker? 🤔

aschmahmann commented 2 years ago

That chunker doesn't currently exist; someone would have to build it. And again, you can build something that does this even without adding code to go-ipfs. My point is that you almost certainly want UnixFS chunking here rather than a custom IPLD format.

If it's all merged in, you could figure out the UX for things like type detection and default format chunkers. However, that's super off topic for this issue, which is "let's kill the unused and largely not useful ipfs tar command", not planning out alternative features which could have done its proposed job better. If you want to talk about that, I'd start a thread on discuss.ipfs.io or open a new feature request.

lidel commented 2 years ago

fysa I'm officially marking them as deprecated in https://github.com/ipfs/go-ipfs/pull/8849

michel47 commented 1 year ago

My use case for the ipfs tar command is to back up files with the same name, which is not allowed with plain "ipfs add":

find /somedir -name 'README.md' | ipfs tar add -

What would be the new way of doing it once ipfs tar is deprecated?

Jorropo commented 1 year ago

find /somedir -name 'README.md' | tar -c --no-recursion -T - | ipfs add -w --stdin-name readme.tar

ipfs tar used to have a custom tar-specific encoding instead of being encoded as files.

ipfs adding tars is better because tars are then just unixfs files, which means clients don't need to implement a whole new custom thing. This may be less efficient, but I want to add a content-aware chunker so they would be efficiently deduplicated (so the tar mode of the chunker would deduplicate both the tar and the underlying files by carefully chunking at the tar content boundaries, so that the CID of the tar links to the tar's metadata and the original file CIDs).