lidel opened 2 years ago
My take is: hard-error on directories, support only files and pipes. Just like /bin/cat
I put together a test repo using js-unixfs to show how concat could work under the hood with building up nodes from several sub nodes.
https://github.com/RangerMauve/js-ipfs-stitch-test/
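To make the "building up nodes from several sub nodes" idea concrete, here is a minimal sketch of the approach in Python. FileNode, fake_cid, and concat are illustrative stand-ins, not a real UnixFS/IPLD API: the point is that no bytes are copied or re-chunked; the new root just links to the existing file DAGs and records how many bytes each child contributes.

```python
from dataclasses import dataclass
import hashlib

@dataclass(frozen=True)
class FileNode:
    cid: str                 # content address of this node (stand-in)
    size: int                # total bytes this file DAG represents
    links: tuple = ()        # child CIDs (empty for a leaf block)
    blocksizes: tuple = ()   # bytes contributed by each child, in order

def fake_cid(*parts) -> str:
    """Deterministic stand-in for a real CID (hash of the node contents)."""
    return "bafy-" + hashlib.sha256(repr(parts).encode()).hexdigest()[:16]

def concat(*files: FileNode) -> FileNode:
    """Build a root node that logically concatenates the given file DAGs.

    Each input's root CID becomes a link of the new root; blocksizes
    record the byte length of each child, which is what lets a reader
    seek to the correct child without touching the others.
    """
    links = tuple(f.cid for f in files)
    blocksizes = tuple(f.size for f in files)
    return FileNode(
        cid=fake_cid(links, blocksizes),
        size=sum(blocksizes),
        links=links,
        blocksizes=blocksizes,
    )
```

For example, concatenating a 10-byte file and a 25-byte file yields a 35-byte root with two links, and both inputs keep their original CIDs, which is where the deduplication/CID-reuse benefit comes from.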
Agreed that directories should be an error. I don't think we can cat
a UnixFS tree with directories in it, so concatenating a directory in there seems like a separate use case.
Another high-level API that would be super useful, and becomes essentially easy to support given the core ipfs files concat
functionality, is a way to start with a single file and a list of split points/offsets to split on.
It could be a subcommand: ipfs files concat add <local path> <split points>
, where split points is a JSON array (or one offset per line). The command would read <local path>
, add a regular file for each range between consecutive offsets, and then concat the whole thing. E.g. given a 35M file and offsets [0, 10M, 25M]
, the command would add 0-10M of the file, add 10-25M, and add 25M-35M. Maybe it could support other add options, like being able to choose a trickle DAG?
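A small sketch of the offset handling this subcommand would need, assuming the semantics described above (split_ranges is a hypothetical helper, not an existing kubo API): turn a list of split offsets into the (start, end) byte ranges that would each be added as a separate file before concatenation.

```python
def split_ranges(file_size: int, offsets: list[int]) -> list[tuple[int, int]]:
    """Turn split offsets into half-open byte ranges covering the file.

    E.g. a 35M file with offsets [0, 10M, 25M] yields
    [(0, 10M), (10M, 25M), (25M, 35M)].
    """
    offsets = list(offsets)
    if not offsets or offsets[0] != 0:
        offsets = [0] + offsets          # an implicit leading split at 0
    if not all(a < b for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be strictly increasing")
    if offsets[-1] >= file_size:
        raise ValueError("offsets must be below the file size")
    bounds = offsets + [file_size]       # close the final range at EOF
    return list(zip(bounds, bounds[1:]))
```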
Maybe there's two subcommands:
ipfs files concat add <local path> <split points>
and
ipfs files concat merge [ <local paths> | <cids> ]
for when the split files already exist as individual files or have already been added as CIDs.
This just adds a common first step that would often be needed before using ipfs files concat
@ikreymer too complex. You'd simply:
ipfs files concat yourfile:0:20 yourfile:21:40 yourfile:41:
Yeah, I guess I could live with that; I was just thinking a separate split file makes for an easier user API, especially if this is to be supported in libraries as well as the CLI, and when dealing with hundreds of split points.
I've implemented a small library in JS that includes concat
as well as some related utilities that are useful for the web archiving use case:
https://github.com/webrecorder/ipfs-composite-files
Wrote something in go: https://github.com/anjor/unixfs-cat/blob/main/unixfs_cat.go
Happy to work more on it if it's useful/along the lines of the thinking here.
We are missing a high-level API for concatenating existing UnixFS files into bigger ones. Having it would allow for improved deduplication in scenarios where bigger archives in formats like WARC (https://webrecorder.net) consist in large part of smaller files that are already on IPFS, allowing for CID/DAG reuse.
Use cases
ipfs object patch append-data
Proposed design
Add
concat
command to ipfs files
that accepts two or more UnixFS-compatible DAGs and returns a CID that is a logical concatenation of all DAGs.
FAQ / Open questions
We need to agree on how to handle edge cases. Below are my initial ideas; feedback on ergonomics and potential implementation caveats is appreciated.
What happens when passed DAGs are all files?
Should this support directories? It opens additional questions: