RubenKelevra opened this issue 4 years ago
Just as a tip: unless you cause a GC, you can add `--pin=false` to `ipfs add`, as long as you use e.g. `ipfs files cp` to ensure that the data will be retained. Worst case, you would have to run `ipfs add --pin=false` again afterwards.

I second the request, however. This is just a (less-bad?) work-around.
@namibj But when a GC happens I'm in a deadlock, since the data I've just added got deleted with no way of knowing that it happened.
@RubenKelevra It seems that `ipfs files stat --with-local` allows querying if the files survived a theoretical GC. I didn't try how reliable it is for detecting the case of a file being almost complete (say, missing just a single byte of a large file), but if it's reliable, you could just use that to check if you need to run `ipfs add --pin=false` again after `ipfs files cp`.

Keep in mind that you'd only ever call `ipfs files cp` once, and that `ipfs add --pin=false` is idempotent. Calling it twice just uses more compute.

I don't see where there would be a deadlock. Can you elaborate?
@namibj wrote:

> I don't see where there would be a deadlock. Can you elaborate?

Well, when you run `ipfs add --pin=false` and the GC removes your file, an `ipfs files cp /ipfs/$cid /path/to/file` would try to find the CID without any timeout.
> @RubenKelevra It seems that `ipfs files stat --with-local` allows querying if the files survived a theoretical GC. I didn't try how reliable it is for detecting the case of a file being almost complete (say, missing just a single byte of a large file), but if it's reliable, you could just use that to check if you need to run `ipfs add --pin=false` again after `ipfs files cp`. Keep in mind that you'd only ever call `ipfs files cp` once, and that `ipfs add --pin=false` is idempotent. Calling it twice just uses more compute.
While this might work, it's still not thread-safe, since after the `ipfs files stat --with-local` call the file could still disappear before my next command is processed.

I've worked around it in my code like this: https://github.com/RubenKelevra/rsync2ipfs-cluster/blob/4baf9d7d4d77df1662d2710cae1aaff7241cc635/bin/rsync2cluster.sh#L262
`ipfs files write` seems to support "random" writes into a file, which might be a problem for bespoke chunking.
IMO, adding an MFS-target alternative to the existing `ipfs add` command might be a better way to accomplish the goal: say, give it an option that takes pairs of paths, one for the to-be-added file, the other for the MFS target location. It could build the directory structure incrementally to keep the partially-added folder and its contents safe from the GC. Even supporting only one MFS folder (to put everything listed on the CLI into) should cover most of the currently-suffering use cases I can think of.
While looking for status updates on the IPFS Arch mirror, I came across this issue again and realized I missed something:

> While this might work, it's still not thread-safe, since after the `ipfs files stat --with-local` call the file could still disappear before my next command is processed.
Actually, you'd do this:
```shell
cid=$(ipfs add --pin=false --chunker=buzhash --quieter "$file") &&
ipfs files cp "/ipfs/$cid" "/path/to/place/$file" &&
if ! ipfs --enc json files stat --with-local "/path/to/place/$file" \
        | jq -e '.CumulativeSize == .SizeLocal' > /dev/null; then
    ipfs add --pin=false --chunker=buzhash --quieter "$file"
fi
```
If "is referenced somewhere in the MFS" prevents GC, I don't see how this could race in this bad way.
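For what it's worth, the completeness predicate in that snippet can be exercised without a live daemon. Below is a hedged sketch: the JSON strings mimic the shape of `ipfs --enc json files stat --with-local` output (field names as used above; the hash and sizes are made up), and it uses `sed` instead of `jq` so it runs with plain POSIX tools:

```shell
#!/bin/sh
# Check whether a files-stat JSON document reports the file as fully local,
# i.e. CumulativeSize == SizeLocal. A real run would pipe the output of
# `ipfs --enc json files stat --with-local <path>` in here instead.
is_fully_local() {
    json="$1"
    cumulative=$(printf '%s' "$json" \
        | sed -n 's/.*"CumulativeSize": *\([0-9][0-9]*\).*/\1/p')
    size_local=$(printf '%s' "$json" \
        | sed -n 's/.*"SizeLocal": *\([0-9][0-9]*\).*/\1/p')
    # Both fields must be present and equal for the file to count as complete.
    [ -n "$cumulative" ] && [ "$cumulative" = "$size_local" ]
}
```

With output like `{"CumulativeSize":1024,"SizeLocal":512}` the function fails, which is the signal that the `ipfs add --pin=false` needs to be re-run.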
> I've worked around it in my code like this:

I looked at that (well, the master branch), and I'd suggest skipping the precondition check on the happy path, e.g. in https://github.com/RubenKelevra/rsync2ipfs-cluster/blob/7e2b846f6449ed8c568d0cd2e435d63c618099a4/bin/rsync2cluster.sh#L461-L468: try `rm` first, and only in case of failure fall back to the verbose-reporting, "already deleted"-tolerant `$(! stat $file ) && rm $file` variant.
Also, `ipfs-cluster-ctl pin update` is a thing: https://github.com/ipfs/ipfs-cluster/blob/master/cmd/ipfs-cluster-ctl/main.go#L736, and it seems to be honored by the follower (at least there appears to be a codepath for this; I haven't checked whether it'll be hit when the controller orders the follower to, though): https://github.com/ipfs/ipfs-cluster/blob/96db605c5027dbdbf69d93aa1b3a942d76c5811f/ipfsconn/ipfshttp/ipfshttp.go#L354.
This, IIUC, makes delta-updates to the DAG scale closer to linear in the delta size, instead of in the full DAG size (citing from `ipfs pin update --help`):

> Efficiently pins a new object based on differences from an existing one and, by default, removes the old pin.
>
> This command is useful when the new pin contains many similarities or is a derivative of an existing one, particularly for large objects. This allows a more efficient DAG-traversal which fully skips already-pinned branches from the old object. As a requirement, the old object needs to be an existing recursive pin.
Do note the warning on `ipfs-cluster-ctl pin update --help`:

> Unlike the "pin update" command in the ipfs daemon, this will not unpin the existing item from the cluster. Please run "pin rm" for that.
As my computer (in a test of some 8 GiB of research data files with `ipfs add --pin=false --recursive --only-hash --chunker=buzhash data/`) appears to hash (recursively) at almost 450-500 MB/s, I'd suggest trying to just run this against rsync's target dir, putting the resulting root CID into the local MFS via `ipfs files cp`, and then running `ipfs add` again, but this time without `--only-hash`. If raw leaves aren't an issue for the application, I'd expect the filestore to do a better job, as it doesn't actually copy the data around (see https://github.com/ipfs-filestore/go-ipfs/blob/master/filestore/README.md#maintenance for how to make IPFS notice the absence of backing files for blocks it believed to possess) and the new blocks only really have to be initially seeded into the cluster.
If it takes a day to process the 100-ish GB of the Arch repo incrementally, these incremental tactics might not be the answer.
Hey @namibj,

thanks for taking the time to look into this. I'll try to give you the answers I can provide, but it feels a bit confusing to me. So let me try to make sense of it.
> `ipfs files write` seems to support "random" writes into a file, which might be a problem for bespoke chunking.

I don't do random writes. And I don't want to. There's support for this, which means it can be handled by the chunking backend. But this doesn't matter, as I don't use random writes.
> IMO, adding an MFS-target alternative to the existing `ipfs add` command might be a better way to accomplish the goal: say, give it an option that takes pairs of paths, one for the to-be-added file, the other for the MFS target location.

Why is this better than extending a command which already does what I need with a parameter that makes user-customizable what is currently just fixed? Can you elaborate?
> It could build the directory structure incrementally to keep the partially-added folder and its contents safe from the GC. Even supporting only one MFS folder (to put everything listed on the CLI into) should cover most of the currently-suffering use cases I can think of.

That's exactly what I do.

But this won't protect it from the GC. The GC is not thread-safe at the moment and will disrupt any operation by deleting data.

That's why I only run the GC after I've completed an operation and am not doing anything on the MFS.
> Actually, you'd do this:
>
> ```shell
> cid=$(ipfs add --pin=false --chunker=buzhash --quieter "$file") &&
> ipfs files cp "/ipfs/$cid" "/path/to/place/$file" &&
> if ! ipfs --enc json files stat --with-local "/path/to/place/$file" \
>         | jq -e '.CumulativeSize == .SizeLocal' > /dev/null; then
>     ipfs add --pin=false --chunker=buzhash --quieter "$file"
> fi
> ```
>
> If "is referenced somewhere in the MFS" prevents GC, I don't see how this could race in this bad way.
While I don't run the GC automatically, I could do this. But as the GC is not thread-safe, it wouldn't guarantee anything.

Also, an `ipfs files stat --with-local` and an `ipfs add --pin=false` in response will never be atomic, so no, this is not a solution if the GC is running automatically.

In this specific case, the data added with `ipfs add --pin=false` might be deleted right after it has been added. Alternatively, `ipfs files cp` could give up finding the CID before the file has been added completely.
I've worked around it in my code like this: https://github.com/RubenKelevra/rsync2ipfs-cluster/blob/4baf9d7d4d77df1662d2710cae1aaff7241cc635/bin/rsync2cluster.sh#L262
> I looked at that (well, the master branch), and I'd suggest skipping the precondition check on the happy path, e.g. in https://github.com/RubenKelevra/rsync2ipfs-cluster/blob/7e2b846f6449ed8c568d0cd2e435d63c618099a4/bin/rsync2cluster.sh#L461-L468: try `rm` first, and only in case of failure fall back to the verbose-reporting, "already deleted"-tolerant `$(! stat $file ) && rm $file` variant.
How does this "optimization" help anything? Correct me if I'm wrong, but that's just a variation, and it would actually send three commands if a file is not there, instead of just one. So recovering from a partly-written rsync log ends up slower than before.
Further optimizations
> Also, `ipfs-cluster-ctl pin update` is a thing: https://github.com/ipfs/ipfs-cluster/blob/master/cmd/ipfs-cluster-ctl/main.go#L736, and it seems to be honored by the follower: https://github.com/ipfs/ipfs-cluster/blob/96db605c5027dbdbf69d93aa1b3a942d76c5811f/ipfsconn/ipfshttp/ipfshttp.go#L354. This, IIUC, makes delta-updates to the DAG scale closer to linear in the delta size, instead of in the full DAG size.
>
> Do note the warning on `ipfs-cluster-ctl pin update --help`:
>
> > Unlike the "pin update" command in the ipfs daemon, this will not unpin the existing item from the cluster. Please run "pin rm" for that.
I mean, I do use `ipfs-cluster-ctl pin update`. So I'm unsure what you want to recommend me to change here.

And yes, I have read the warning before implementing it. That's why I remove the old cluster pin after updating it.
MFS-spamming considered harmful
> As my computer (in a test of some 8 GiB of research data files with `ipfs add --pin=false --recursive --only-hash --chunker=buzhash data/`) appears to hash (recursively) at almost 450-500 MB/s, I'd suggest trying to just run this against rsync's target dir, putting the resulting root CID into the local MFS via `ipfs files cp`, …
This doesn't make any sense:

First, there's a good reason I don't run `ipfs add` over the whole rsync folder: it takes a lot of time compared to running over just the changed files. We're talking about 1-2 orders of magnitude here.

Second, `--only-hash` would only create the metadata and then discard both the metadata and the data behind the hash. You obviously can't add a CID to the MFS which doesn't exist in the block storage. It would be searched for network-wide, and the command would eventually return with an error.

Third, the paths are not correct if I do this. I remap some paths and also filter out files and folders which are not needed.

I don't see how this would improve anything.
> ... and then running `ipfs add` again, but this time without `--only-hash`. If raw leaves aren't an issue for the application, I'd expect the filestore to do a better job, as it doesn't actually copy the data around (see https://github.com/ipfs-filestore/go-ipfs/blob/master/filestore/README.md#maintenance for how to make IPFS notice the absence of backing files for blocks it believed to possess) and the new blocks only really have to be initially seeded into the cluster.
Filestore is an experimental feature. I will not use experimental features.

Apart from that, there are a lot of limitations on how the filestore gets handled. And I run into limitations without using it. Why should I switch to a more untested code path that is marked experimental?
> If it takes a day to process the 100-ish GB of the Arch repo incrementally, these incremental tactics might not be the answer.

I'm not sure why you want me to hash the whole directory if you think that doing it incrementally takes days. I don't think doing more work will improve anything.

It actually takes less than a minute, usually.
Hey @RubenKelevra,

thanks for the effort you put into popularizing practically-useful IPFS applications like the Arch mirror, which runs a localhost HTTP gateway and shares recently-fetched packages via P2P on the internet and intranet, with no mirrorlist tricks needed to use an intranet cache if and when available.
> Hey @namibj,
>
> thanks for taking the time to look into this. I'll try to give you the answers I can provide, but it feels a bit confusing to me. So let me try to make sense of it.

Sorry about the partially-incoherent writing; I had never intended that much commentary, or, for the record, this much response commentary (I'm responding out of order; this is the last bit I'm writing). It just didn't feel sufficiently done any sooner.
> > `ipfs files write` seems to support "random" writes into a file, which might be a problem for bespoke chunking.
>
> I don't do random writes. And I don't want to. There's support for this, which means it can be handled by the chunking backend. But this doesn't matter, as I don't use random writes.

The `ipfs files write` handler seems geared towards random writes, while taking care that full-file sequential writes reach suitable performance. Small writes (sub-chunk-size and up to a few chunks) can't properly drive the rolling-hash chunkers without leaving fragments at the boundaries, or re-creating boundary chunks just outside of the modified region.
Also, it's a strictly streaming interface, instead of letting the gateway read the files directly from the filesystem (at least that's how I understand `ipfs add` works, even without `--nocopy`).
> > IMO, adding an MFS-target alternative to the existing `ipfs add` command might be a better way to accomplish the goal: say, give it an option that takes pairs of paths, one for the to-be-added file, the other for the MFS target location.
>
> Why is this better than extending a command which already does what I need with a parameter that makes user-customizable what is currently just fixed? Can you elaborate?

I assumed the GC was usable concurrently when I wrote that, and the point would have been to stick the CID(s) into the MFS within the same transaction, so as not to expose them for a brief window as orphans for the GC to try and reap.
`ipfs files write` is more like a POSIX filesystem, while `ipfs add` is more like an object store that allows directories (by pushing the recursion burden to the client). Given that the former is implemented mostly on top of the latter (for IPFS), it'd seem more appropriate to me to use the object-store commands and just enable fusion with an otherwise subsequent `ipfs files cp /ipfs/$cid /target/path`.
> > It could build the directory structure incrementally to keep the partially-added folder and its contents safe from the GC. Even supporting only one MFS folder (to put everything listed on the CLI into) should cover most of the currently-suffering use cases I can think of.
>
> That's exactly what I do.
>
> But this won't protect it from the GC. The GC is not thread-safe at the moment and will disrupt any operation by deleting data.

Oh, I didn't realize it was that bad. I expected them to have some form of concurrent GC by now that isn't so bad that one has to use it stop-the-world regardless (with the concurrency only guarding against DB corruption).
> > I looked at that (well, the master branch), and I'd suggest skipping the precondition check on the happy path, e.g. in https://github.com/RubenKelevra/rsync2ipfs-cluster/blob/7e2b846f6449ed8c568d0cd2e435d63c618099a4/bin/rsync2cluster.sh#L461-L468: try `rm` first, and only in case of failure fall back to the verbose-reporting, "already deleted"-tolerant `$(! stat $file ) && rm $file` variant.
>
> How does this "optimization" help anything? Correct me if I'm wrong, but that's just a variation, and it would actually send three commands if a file is not there, instead of just one. So recovering from a partly-written rsync log ends up slower than before.
I (seemingly mistakenly) assumed it'd usually succeed when trying to delete a file. It seems the error message for "the file wasn't there to begin with" is easy to match on, and probably even easy to predict accurately, at least if filenames aren't exotic. It should just take a little string construction and then a comparison of the predicted "file not found" error message with the returned one to confirm that a failed delete command won't need to be repeated (this is more effort, though, and I didn't expect it to be on the hot path).
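That pattern can be sketched with plain `rm` on local files (the same shape would apply to `ipfs files rm` and its error string). Everything here is illustrative: the function name is made up, and since `rm`'s exact wording is implementation-specific, the sketch matches a substring of the message rather than predicting it in full:

```shell
#!/bin/sh
# "Try rm first" sketch: attempt the delete directly; only on failure
# decide whether the error was the tolerable "file was already gone" case.
tolerant_rm() {
    file="$1"
    err=$(rm -- "$file" 2>&1) && return 0       # happy path: one command
    case "$err" in
        *"No such file or directory"*)
            echo "already deleted: $file" >&2   # tolerated, still success
            return 0 ;;
        *)
            echo "failed to delete: $file: $err" >&2
            return 1 ;;
    esac
}
```

The happy path issues a single command, and the extra work only happens on the (presumed rare) failure path, which was the point of the suggestion.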
> I mean, I do use `ipfs-cluster-ctl pin update`. So I'm unsure what you want to recommend me to change here.
I somehow read a version from mid-2020 or thereabouts when I switched from GH to Sourcegraph for some mild IDE quality-of-life. Sorry about the confusion.
> And yes, I have read the warning before implementing it. That's why I remove the old cluster pin after updating it.

I'm not sure to what extent follower nodes apply these commands sequentially vs. concurrently, but delaying the removal by a few hours wouldn't seem like a bad idea. It should be easy to confirm either way by looking at a follower's pin-update performance for non-trivial deltas that give the potential race condition an opportunity. Though I'd not be surprised if it could only cause issues when the GC strikes the partially-unpinned DAG before those blocks have been consulted by the running pin update.
> Second, `--only-hash` would only create the metadata and then discard both the metadata and the data behind the hash. You obviously can't add a CID to the MFS which doesn't exist in the block storage. It would be searched for network-wide, and the command would eventually return with an error.
Just to be clear: I know this is an experimental feature. I don't know if it's practical/feasible to prevent it from messing with the lower levels of the DAG. But it should do the part of allowing the top-level add-to-MFS step to work.

I'm speaking of data inlined into a CID. See an example: `bafyaanasfyfceerarelsa7lzf555gkcwppvaawsy5v3evdztvbheoo6kbsmn4hxbxuqbeblumvzxi4yyvyzauaqiae`.

Using that to get the top-level CID to be self-contained and not reliant on the blockstore would take care of that "can't cp to MFS without timing out" roadblock.
> Third, the paths are not correct if I do this. I remap some paths and also filter out files and folders which are not needed.

Oh, fair. If it's still a performance issue due to IPFS being slow, I'd expect hard links/reflinks on the underlying filesystem via `ln`/`cp` to run faster.
Also note the `.gitignore`-like "customizable hidden files" option on `ipfs add`; it might suffice for the filtering.
> I don't see how this would improve anything.

It was a (in retrospect, considering the GC has to be off as it'd ruin the blockstore anyway) flawed approach to combating the apparent issue of spamming fine-grained MFS-manipulating `ipfs files` invocations instead of batching the work done by the API calls to the daemon.

> Filestore is an experimental feature. I will not use experimental features.
That's a fair stance.
> Apart from that, there are a lot of limitations on how the filestore gets handled. And I run into limitations without using it. Why should I switch to a more untested code path that is marked experimental?

It exists to allow `ipfs add` to not duplicate files on local storage, and while the lack of absolute control over the blocks in the filestore (as opposed to the normal GC-supporting store) is the source of quite a few limitations/issues, the potential performance boost might easily make up for it.
> > If it takes a day to process the 100-ish GB of the Arch repo incrementally, these incremental tactics might not be the answer.
>
> I'm not sure why you want me to hash the whole directory if you think that doing it incrementally takes days. I don't think doing more work will improve anything.

As mentioned above, I was looking at an old version of the code. That one seemed to process the rsync log serially, with typically multiple `ipfs` invocations for each change entry in the rsync log.
> It actually takes less than a minute, usually.

Oh, if so, that's great. I had just read the issues in RubenKelevra/pacman.store#62 about runs taking a long time to finish (and quite possibly just failing at the end).
I use buzhash as the chunker in one of my projects. Since the chunker argument is currently not available on `ipfs files write`, I need to add all files as pins first, copy them to their location in the MFS, and unpin them afterwards, which is a rather crude workaround.

I'd like to have the `--chunker` flag on `ipfs files write` as well.