davidar closed this issue 8 years ago
yeah this option would be nice.
I can add it pretty easily by patching flatfs in dev0.4.0 (where we actually respect the config for datastores)
Thanks. Performance of `ipfs add` is a major issue for ipfs/archives.
@davidar i'll set up a branch for you with nosync hardcoded so you can work faster.
@whyrusleeping much appreciated :)
@davidar try this out
@whyrusleeping I'm getting the following error when trying to build (on both temp-nosync and dev0.4.0 branches):
./daemon.go:207: multiple-value repo.Config() in single-value context
@davidar for now https://github.com/rht/go-ipfs/tree/dev0.4.0 and https://github.com/rht/go-ipfs/tree/temp-nosync, rebased on current master.
Thanks @rht
@whyrusleeping @rht Hmm, no dice. The temp-nosync branch is still painfully slow trying to add a directory with lots of small files (eta approaching 400h).
Dependency: https://github.com/jbenet/go-datastore/pull/30.
(sync) up to 1000 files, 1 KB each
(nosync) up to 1000 files, 1 KB each
(nosync) up to 5000 files, 1 KB each (the bottleneck hasn't been characterized yet)
@rht only git is a fair comparison; the others aren't, really.
but anyway, sure. let's add it in both ways:
- `ipfs add --no-sync`
- a global config option (harder)

@jbenet git still beats the pants off ipfs though (if I squint, I can almost see the line for git :p)
Here is one with darcs and sqlite added.
(sync) up to 500 files, 1 KB each
(nosync) up to 500 files, 1 KB each, without sqlite
Since offline add is equivalent to creating the ipfs archive format, I think this benchmark is fair. Because the files are random, no deduplication / loose object packing is involved. `git` was often compared with cp / rsync in the past (though `git fetch` rather than `git add`), hence it is included here.
iirc the next bottleneck is protobuf Marshal, but I have to check again.
nice. @rht want to shepherd these changes? we need:
This is a major bottleneck to the archiving effort (otherwise ipfs archive could have meshed with ia sooner).
Rather, I'll write the change (the global config one) right away once you merge the sync flag in go-datastore. `ipfs add --no-sync` is hard when the daemon is on.
(and on top of dev0.4.0 after dev0.4.0 rebase, since it has a lot of datastore changes)
> This is a major bottleneck to the archiving effort
Even with nosync, there's still a major bottleneck somewhere :/
Yes, but it's at least faster than sync-sqlite.
Here is a breakdown of why things are slow:
```
ipfs add -r -q Godeps

git:     278ms
sync:    addFile 67.480s
no-sync: addFile 918ms

add 286ms
  importer.BuildDagFromReader 284ms
    bal.BalancedLayout 283ms
      db.Add 170ms (helpers.DagBuilderHelper)
        dagservice.Add 159ms
addNode 432ms
  InsertNodeAtPath 491ms
    root.GetLinkedNode 370ms
      n.GetNodeLink 384ms
    dagservice.Add 136ms
```
The bottleneck of both `add` and `addNode` converges to `dagservice.Add`:
```
dagservice.Add 386ms
  nd.Encoded(false) 174ms
    sort.Stable 17ms
    n.Marshal 35.6ms
    u.Hash 139ms
  n.Blocks.AddBlock 205ms
    s.Blockstore.Put 182ms
      block.Key().DsKey() 14ms
      bs.datastore.Has 63ms
      bs.datastore.Put 118ms
```
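The last two lines show `Blockstore.Put` spending most of its time in the datastore's `Has` check followed by the `Put` itself. A toy sketch of that pattern (the types here are an in-memory stand-in, not the real go-datastore interface); since blocks are content-addressed and immutable, the `Has` check only saves rewriting identical bytes:

```go
package main

import "fmt"

// Datastore is an in-memory stand-in, not go-datastore's actual interface.
type Datastore struct {
	m map[string][]byte
}

func NewDatastore() *Datastore { return &Datastore{m: map[string][]byte{}} }

func (d *Datastore) Has(k string) bool      { _, ok := d.m[k]; return ok }
func (d *Datastore) Put(k string, v []byte) { d.m[k] = v }

// PutChecked mirrors the profiled path: a Has check before every Put.
// Because blocks are content-addressed and immutable, skipping the check
// trades a read for an idempotent write of the same bytes.
func PutChecked(d *Datastore, k string, v []byte) {
	if d.Has(k) {
		return // already stored; content never changes for a given key
	}
	d.Put(k, v)
}

func main() {
	d := NewDatastore()
	PutChecked(d, "QmExample", []byte("block"))
	PutChecked(d, "QmExample", []byte("block")) // no-op: key already present
	fmt.Println(d.Has("QmExample"))             // prints "true"
}
```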
The slowest part that can be optimized is perhaps `n.GetNodeLink`: the link search (basically, getting the hash of the folder) could be cached. Also, for every single file insert, does the hash of the containing folder have to be recomputed?
What about an `InsertNodesAtPath` for inserting several nodes at once?
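The caching idea above could look roughly like this; `Node`, `Link`, and `linkCache` are simplified stand-ins rather than go-ipfs's merkledag types, and invalidation when a node's links change is left out of the sketch:

```go
package main

import "fmt"

// Simplified stand-ins for merkledag nodes and links.
type Link struct {
	Name string
	Hash string
}

type Node struct {
	Links []Link
}

// linkCache memoizes name -> link per node, so repeated inserts into the
// same folder do one linear scan instead of one scan per lookup.
// Invalidating an entry when a node's links mutate is out of scope here.
type linkCache map[*Node]map[string]Link

func (c linkCache) GetNodeLink(n *Node, name string) (Link, bool) {
	m, ok := c[n]
	if !ok {
		m = make(map[string]Link, len(n.Links))
		for _, l := range n.Links {
			m[l.Name] = l
		}
		c[n] = m
	}
	l, ok := m[name]
	return l, ok
}

func main() {
	dir := &Node{Links: []Link{{"a", "hash-a"}, {"b", "hash-b"}}}
	cache := linkCache{}
	l, ok := cache.GetNodeLink(dir, "b") // builds the index on first use
	fmt.Println(l.Hash, ok)              // prints "hash-b true"
}
```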
> Also, for every single file insert, does the hash of the containing folder have to be recomputed?
It seems like a zipper could help here, which allows efficient traversal and mutation of persistent datastructures (like merkledags). This is what @argonaut-io uses, for example.
Worth implementing, I think. It appears that there exists a filesystem based on zipper (http://okmij.org/ftp/continuations/zipper.html).
A merkle version of a zipper is possible if the root hash and the sub-root hashes along the path to the node (or nodes, in the case of concurrent mutation) aren't precomputed. In go-ipfs, indeed, the hashes are computed only when the nodes are about to be committed to disk.
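A minimal sketch of that idea (a simplified tree, not go-ipfs's actual merkledag types): the zipper keeps a focus node plus breadcrumbs back to the root, edits at the focus are cheap, and hashing happens only when you unwind to commit:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Node is a simplified stand-in for a merkledag node.
type Node struct {
	Data     string
	Children []*Node
}

// Hash recomputes a node's hash recursively. A real implementation would
// reuse cached hashes of untouched siblings instead of full recursion.
func Hash(n *Node) [32]byte {
	h := sha256.New()
	h.Write([]byte(n.Data))
	for _, c := range n.Children {
		ch := Hash(c)
		h.Write(ch[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// crumb records the parent we descended from, so Up can restore it.
type crumb struct {
	parent *Node
	index  int
}

// Zipper is a focus node plus the path of crumbs back to the root.
type Zipper struct {
	Focus *Node
	path  []crumb
}

func (z *Zipper) Down(i int) {
	z.path = append(z.path, crumb{z.Focus, i})
	z.Focus = z.Focus.Children[i]
}

func (z *Zipper) Up() {
	c := z.path[len(z.path)-1]
	z.path = z.path[:len(z.path)-1]
	z.Focus = c.parent
}

// Root unwinds to the top; only at this point (the "commit") would the
// hashes along the edited path need recomputing.
func (z *Zipper) Root() *Node {
	for len(z.path) > 0 {
		z.Up()
	}
	return z.Focus
}

func main() {
	root := &Node{Data: "dir", Children: []*Node{{Data: "a"}, {Data: "b"}}}
	before := Hash(root)

	z := &Zipper{Focus: root}
	z.Down(1)
	z.Focus.Data = "b-modified" // cheap in-place edit at the focus
	after := Hash(z.Root())

	fmt.Println(before != after) // prints "true"
}
```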
(@davidar now you don't have to squint https://github.com/ipfs/go-ipfs/pull/1964#issuecomment-156912258)
(@whyrusleeping perhaps zipper could be a name for what you requested in https://github.com/ipfs/go-ipfs/blob/master/unixfs/mod/dagmodifier.go#L36 ?)
@rht i forgot about that comment, something along the lines of zipper sounds good to me there
zipper :+1:
> Worth implementing, I think. It appears that there exists a filesystem based on zipper (http://okmij.org/ftp/continuations/zipper.html).
Haha, of course Oleg would have written such a thing
@rht I want to deploy nosync to castor.i.ipfs.io, which branch should I use? Is there one based on dev0.4.0?
@lgierth https://github.com/ipfs/go-ipfs/tree/dev0.4.0 contains nosync (added by @whyrusleeping 3 days ago). Sorry, I should have notified you.
lovely :)
is this issue good to be closed, then?
`ipfs add` with `"NoSync": true` is nice and fast on the dev0.4.0 hosts (castor, pollux, pluto)
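For reference, a sketch of where that flag would sit in the repo config; the key name is taken from the `"NoSync": true` snippet above, but the exact layout of the Datastore section on dev0.4.0 may differ:

```json
{
  "Datastore": {
    "NoSync": true
  }
}
```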
From #1324: