ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Massive DHT traffic on adding big datasets #2828

Open Kubuxu opened 7 years ago

Kubuxu commented 7 years ago

@Magik6k was trying to add cdnjs (something I did a few months ago). He hit https://github.com/ipfs/go-ipfs/issues/2823, but also reported that adding the dataset, which was about 21GB, generated about 3TB of traffic.

@Magik6k could you give more precise data on this?

whyrusleeping commented 7 years ago

The solution I'm investigating here is batching together outgoing provide operations, combining findpeer queries and outgoing 'puts'. The issue is that it might require making some backwards-incompatible changes to the DHT protocol, which isn't really a huge problem since we have multistream.

magik6k commented 7 years ago
whyrusleeping commented 7 years ago

@magik6k Thanks! I will keep you posted as we make improvements to this.

parkan commented 7 years ago

Similar report: while adding ~70,000 objects (~100 KB each) we were maxing out our instance's traffic (70 Mbit/s in, 160 Mbit/s out) for over 14 hours (!!!), at which point someone killed the IPFS daemon.

Back of the envelope, this is at least 1TB outgoing for ~10GB files.

One particular problem: because the daemon was killed before the process completed, it seems that almost none of the files were pinned. Running ipfs repo gc deleted almost everything:

```
; before gc
NumObjects       2995522
RepoSize         20
RepoPath         /home/ubuntu/.ipfs
Version          fs-repo@4

; after gc
NumObjects       771
RepoSize         20
RepoPath         /home/ubuntu/.ipfs
Version          fs-repo@4
```
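
The figures above appear to be `ipfs repo stat` output; a minimal sketch of reproducing the before/after comparison on any node:

```
ipfs repo stat   # NumObjects / RepoSize before collection
ipfs repo gc     # removes every block not reachable from a pin
ipfs repo stat   # the same counters afterwards
```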

(side note, the chunk counts seem way too high too, almost 40x per image instead of expected 4x)

whyrusleeping commented 7 years ago

@parkan which version of ipfs are you using? This is definitely a known issue, but it has been improved slightly since 0.4.2

parkan commented 7 years ago

@whyrusleeping 0.4.3-rc3

parkan commented 7 years ago

@whyrusleeping my intuition is that there's some kind of context that's not being released before a queue is drained, which is supported by none of the files being pinned successfully after killing the daemon. Does this seem plausible?

whyrusleeping commented 7 years ago

@parkan are you killing the daemon before the add call is complete?

parkan commented 7 years ago

@whyrusleeping no, these are all individual add calls that complete reasonably quickly; the load/network traffic happens without any client interaction

jbenet commented 7 years ago

> One particular problem: because the daemon was killed before the process completed, it seems that almost none of the files were pinned. Running ipfs repo gc deleted almost everything:

That's a very serious bug to figure out. Consistency is paramount here, and that's much more important to get right.

> Similar report: while adding ~70,000 objects (~100 KB each) we were maxing out our instance's traffic (70 Mbit/s in, 160 Mbit/s out) for over 14 hours (!!!), at which point someone killed the IPFS daemon.
>
> Back of the envelope, this is at least 1TB outgoing for ~10GB files.

This is caused by providing random access to all content, as discussed via email. For the large use case, we should look at recompiling go-ipfs without creating a provider record for each object, and leverage direct connections. (FWIW, orbit on js-ipfs works without the DHT entirely, and I imagine your use case here could do very well if you make sure to connect the relevant nodes together so there's no discovery to happen. This is the way to do it before pubsub connects them for you.)
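
A minimal sketch of the "connect the relevant nodes together" suggestion, using a hypothetical multiaddr and peer ID, so the consuming node fetches blocks over a direct bitswap connection instead of relying on DHT discovery:

```
# On the node that needs the data, dial the node that added it directly
# (the address and peer ID below are placeholders).
ipfs swarm connect /ip4/203.0.113.7/tcp/4001/p2p/QmPeerIdOfTheAddingNode
ipfs swarm peers | grep QmPeerIdOfTheAddingNode   # confirm the connection is up
```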

whyrusleeping commented 7 years ago

You can check which objects are pinned after an add with: ipfs pin ls --type=recursive.
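
A quick way to spot-check that after a batch of adds (a sketch; `-Q` makes `ipfs add` print only the final root hash, and `./some-dir` is a placeholder):

```
root=$(ipfs add -Q -r ./some-dir)              # root hash returned by the add
ipfs pin ls --type=recursive | grep "$root"    # the root should show up here
```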

It also seems really weird that your repo size is listed as 20 for both of those calls...

whyrusleeping commented 7 years ago

@parkan also, what is the process for adding files? Is it just a single call to ipfs add, or many smaller calls broken up? Are you passing any other flags?

jbenet commented 7 years ago

@parkan on chunk size, I'd be interested (separately) how well Rabin-fingerprint-based chunking (ipfs add -s <file>) does for your images. If they're similar it should reduce the storage, but this is experimental and not proven to help. It's also slower on add, as it has to do math when chunking.
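
A rough way to compare the two chunkers, assuming current flag spellings (`--chunker=rabin` is the long form of `-s rabin`; older builds may differ), with `image.jpg` and `<root-hash>` as placeholders:

```
ipfs add -Q image.jpg                    # default fixed-size chunking
ipfs add -Q --chunker=rabin image.jpg    # content-defined (Rabin) chunking
ipfs refs -r <root-hash> | wc -l         # count the blocks under either root
```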

parkan commented 7 years ago

@whyrusleeping ipfs pin ls was showing <500 objects pinned successfully, I don't think any of them are from this add attempt

also, a small correction to my previous statement: not all ipfs add calls had completed; basically it was like this:

1. we add ~10k files with relatively fast return from ipfs add
2. adds slow down, taking >30s per file
3. daemon is killed

parkan commented 7 years ago

@whyrusleeping these are many separate adds; we basically write a temporary file (incidentally, being able to pin from stdin would be great), ipfs add it (no options), and wait for it to return
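
A sketch of that loop as described (the paths and log file here are hypothetical; `-Q` just makes the hash easy to capture):

```
for f in /tmp/objects/*; do
  hash=$(ipfs add -Q "$f")          # one add call per object
  echo "$f $hash" >> added-hashes.log
done
```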

whyrusleeping commented 7 years ago

@parkan Ah, yeah. The pins aren't written until the add is complete. Any given ipfs add call will only pin the root hash of its result.

whyrusleeping commented 7 years ago

> being able to pin from stdin would be great

```
➜  ~ echo QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n | ipfs pin add
pinned QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n recursively
```

whyrusleeping commented 7 years ago

When the adds slow down that far could you get me some of the performance debugging info from https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md ?

parkan commented 7 years ago

@whyrusleeping sorry, maybe I was being unclear: for 10k+ files the "add" part was completed (i.e. the call to ipfs add returned) but pin did not land.

re: stdin, I meant ipfs add, not ipfs pin add :) Can I just pipe binary data into ipfs add -?

whyrusleeping commented 7 years ago

So, you did:

```
ipfs add -r <dir with 100k files in it>
...
...
...
added <some hash> <that dirs name>
```

Right? Does that final hash from that add call show up in the output of ipfs pin ls --type=recursive?

I'm also confused about what is meant by 'adds slow down' if the add call completed by that point. Was this a subsequent call to ipfs add with more files?

re: stdin: you can indeed just pipe binary data to ipfs add :)

```
➜  ~ echo "hello" | ipfs add
added QmZULkCELmmk5XNfCgTnCyFgAVxBRBXyDHGGMVoLFLiXEN QmZULkCELmmk5XNfCgTnCyFgAVxBRBXyDHGGMVoLFLiXEN
➜  ~ dd if=/dev/urandom bs=4M count=2 | ipfs add
 7.75 MB / ? [----------=---------------------------------------------------------] 2+0 records in
2+0 records out
8388608 bytes (8.4 MB, 8.0 MiB) copied, 0.571907 s, 14.7 MB/s
added QmbPmPSxWcQqJGcq7hHJ7CGMxCQ2zxpMpNHLxARePPNv6n QmbPmPSxWcQqJGcq7hHJ7CGMxCQ2zxpMpNHLxARePPNv6n
```

parkan commented 7 years ago

here's the parsed dump:

``` crypto/sha256.(*digest).Sum 1 runtime.goexit 1 os/signal.signal_recv 1 gx/ipfs/QmNQynaz7qfriSUJkiEZUrm2Wen1u3Kj9goZzWtrPyu7XR/go-log.(*MirrorWriter).logRoutine 1 gx/ipfs/QmV3NSS3A1kX5s28r7yLczhDsXzkgo65cqRgKFXYunWZmD/metrics.init.1.func2 1 gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb/util.(*BufferPool).drain 1 gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).mpoolDrain 1 runtime.gopark 1 main.(*IntrHandler).Handle.func1 1 gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics.(*meterArbiter).tick 1 gx/ipfs/QmduCCgTaLnxwwf9RFQy2PMUytrKcEH9msohtVxSBZUdgu/go-peerstream.(*Swarm).connGarbageCollect 1 main.daemonFunc.func1 1 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*process).CloseAfterChildren 1 github.com/ipfs/go-ipfs/routing/dht/providers.(*ProviderManager).run 1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/briantigerchow/pubsub.(*PubSub).start 1 github.com/ipfs/go-ipfs/exchange/bitswap/decision.(*Engine).nextEnvelope 1 github.com/ipfs/go-ipfs/exchange/bitswap.(*WantManager).Run 1 gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).jWriter 1 github.com/ipfs/go-ipfs/exchange/bitswap.New.func2 1 github.com/ipfs/go-ipfs/mfs.(*Republisher).Run 1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).providerQueryManager 1 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Bootstrap.func1 1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).rebroadcastWorker 1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideCollector 1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideWorker 1 github.com/ipfs/go-ipfs/namesys/republisher.(*Republisher).Run 1 gx/ipfs/QmSscYPCcE1H3UQr2tnsJ2a9dK9LsHTBGgP71VW6fz67e5/mdns.(*client).query 1 gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).mCompaction 1 gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).compactionError 1 sync.(*Pool).Get 1 github.com/ipfs/go-ipfs/blocks/blockstore.(*blockstore).AllKeysChan.func2 1 gx/ipfs/QmTxLSvdhwg68WJimdS6icLPhZi28aTp6b7uihC2Yb47Xk/go-datastore/namespace.(*datastore).Query.func1 1 gx/ipfs/QmTxLSvdhwg68WJimdS6icLPhZi28aTp6b7uihC2Yb47Xk/go-datastore/keytransform.(*ktds).Query.func1 1 gx/ipfs/QmTxLSvdhwg68WJimdS6icLPhZi28aTp6b7uihC2Yb47Xk/go-datastore/query.ResultsWithChan.func1 1 gx/ipfs/QmTxLSvdhwg68WJimdS6icLPhZi28aTp6b7uihC2Yb47Xk/go-datastore/flatfs.(*Datastore).Query.func1.1 1 gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).tCompaction 1 github.com/ipfs/go-ipfs/commands.(*ChannelMarshaler).Read 1 github.com/ipfs/go-ipfs/commands/http.internalHandler.ServeHTTP.func2 1 runtime/pprof.writeGoroutineStacks 1 gx/ipfs/QmQdnfvZQuhdT93LNc5bos52wAmdr3G2p6G8teLJMEN32P/go-libp2p-peerstore.(*AddrManager).AddAddrs 1 gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/discovery.(*mdnsService).pollForEntries.func1 1 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).Ping 1 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).GetClosestPeers.func2 1 syscall.Syscall 1 main.daemonFunc 1 gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/net/swarm.(*Swarm).addConnListener.func2 2 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/periodic.callOnTicker.func1 2 gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/net/conn.(*listener).Accept 2 github.com/ipfs/go-ipfs/core/corehttp.Serve 2 main.merge.func1 2 
gx/ipfs/QmdhsRK1EK2fvAz2i2SH5DEfkL6seDuyMYEsxKa9Braim3/client_golang/prometheus.computeApproximateRequestSize 2 gx/ipfs/QmduCCgTaLnxwwf9RFQy2PMUytrKcEH9msohtVxSBZUdgu/go-peerstream.(*Swarm).setupStream.func1 4 gx/ipfs/Qmf91yhgRLo2dhhbc5zZ7TxjMaR1oxaWaoc9zRZdi1kU4a/go-multistream.(*lazyConn).readHandshake 6 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).taskWorker 8 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*process).doClose.func1 9 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*processLink).AddToChild 10 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).waitForSendErr 11 gx/ipfs/QmduCCgTaLnxwwf9RFQy2PMUytrKcEH9msohtVxSBZUdgu/go-peerstream.(*Swarm).removeStream.func1 16 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Provide.func1 19 github.com/ipfs/go-ipfs/routing/dht.(*messageSender).ctxReadMsg 25 gx/ipfs/QmX6DhWrpBB5NtadXmPSXYNdVvuLfJXoFNMvUMoVvP5UJa/go-context/io.(*ctxReader).Read 27 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*process).Close 86 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Stream).Read 119 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).send 133 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).keepalive 143 github.com/ipfs/go-ipfs/exchange/bitswap.(*msgQueue).runQueue 144 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).AcceptStream 144 net.runtime_pollWait 156 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*process).doClose 265 github.com/ipfs/go-ipfs/routing/dht.(*dhtQueryRunner).spawnWorkers 425 gx/ipfs/QmQdnfvZQuhdT93LNc5bos52wAmdr3G2p6G8teLJMEN32P/go-libp2p-peerstore/queue.(*ChanQueue).process.func1 426 gx/ipfs/QmZy2y8t9zQH2a1b8q2ZSLKp17ATuJoCNxxyMFG5qFExpt/go-net/context.propagateCancel.func1 426 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/context.CloseAfterContext.func1 433 github.com/ipfs/go-ipfs/routing/dht.(*dhtQueryRunner).Run 509 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Provide 511 sync.runtime_Semacquire 2187 ```
parkan commented 7 years ago

@whyrusleeping no, we are calling ipfs add on each individual file because there's no actual tree to add from

so, again:

The slowdown happens about 10k images into the loop

whyrusleeping commented 7 years ago

Thanks for the stack dump; that just confirms for me that the problems are caused by the overproviding. The ~2000 goroutines stuck in runtime_Semacquire are somewhat odd though, could I get the full stack dump to investigate that one further?

As for the adds, after each ipfs add call completes (or some subset of them complete), are those hashes visible in the output of ipfs pin ls --type=recursive? Every call to ipfs add should put an entry in there before it returns.

parkan commented 7 years ago

@whyrusleeping I'm trying to repro this behavior, but I am quite certain that we ended up with 10,000+ added but not pinned files in yesterday's run

Full stax: https://ipfs.io/ipfs/QmP344ugRiyZRKA2djLgE371MzW2FhwbghnJyH7hmbmAeQ

whyrusleeping commented 7 years ago

@parkan thank you! That stack dump is pretty gnarly: lots of lock contention trying to spew messages out to everyone on the DHT, so that's the providers problem alright. For that, as @jbenet says, we can make a separate build (for now; we can merge it in as an option later) that only provides the hashes you explicitly tell it to.

As for the pinning issue, that's a very serious issue if what you say is correct. I will investigate and try to reproduce locally as well. cc @Kubuxu for an extra set of eyes

parkan commented 7 years ago

@whyrusleeping I naively tried to do ipfs add --local $file, but this still seems to provide; I am guessing --local really means "no network reads", not "no touching network", right?

jbenet commented 7 years ago

@parkan

whyrusleeping commented 7 years ago

Yeah, the --local flag doesn't work for all commands unfortunately. It's something we need to unify. It mostly means "don't read data from the network for this call"

parkan commented 7 years ago

@jbenet would love to have a chat about working around the provider system ASAP. This dataset is <0.01% of our target scale, so it doesn't sound like we can work with the current behavior.

parkan commented 7 years ago

I will try to repro the pinning failure case -- fairly confident it happened as I suspect, because I don't see any other way for all those objects to have ended up in the store but not pinned.

Separately, running ipfs add ... with no daemon running seems to complete in reasonable and non-increasing time; is the data just copied into the store without an announce in this mode?

Kubuxu commented 7 years ago

Yes, it will be re-announced periodically.

You can also run the daemon in offline mode: ipfs daemon --offline.
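
A sketch of that offline workflow (`/data/dataset` is a placeholder); no provider announcements are made until an online daemon runs again, at which point the content is announced and then periodically re-announced as noted above:

```
ipfs daemon --offline &       # or simply run `ipfs add` with no daemon at all
ipfs add -r /data/dataset     # blocks are written and pinned locally only
```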

jbenet commented 7 years ago

@parkan

parkan commented 7 years ago

OK, I can confirm that the number of pinned files has fallen to 1 despite having 299392 objects in the repo. Should I open a new issue for this?

whyrusleeping commented 7 years ago

@parkan yes please, and include repro steps if you can

jbenet commented 7 years ago

@parkan could you make a git repo that reproduces this exact problem? Would love to have it as a test case.

parkan commented 7 years ago

@jbenet it's a bit hard to repro but I'll do my best -- seems like a serious issue

whyrusleeping commented 7 years ago

@parkan does the daemon output any errors when this happens?

whyrusleeping commented 7 years ago

Alright, I made a few PRs to help out here with the providers problem: #3101, #3102, #3103, #3105, and #3106

whyrusleeping commented 7 years ago

A bit of an update here: I've gotten a few of these PRs merged, with only one remaining. I ran some tests to check bandwidth usage, good results so far: https://github.com/ipfs/go-ipfs/pull/3105#issuecomment-242183017

parkan commented 7 years ago

@whyrusleeping sorry missed your earlier question, no relevant errors that I could see. Improvements looking fantastic, though!

whyrusleeping commented 7 years ago

@parkan thanks! I've got a few more things up my sleeve that should help too, but they're a bit farther out. In the meantime, you should now be able to do all your adds with the --local flag, and then manually run ipfs dht provide on the keys you want advertised to the DHT. The provide command has a -r flag to recursively provide all child blocks too, but run without it, it will just provide the single given block.

Please let me know if anything weird comes up or you have any questions.
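
A sketch of that workflow using the flags as named in this comment (newer releases spell the second command `ipfs routing provide`; `/data/dataset` is a placeholder):

```
root=$(ipfs add --local -Q -r /data/dataset)   # add without announcing to the DHT
ipfs dht provide -r "$root"                    # later: advertise the root and all child blocks
```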

parkan commented 7 years ago

@whyrusleeping awesome! Great turnaround on this. Should we build from master to have these changes reflected?

whyrusleeping commented 7 years ago

@parkan yeah, you will have to build from master. I can also provide 'development' builds on the dists page if there is an interest in that

parkan commented 7 years ago

@whyrusleeping having unstable/dev/nightly binaries around would definitely be appreciated if it's not too much effort

whyrusleeping commented 7 years ago

@parkan sure thing! I'll push out some builds after #3105 merges, and start the discussion about automating some nightly releases

parkan commented 7 years ago

@whyrusleeping getting a nightly/CI build baked would be 👌

jbenet commented 7 years ago

Yeah, it would be great to have channels in distributions. Filed it over at https://github.com/ipfs/distributions/issues/77 and https://github.com/ipfs/distributions/issues/78

hobofan commented 7 years ago

@whyrusleeping I'd also really appreciate a development build, especially because of this issue. :)

hobofan commented 7 years ago

After adding 200MB of the Wikidata dataset with the current master version I have these stats:

```
Bandwidth
TotalIn: 11 GB
TotalOut: 2.0 GB
```

Compared to the previous report in https://github.com/ipfs/apps/issues/27#issuecomment-242562201 that's an improvement of ~2.65x for inbound and ~3.2x for outbound traffic when using the current master compared to 0.4.2.

The dataset consists of 24 million JSON files, mostly 1-20 KB each. The CLI estimated ~12 days until completion, which doesn't sound too great, but if publishing the weekly changes to the dataset afterwards only takes a few hours, it might be workable for that project.
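
For reference, the totals above match the shape of `ipfs stats bw` output; re-running it periodically during a long add gives a rough picture of traffic over time:

```
ipfs stats bw                 # cumulative TotalIn / TotalOut for the node
watch -n 60 ipfs stats bw     # assumes the `watch` utility is available
```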

casey commented 5 years ago

Has the state of this issue changed since the last update?

I'm evaluating IPFS for a project, and if a 200MB dataset generates 13GB of traffic to add the data to IPFS, it will rule out IPFS as an option.

We would really like to be interoperable with IPFS and the broader IPFS ecosystem, but at this overhead we'll have to use BitTorrent instead.